[Notes] Working out the logic for crawling data from the 趣配音 app

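app crifan 978 views 0 comments
This follows on from:
[Notes] Analyzing the data sources in the xxx app and how to crawl them - 1
[Notes] Analyzing the data sources in the xxx app and how to crawl them - 2
With that analysis done, the next step is to work out the crawling logic.
What follows is the approach for crawling the data: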
Crawling logic:
First, call get_course_list with category=all:
https://childapi.xxx.com/course/get_course_list?sign=4e89d3ae91d6e5c2a05908be477e913d&uid=0&sort=new&level=all&nature_style=all&start=0&category_id=1&auth_token=0&ishow=0&nature_area=all&rows=10&nature_id=all&stamp=1536819997
Then page through the results with rows=10, increasing start by 10 each time (start=0, 10, 20, and so on),
to fetch the complete list of categories, each entry of which includes the category id.
In testing, there were 36000+ category ids at most.
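The paging scheme above can be sketched as follows. This is a minimal sketch, not the original crawler: the name `course_list_urls` and the `max_items` cap are hypothetical, and `sign`/`stamp` are copied verbatim from the sample URL. How the app computes them, and whether the server accepts the same `sign` for other `start` values, is not covered here.

```python
from urllib.parse import urlencode

BASE_URL = "https://childapi.xxx.com/course/get_course_list"

# Query parameters copied from the sample URL. sign/stamp are reused as-is
# (assumption: computing fresh values is outside the scope of this sketch).
BASE_PARAMS = {
    "sign": "4e89d3ae91d6e5c2a05908be477e913d",
    "uid": "0",
    "sort": "new",
    "level": "all",
    "nature_style": "all",
    "category_id": "1",
    "auth_token": "0",
    "ishow": "0",
    "nature_area": "all",
    "nature_id": "all",
    "stamp": "1536819997",
}


def course_list_urls(max_items, rows=10):
    """Yield one get_course_list URL per page: start=0, rows, 2*rows, ..."""
    for start in range(0, max_items, rows):
        params = dict(BASE_PARAMS, start=start, rows=rows)
        yield BASE_URL + "?" + urlencode(params)
```

For example, `list(course_list_urls(30))` produces three URLs with start=0, 10, and 20.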
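Then, for each category, fetch the details (the endpoint takes the id as course_id):
https://childapi.xxx.com/course/detail_new?course_id=61288
Save the mp4 video, mp3 audio, srt subtitles, and the JSON detail info.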
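Saving the detail JSON and working out where the media files should go might look like the sketch below. The `course_<id>_info.json` name follows the example files later in this post. The media file names, and the choice of the `video`, `audio`, and `subtitle_en` fields as the download sources, are assumptions made for illustration. The actual download step is left out.

```python
import json
from pathlib import Path


def save_course_info(course, out_dir):
    """Write the course detail dict to course_<id>_info.json; return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"course_{course['id']}_info.json"
    # ensure_ascii=False keeps Chinese titles readable in the saved file.
    text = json.dumps(course, ensure_ascii=False, indent=4)
    path.write_text(text, encoding="utf-8")
    return path


def media_targets(course, out_dir):
    """Map each non-empty media URL to a local path (naming is hypothetical)."""
    out_dir = Path(out_dir)
    # Assumed field -> extension mapping, based on the sample detail JSON.
    fields = {"video": "mp4", "audio": "mp3", "subtitle_en": "srt"}
    return {
        course[field]: out_dir / f"course_{course['id']}.{ext}"
        for field, ext in fields.items()
        if course.get(field)
    }
```

Fields that are missing or empty are skipped, so a course without audio simply yields no mp3 target.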
For each course, fetch:
The users who have already dubbed it:
https://childapi.xxx.com/course/last_show_peoples?sign=da0a90e3c7ee2f7656cbae6575c3521d&stamp=1536895345&uid=0&auth_token=0&course_id=59901&start=0&rows=20
and:
The likes leaderboard:
https://childapi.xxx.com/StudyShow/course_show?sign=da0a90e3c7ee2f7656cbae6575c3521d&stamp=1536895345&uid=0&auth_token=0&course_id=59901&start=0&rows=20
Here you can keep requesting with start set to successive multiples of 20 and rows=20, until the returned data is an empty list [].
The plays leaderboard:
https://childapi.xxx.com/StudyShow/viewTop?sign=4ccb35d6f3b4cfecc81c9b02462c5604&stamp=1536895896&uid=0&auth_token=0&course_id=59901&start=0&rows=20
Here you can keep requesting with start set to successive multiples of 20 and rows=20, until the returned data is an empty list [].
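The "keep going until data is an empty list" rule can be sketched as a small loop. This is a hedged sketch: `fetch_page` is a hypothetical caller-supplied function that performs the HTTP request for a given start/rows and returns the parsed JSON response as a dict with a `data` key, as described above.

```python
def fetch_all(fetch_page, rows=20):
    """Collect items page by page until a page's data is an empty list.

    fetch_page(start, rows) is supplied by the caller (hypothetical); it
    should return the parsed JSON response as a dict containing "data".
    """
    items = []
    start = 0
    while True:
        data = fetch_page(start, rows).get("data") or []
        if not data:
            # An empty (or missing) data list marks the end of the results.
            break
        items.extend(data)
        start += rows
    return items
```

Injecting `fetch_page` keeps the loop independent of any particular HTTP library and of how `sign` and `stamp` are produced.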
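In addition (to find more users), separately fetch:
Bottom tab: Leaderboard -> Top students (学霸)
https://childapi.xxx.com/top/sign_top?sign=17c411d4c4d011c614b7cef633fc68ce&stamp=1536909875&uid=0&auth_token=0&area_id=0&start=40&rows=20
Here you can keep requesting with start set to successive multiples of 20 and rows=20, until the returned data is an empty list []; in testing this never exceeded 200 entries.
Bottom tab: Leaderboard -> Popularity (人气)
https://childapi.xxx.com/top/shownews_top_redis?sign=f00a8402d9ea4f26a0eee153aba71fb3&stamp=1536910053&uid=0&time_type=1&area_id=0&ranking_type=0&start=40&auth_token=0&rows=20
Here you can keep requesting with start set to successive multiples of 20 and rows=20, until the returned data is an empty list []; in testing this never exceeded 200 entries.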
For each user:
Fetch the details:
https://childapi.xxx.com/member?sign=a5d2f9bd3f98214ab3cf8de47e14833c&stamp=1536917132&uid=0&auth_token=0&member_id=4864238
Then fetch each user's list of works:
https://childapi.xxx.com/member/show_list?sign=9fbde32314aff6b77be9b64b2464f78a&timestamp=1536917132&uid=0&auth_token=0&start=0&member_id=4864238&rows=20
Each of a user's works is a show, so next fetch the show's details:
https://childapi.xxx.com/show/detail?show_id=60377593
In addition, to find more users, fetch each user's follows and fans:
https://childapi.xxx.com/member/follows?auth_token=0&member_id=32527428&rows=20&start=0&timestamp=1536919513254&uid=0&sign=e56d81d7373be0b53ecc8fe16cb2df2c
and:
https://childapi.xxx.com/member/fans?auth_token=0&member_id=13494467&rows=20&start=0&timestamp=1536919109752&uid=0&sign=4be774e38c0619529b1c7556011fd9ba
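Expanding the user set through follows and fans amounts to a breadth-first traversal. The sketch below is one way to do it, under assumptions: `get_related` is a hypothetical function that returns the member ids found in a user's follows and fans lists, and `max_users` is an illustrative cap that does not come from the app.

```python
from collections import deque


def discover_users(seed_ids, get_related, max_users=1000):
    """Breadth-first expansion of member ids via follows and fans.

    get_related(member_id) is supplied by the caller (hypothetical) and
    returns an iterable of member ids from that user's follows and fans.
    """
    seeds = list(seed_ids)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(seen) < max_users:
        member_id = queue.popleft()
        for other in get_related(member_id):
            if other not in seen and len(seen) < max_users:
                seen.add(other)
                queue.append(other)
    return seen
```

The `seen` set prevents users who follow each other from being fetched repeatedly.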
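Finally, to widen coverage and collect as many courses, users, and shows as possible, use any id obtained along the way to fetch the related content wherever possible.
For example:
from a show's detail, get the user and the course.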
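Pulling the user and course out of a show detail could be sketched like this. It assumes the show dict carries `uid` and `course_id` fields, as the show example in this post does. The function name and the two "seen" sets are hypothetical bookkeeping.

```python
def collect_new_ids(show, seen_users, seen_courses):
    """Record the user and course referenced by a show; return the unseen ones.

    Returns (new_user_ids, new_course_ids) and updates both sets in place.
    """
    new_users, new_courses = [], []
    uid = show.get("uid")
    course_id = show.get("course_id")
    if uid and uid not in seen_users:
        seen_users.add(uid)
        new_users.append(uid)
    if course_id and course_id not in seen_courses:
        seen_courses.add(course_id)
        new_courses.append(course_id)
    return new_users, new_courses
```

The returned ids can then be pushed onto whatever queues drive the user and course crawls.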
Format of the saved results, by example:
(1)course_61288_info.json
{
        "id": "61288",
        "title": "我们有很多共同点",
        "description": "麦洛找了很多它和小猫的共同点,试图让小猫不那么害羞,可是小猫并不吃这一套。",
        "video": "",
        "video_srt": "",
        "audio": "",
        "pic": "https://img.xxx.cn/2018-09-10/5b9634a3d742d.jpg",
        "if_subtitle": "0",
        "dif_level": "3",
        "subtitle_en": "https://cdn2.xxx.cn/2018-09-10/15365514894833.srt",
        "subtitle_num": "6",
        "category_id": "10",
        "shows": "353",
        "views": "2270",
        "editor": "xxxMargaret",
        "editor_uid": "0",
        "tag": "小兔子麦洛,野猫",
        "status": "1",
        "create_time": "2016-01-26 11:38",
        "isalbum": "0",
        "top": "0",
        "ifshow": "1",
        "update_time": "1536658200",
        "sort": "-249308",
        "check": "1",
        "copyright": "1",
        "is_vip": "0",
        "show_peoples": "350",
        "score_peoples": "38",
        "is_score": "2",
        "is_needbuy": "0",
        "duration": "28",
        "bookOriginalId": "0",
        "permit_client": "2,4,1,3",
        "assign_times": "0",
        "copy": "视频片段摘自:“Milo and the wild cat _ Cartoon for kids”,本视频仅供免费学习使用!如需观看完整版,请支持正版!",
        "redirect": {
            "title": "",
            "url": "",
            "sort": "0"
        },
        "editors": [
            {
                "title": "上传",
                "nickname": "xxxMargaret",
                "uid": 0
            },
            {
                "title": "听译",
                "nickname": "妞妞的蛋卷",
                "uid": 0
            },
            {
                "title": "审校",
                "nickname": "酥酥Jesus",
                "uid": 0
            },
            {
                "title": "制作",
                "nickname": "徐婷",
                "uid": 0
            }
        ],
        "share_talk": "",
        "share_pic": "https://img.xxx.cn/2018-09-10/5b9634a3d742d.jpg",
        "share_url": "https://child.xxx.cn/index.php?m=home&c=Activity&a=childshare_video&course=MDAwMDAwMDAwMLGdpquCe8yh",
        "score_type": 4,
        "score_weight": [
            {
                "low": "0",
                "height": "55",
                "weight": "1.20"
            },
            {
                "low": "56",
                "height": "70",
                "weight": "1.15"
            },
            {
                "low": "71",
                "height": "80",
                "weight": "1.10"
            },
            {
                "low": "81",
                "height": "90",
                "weight": "1.05"
            },
            {
                "low": "91",
                "height": "100",
                "weight": "1.00"
            }
        ],
        "album_id": null,
        "is_strate": "0",
        "if_strate_buy": "0",
        "strate_audio_id": "0",
        "category": "今日更新",
        "nature": "",
        "album_title": "",
        "share_title": "我发现了一个超有意思的今日更新片段",
        "share_desc": "《我们有很多共同点》,快来围观吧",
        "share_friend": "我发现了一个超有意思的今日更新片段《我们有很多共同点》",
        "skip_url": "",
        "strate_url": "",
        "strate_isbuy": 0,
        "strate_pic": "https://img.xxx.cn/strate_detail.png",
        "album_isbuy": 0,
        "feedback_url": "https://child.xxx.cn/home/basic/course_feedback?uid=0&course_id=61288",
        "video_adver": [],
        "course_adver": [
            {
                "id": "6427",
                "title": "想和老外“无障碍交流”?来这里",
                "pic": "https://img.xxx.cn/2018-09-11/5b97cdc9be2b8.jpg",
                "type": "custom",
                "son_type": "",
                "show_id": "0",
                "is_share": "0",
                "content": "",
                "scheme_url": "",
                "sub_title": "",
                "show_type": "0",
                "sort": "0",
                "shows": "10985",
                "views": "392",
                "weight": "10",
                "score_type": "2",
                "score": "2",
                "share_pic": "https://img.xxx.cn/2018-09-11/5b97cdc9be2b8.jpg",
                "html": "",
                "clickreport": [],
                "displayreport": [],
                "url": "http://shaoer.xxx.com/basic/slider?adv=MDAwMDAwMDAwMLGdsquBrqKh"
            }
        ]
    }
(2)user_4864238_info.json
{
        "id": "4864238",
        "uc_id": "8638661",
        "nickname": "夏竹",
        "avatar": "https://img.xxx.cn/2018-06-30/5b3734cb58bee.jpg",
        "mobile": "",
        "app_type": "2",
        "version": "3.9.0",
        "push_info": "{\"comments\":\"1\"}",
        "signature": "其实皮一下很开心",
        "birthday": "2017-09-24",
        "sex": "2",
        "school": "1435652",
        "area": "4403",
        "type": "2",
        "status": "1",
        "reg_time": "1465926412",
        "fans": "973",
        "uid": "4864238",
        "follows": "1516",
        "views": "228",
        "photos": "71",
        "guestbooks": "0",
        "shows": "118",
        "words": "0",
        "collects": "4",
        "school_str": "h华x新小学",
        "is_black": "0",
        "support_collect": "4",
        "cover": "https://img.xxx.cn/cover_default.jpg",
        "medal": [],
        "is_crown": "0",
        "search_id": "4927133",
        "ugcactive": {
            "icon": "",
            "title": "字幕组",
            "sub_title": "新活动",
            "url": "http://ugctest.xxx.cn/app/index/index?"
        },
        "is_following": "0",
        "is_follow": "0",
        "follow_nickname": "",
        "user_number": "8862171",
        "score_time": "",
        "vip_endtime": "0",
        "is_vip": "0",
        "libu_level": "0",
        "claims_url": "https://child.xxx.cn/home/basic/report?type=3&tyid=0&uid=MDAwMDAwMDAwMLB0nm8&member_id=MDAwMDAwMDAwMLF3yGSBe67esaR0cg",
        "dav": "",
        "dv_type": "0",
        "dv_status": "0",
        "libu_vip_endtime": "0",
        "libu_vip": "0",
        "libu_first": "0"
    }
(3)show_144138897_info.json
{
            "id": "144138897",
            "uid": "4864238",
            "course_id": "56931",
            "album_id": "3353",
            "video": "",
            "create_time": "2018-07-12 16:46",
            "score": "91",
            "comments": "0",
            "supports": "2",
            "views": "0",
            "diamonds": "0",
            "nickname": "夏竹",
            "avatar": "",
            "school": "1435652",
            "area": "4403",
            "birthday": "2017-09-24",
            "course_title": "你是谁呢",
            "pic": "https://img.xxx.cn/2018-07-03/5b3b28404a45b.jpg",
            "permit_client": "2,4,1,3",
            "permit_show": "1",
            "is_crown": "0",
            "school_str": "",
            "dav": "",
            "dv_type": "0",
            "is_vip": "0"
        }
(4)course_61288_shows.json
{
  "total": 1,
  "shows": [
    {
      "id": "155469749",
      "uid": "2248187",
      "nickname": "Aegla",
      "course_id": "15159",
      "course_title": "不要低估他",
      "video": ""
    }
  ]
}
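A file shaped like the `course_<id>_shows.json` example could be produced along these lines. This is a sketch under assumptions: the kept fields mirror the example, `total` is taken to be the number of shows in the list, and the function names are hypothetical.

```python
import json

# Fields kept per show, mirroring the example file (an assumption).
SHOW_FIELDS = ("id", "uid", "nickname", "course_id", "course_title", "video")


def shows_summary(shows, keep=SHOW_FIELDS):
    """Trim each show to the kept fields and wrap the list with a total."""
    trimmed = [{key: show.get(key, "") for key in keep} for show in shows]
    return {"total": len(trimmed), "shows": trimmed}


def summary_to_json(summary):
    """Serialize the summary; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(summary, ensure_ascii=False, indent=2)
```

Missing fields fall back to an empty string, so every entry in the saved file has the same keys.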
