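Analysis of the captured article-list packets: the articles fall into two broad categories, those with video and those without.
In the JSON for articles with video, every entry contains the key video_id, as shown in the figure below:
Articles without video: every JSON entry contains fields such as title, abstract, and article_url. A sample entry follows.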
{
"read_count":7089,
"media_name":"数码日记",
"ban_comment":0,
"abstract":"一般情况下,购买手机后我们都会买SIM卡,这是无可厚非的事情。但也有些网友买手机仅仅是作为备用机,用于日常上网使用,而在没有SIM卡的情况下,手机要想联网,只能通过连接Wi-Fi的方式实现。",
"image_list":[
{
"url":"http://p1.pstatp.com/list/ef6000f83af6113a252",
"width":698,
"url_list":[
{
"url":"http://p1.pstatp.com/list/ef6000f83af6113a252"
},
{
"url":"http://p4.pstatp.com/list/ef6000f83af6113a252"
},
{
"url":"http://p.pstatp.com/list/ef6000f83af6113a252"
}
],
"uri":"list/ef6000f83af6113a252",
"height":392
},
{
"url":"http://p3.pstatp.com/list/ef6000f83f79a941883",
"width":981,
"url_list":[
{
"url":"http://p3.pstatp.com/list/ef6000f83f79a941883"
},
{
"url":"http://p6.pstatp.com/list/ef6000f83f79a941883"
},
{
"url":"http://p.pstatp.com/list/ef6000f83f79a941883"
}
],
"uri":"list/ef6000f83f79a941883",
"height":551
},
{
"url":"http://p3.pstatp.com/list/ef6000f8405551c32f2",
"width":943,
"url_list":[
{
"url":"http://p3.pstatp.com/list/ef6000f8405551c32f2"
},
{
"url":"http://p6.pstatp.com/list/ef6000f8405551c32f2"
},
{
"url":"http://p.pstatp.com/list/ef6000f8405551c32f2"
}
],
"uri":"list/ef6000f8405551c32f2",
"height":530
}
],
"has_video":false,
"article_type":0,
"tag":"digital",
"has_m3u8_video":0,
"keywords":"MIUI,SIM卡,Wi-Fi",
"user_verified":1,
"aggr_type":1,
"cell_type":0,
"article_sub_type":0,
"bury_count":0,
"title":"不插SIM卡,不用Wi-Fi,小米手机也能上网",
"ignore_web_transform":1,
"source_icon_style":1,
"tip":0,
"hot":0,
"share_url":"http://toutiao.com/a6351618096909779201/?iid=6181230843&app=news_article",
"has_mp4_video":0,
"source":"数码日记",
"comment_count":22,
"article_url":"http://toutiao.com/group/6351618096909779201/",
"filter_words":[
{
"id":"8:0",
"name":"重复、旧闻",
"is_selected":false
},
{
"id":"9:1",
"name":"内容质量差",
"is_selected":false
},
{
"id":"5:32370023",
"name":"来源:数码日记",
"is_selected":false
},
{
"id":"2:306461588",
"name":"路由器",
"is_selected":false
},
{
"id":"6:47522",
"name":"小米手机",
"is_selected":false
}
],
"publish_time":1478851309,
"action_list":[
{
"action":1,
"extra":{
},
"desc":""
},
{
"action":3,
"extra":{
},
"desc":""
},
{
"action":7,
"extra":{
},
"desc":""
},
{
"action":9,
"extra":{
},
"desc":""
}

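This article describes how to use a Python crawler to scrape article and video data from Toutiao (今日头条). It covers parsing the JSON of articles with and without video, obtaining and decrypting the video URL, and analyzing the data request that loads more articles.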
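
The split between the two article categories can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the author's actual crawler: the helper names `has_video`, `summarize`, and `split_articles` are hypothetical, the first sample entry is trimmed from the JSON shown above, and the second entry, including its `video_id` value, is a made-up placeholder used only to exercise the video branch.

```python
import json

# Sample list of article entries. The first entry is trimmed from the JSON
# shown above; the second is a hypothetical placeholder with a made-up
# video_id, present only to demonstrate the "has video" case.
RAW = """
[
  {
    "title": "不插SIM卡,不用Wi-Fi,小米手机也能上网",
    "abstract": "一般情况下,购买手机后我们都会买SIM卡",
    "article_url": "http://toutiao.com/group/6351618096909779201/",
    "has_video": false,
    "publish_time": 1478851309,
    "image_list": [
      {"url": "http://p1.pstatp.com/list/ef6000f83af6113a252",
       "width": 698, "height": 392}
    ]
  },
  {
    "title": "placeholder video article",
    "video_id": "PLACEHOLDER_VIDEO_ID",
    "has_video": true
  }
]
"""

SAMPLE_ARTICLES = json.loads(RAW)


def has_video(article: dict) -> bool:
    """Follow the observation above: video articles carry a video_id key."""
    return "video_id" in article


def summarize(article: dict) -> dict:
    """Pull out the fields named in the text, tolerating missing keys."""
    return {
        "title": article.get("title"),
        "abstract": article.get("abstract"),
        "article_url": article.get("article_url"),
        # Take the primary URL of each image; image_list may be absent.
        "image_urls": [img["url"] for img in article.get("image_list", [])],
    }


def split_articles(articles: list) -> tuple:
    """Return (articles_with_video, articles_without_video)."""
    with_video = [a for a in articles if has_video(a)]
    without_video = [a for a in articles if not has_video(a)]
    return with_video, without_video


videos, plain = split_articles(SAMPLE_ARTICLES)
for entry in plain:
    print(summarize(entry)["title"])
```

Testing for the `video_id` key mirrors the observation in the text. The sample entry also carries a `has_video` flag, which might serve as an alternative check, but whether that flag is reliable across all entries is not confirmed by the data shown here.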