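How has your college life been? Fulfilling? Fun? Any regrets?
In this article, we use Python to scrape the popular, highly upvoted Q&A from university-related topics on Zhihu, to see whether any of the scenes look familiar to you.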
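First, we search for a few of the most popular university-related topics, as shown in the figure below:
We judge popularity by each topic's follower count, number of questions, and featured (essence) content. Next, we click into one of the topics, as shown in the figure below:
Note down the topic ID in the page URL, i.e., the string of digits that follows "topic/". We will need it when scraping.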
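If you would rather not copy the ID by hand, a regular expression can pull it out of the address. This is a minimal sketch, assuming topic pages follow the pattern ".../topic/<digits>..."; the function name extract_topic_id and the example ID 12345678 are hypothetical placeholders, not taken from the original article.

```python
import re


def extract_topic_id(topic_url: str) -> str:
    """Return the numeric topic ID that follows '/topic/' in a topic URL (hypothetical helper)."""
    match = re.search(r"/topic/(\d+)", topic_url)
    if match is None:
        raise ValueError(f"no topic ID found in {topic_url!r}")
    return match.group(1)


# Placeholder URL; substitute the one from your browser's address bar
print(extract_topic_id("https://www.zhihu.com/topic/12345678/hot"))  # prints 12345678
```

The ID is returned as a string because it is only ever concatenated into the API URL.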
Now let's look at the scraping implementation. First, we import the Python libraries we need:
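import re, json, random
import requests, urllib3
# Silence the InsecureRequestWarning urllib3 emits when HTTPS certificate verification is disabled (e.g. verify=False)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)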
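Before the full function, it may help to see how the request URL is assembled: the topic ID goes into the path of the ".../feeds/essence" endpoint, and the page number is turned into an offset as page_no * limit. The sketch below assumes paging is driven by "limit" and "offset" query parameters (the original URL is cut off before those would appear), and it leaves out the long percent-encoded "include" parameter; build_essence_url and API_BASE are hypothetical names. It also shows that urllib.parse.unquote turns that encoded "include" text back into readable field selectors.

```python
from urllib.parse import unquote, urlencode

API_BASE = "https://www.zhihu.com/api/v4/topics/"


def build_essence_url(topic_id, page_no, limit=10):
    """Build the essence-feed URL for one zero-based page of a topic (hypothetical helper)."""
    offset = page_no * limit  # page 0 -> offset 0, page 1 -> offset 10, ...
    # Assumption: the endpoint pages results with 'limit' and 'offset' query parameters
    query = urlencode({"limit": limit, "offset": offset})
    return f"{API_BASE}{topic_id}/feeds/essence?{query}"


print(build_essence_url(12345678, 2))
# https://www.zhihu.com/api/v4/topics/12345678/feeds/essence?limit=10&offset=20

# A fragment of the percent-encoded 'include' value, decoded for readability
print(unquote("data%5B%3F(target.type%3Danswer)%5D.target.content%2Cvoteup_count"))
# data[?(target.type=answer)].target.content,voteup_count
```

Using urlencode instead of string concatenation keeps the query correctly escaped if more parameters are added later.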
The code that scrapes the Q&A content is as follows:
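def get_answers_by_page(topic_id, page_no):
    # db, answer_ids and maxnum are module-level state shared with the rest of the script
    global db, answer_ids, maxnum
    limit = 10  # number of items requested per page
    offset = page_no * limit  # page_no is zero-based, so page 0 starts at offset 0
    url = "https://www.zhihu.com/api/v4/topics/" + str(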
topic_id) + "/feeds/essence?include=data%5B%3F(target.type%3Dtopic_sticky_module)%5D.target.data%5B%3F(target.type%3Danswer)%5D.target.content%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%3Bdata%5B%3F(target.type%3Dtopic_sticky_module)%5D.target.data%5B%3F(target.type%3Danswer)%5D.target.is_normal%2Ccomment_count%2Cvoteup_count%2Ccontent%2Crelevant_info%2Cexcerpt.author.badge%5B%3F(type%3Dbest_answerer)%5D.topics%3Bdata%5B%3F(target.type%3Dtopic_sticky_module)%5D.target.data%5B%3F(target.type%3Darticle)%5D.target.content%2Cvoteup_count%2Ccomment_count%2Cvoting%2Cauthor.badge%5B%3F(type%3Dbest_answerer)%5D.topics%3Bdata%5B%3F(target.type%3Dtopic_sticky_module)%5D.target.data%5B%3F(target.type%3Dpeople)%5D.target.answer_count%2Carticles_count%2Cgender%2Cfollower_count%2Cis_followed%2Cis_following%2Cbadge%5B%3F(type%3Dbest_answerer)%5D.topics%3Bdata%5B%3F(target.type%3Danswer)%5D.target.annotation_detail%2Ccontent%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%3Bdata%5B%3F(target.type%3Danswer)%5D.target.author.badge%5B%3F(type%3Dbest_answerer)%5D.topics%3Bdata%5B%3F(target.type%3Darticle)%5D.target.annotation_detail%2Ccontent%2Cauthor.badge%5B%3F(type%3Db