A beginner web-scraping project: scrape any Zhihu author's articles for practice
Before we get to the code, let's quickly review the framework of what we have learned so far; reviewing the old is how you learn the new!
Now straight to the code. Be sure to type it out by hand — typing it yourself is what makes it stick!
import requests
import csv

csv_file = open('知乎-收录.csv', 'w', newline='', encoding='utf-8')
# newline='' prevents the CSV from getting a blank row between every two records;
# encoding='utf-8' avoids errors or garbled characters caused by encoding issues.
writer = csv.writer(csv_file)
writer.writerow(['标题', '摘要', '链接'])

url = 'https://www.zhihu.com/api/v4/members/zhang-jia-wei/included-articles?'
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
offset = 0
while True:
    params = {
        'include': 'data[*].comment_count,suggest_edit,is_normal,thumbnail_extra_info,thumbnail,can_comment,comment_permission,admin_closed_comment,content,voteup_count,created,updated,upvoted_followees,voting,review_info,is_labeled,label_info;data[*].author.badge[?(type=best_answerer)].topics',
        'offset': str(offset),
        'limit': '10',
        'sort_by': 'included'
    }
    # params must actually be passed to requests.get, otherwise every
    # request fetches the same first page and offset has no effect.
    res = requests.get(url, headers=headers, params=params)
    js_zh = res.json()
    zhihu = js_zh['data']
    for i in zhihu:
        list1 = [i['title'], i['excerpt'], i['url']]
        writer.writerow(list1)
    offset = offset + 10  # use offset to control the loop
    if offset > 50:
        break
csv_file.close()
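The hard-coded `offset > 50` cutoff works for this exercise, but it breaks if the author has more (or fewer) than six pages of articles. Zhihu's v4 API responses typically carry a `paging` object whose `is_end` field tells you when the last page has been reached — assuming that field is present, the paging loop can be factored out. The sketch below is illustrative: `fetch_all_articles` and the `fake_fetch` stub (which stands in for `requests.get(...).json()` so the logic can run without network access) are hypothetical names, not part of the original script.

```python
import csv

def fetch_all_articles(fetch_page, limit=10):
    """Collect [title, excerpt, url] rows page by page until the API
    reports is_end. fetch_page(offset, limit) is assumed to return the
    decoded JSON of one page: a dict with 'data' and 'paging' keys."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        for item in page['data']:
            rows.append([item['title'], item['excerpt'], item['url']])
        if page['paging']['is_end']:  # stop when the API says we are done
            break
        offset += limit
    return rows

# Stub standing in for the real request; serves 3 fake articles in pages of `limit`.
def fake_fetch(offset, limit):
    total = 3
    data = [{'title': f't{i}', 'excerpt': f'e{i}', 'url': f'u{i}'}
            for i in range(offset, min(offset + limit, total))]
    return {'data': data, 'paging': {'is_end': offset + limit >= total}}

rows = fetch_all_articles(fake_fetch, limit=2)

# Writing with a `with` block closes the file even if an error occurs mid-scrape.
with open('知乎-收录.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['标题', '摘要', '链接'])
    writer.writerows(rows)
```

To use it against the live API, replace `fake_fetch` with a small wrapper that calls `requests.get(url, headers=headers, params={...})` and returns `res.json()`.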