Web Scraping in Practice | A Step-by-Step Guide to Collecting and Visualizing Answers to a Zhihu Question with Python (Code Included)

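This article walks you through using a Python crawler to collect the answers to a Zhihu question. It covers the whole process, from finding the URL pattern, requesting a page, and parsing the data to saving it as CSV, and it shows how to build a word cloud to visualize the result. Complete code is provided, and the material is suitable for Python beginners.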

Question link

https://www.zhihu.com/question/432119474/answer/1597194524

Crawler design workflow

  1. Find the URL pattern
  2. Try requesting a single page
  3. Parse the data of interest
  4. Save it to CSV
  5. Consolidate the code
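As a rough preview of steps 3 and 4, here is a minimal sketch that parses one already-decoded JSON page and writes it out as CSV. The payload is made up for illustration: its shape (a `data` list whose items carry `content`, `voteup_count`, and `created_time`) is an assumption based on the field names in the API's `include` parameter, and `parse_answers` and `write_csv` are hypothetical helper names.

```python
import csv
import io

# Hypothetical payload shaped after the `include` field names; not real data.
sample_page = {
    "data": [
        {"content": "<p>first answer</p>", "voteup_count": 12, "created_time": 1606700000},
        {"content": "<p>second answer</p>", "voteup_count": 3, "created_time": 1606800000},
    ]
}

FIELDS = ["content", "voteup_count", "created_time"]


def parse_answers(page):
    """Step 3: keep only the fields of interest from one decoded JSON page."""
    return [{name: answer.get(name) for name in FIELDS}
            for answer in page.get("data", [])]


def write_csv(rows, fileobj):
    """Step 4: write the parsed rows as CSV, starting with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = parse_answers(sample_page)
buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("answers.csv", "w", newline="", encoding="utf-8")` in place of the in-memory buffer; the file name is only an example.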

 

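1. Find the URL pattern

  1. Press F12 to open the developer tools
  2. Select the Network panel, then click "View all 6217 answers" on the page
  3. Watch the requests that the developer tools capture
  4. For each captured URL, go through steps 4 to 6 as marked in the screenshot
  5. Click Preview
  6. Check whether content matches the answers shown on the current page
  7. The URL we are after turns out to be the one in the red box at marker 7; it is requested with the GET method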

 

  8. Still on the panel from step 7, scroll to the very bottom to find offset and limit

The URL found (note the offset near the end)

https://www.zhihu.com/api/v4/questions/432119474/answers?include=data%5B*%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B*%5D.mark_infos%5B*%5D.url%3Bdata%5B*%5D.author.follower_count%2Cbadge%5B*%5D.topics%3Bsettings.table_of_content.enabled%3B&offset=3&limit=5&sort_by=default&platform=desktop

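The URL itself also contains offset, a word that means a displacement from a starting point.

  • offset: my guess is that this value works like a page number.
  • limit: how many answers each URL returns; 5 by default.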
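To inspect these parameters without reading the raw URL by eye, the query string can be split with the standard library. This is only a sketch: the URL below is shortened by leaving out the long `include` parameter, which is an illustrative simplification and not the exact URL observed above.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened for readability: the long `include` parameter is omitted here.
url = ("https://www.zhihu.com/api/v4/questions/432119474/answers"
       "?offset=3&limit=5&sort_by=default&platform=desktop")

# parse_qs maps each key to a list of values; keep the first value of each.
query = parse_qs(urlsplit(url).query)
params = {key: values[0] for key, values in query.items()}

print(params["offset"], params["limit"])  # prints: 3 5
```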

URL template (note the offset placeholder near the end)

https://www.zhihu.com/api/v4/questions/432119474/answers?include=data%5B*%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_labeled%2Cis_recognized%2Cpaid_info%2Cpaid_info_content%3Bdata%5B*%5D.mark_infos%5B*%5D.url%3Bdata%5B*%5D.author.follower_count%2Cbadge%5B*%5D.topics%3Bsettings.table_of_content.enabled%3B&offset={offset}&limit=5&sort_by=default&platform=desktop
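A template like this can be filled in with `str.format`. The sketch below uses a shortened template without the `include` parameter to stay readable, and `page_url` is a hypothetical helper name. The sample offsets 0, 5, and 10 assume that offset advances by `limit` from one page to the next, which the article has not confirmed.

```python
# Shortened template: the long `include` parameter is omitted for readability.
TEMPLATE = ("https://www.zhihu.com/api/v4/questions/432119474/answers"
            "?offset={offset}&limit=5&sort_by=default&platform=desktop")


def page_url(offset):
    """Fill the {offset} placeholder in the template."""
    return TEMPLATE.format(offset=offset)


# Assumed offsets: one step of `limit` (5) per page.
for offset in (0, 5, 10):
    print(page_url(offset))
```

The full template works the same way, because it contains no literal braces other than `{offset}`.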

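There are currently just over 6,200 answers. At 5 per page, covering them all takes roughly 1,240 pages. If offset counts pages, as guessed above, it runs up to about 1,240. If it instead counts the answers skipped, as the word itself suggests, it advances in steps of 5 up to roughly 6,200.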
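The page count can be checked with a little arithmetic. This sketch takes the 6217 answers shown on the page button and assumes offset counts answers skipped, so it moves in steps of `limit`; under the page-number reading, the count of pages is the same but the offsets would be 0, 1, 2, and so on.

```python
import math

total_answers = 6217  # figure shown on the "view all answers" button
limit = 5             # answers returned per request

# Number of requests needed to cover every answer.
pages = math.ceil(total_answers / limit)

# Assumed offsets: 0, 5, 10, ... (one step of `limit` per page).
offsets = list(range(0, total_answers, limit))

print(pages, offsets[:3], offsets[-1])  # prints: 1244 [0, 5, 10] 6215
```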

2. Try requesting a single page

import requests

template = 'https://www.zhihu.com/api/v4/questions/432119474/answers?include=data%5B*%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_in
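The listing above is cut off in the source. As a stand-in, here is a minimal sketch of how one page might be requested with requests. It is not the author's code: `fetch_page` is a hypothetical helper, the shortened `include` value is a subset of the decoded original, and the User-Agent header is an assumption. The site may also require cookies or reject automated requests, so treat this as a starting point.

```python
import requests

API = "https://www.zhihu.com/api/v4/questions/432119474/answers"


def fetch_page(offset, limit=5, session=None):
    """Request one page of answers and return the decoded JSON."""
    if session is None:
        session = requests.Session()
    params = {
        # Assumed subset of the original `include` value, decoded.
        "include": "data[*].content,voteup_count,created_time",
        "offset": offset,
        "limit": limit,
        "sort_by": "default",
        "platform": "desktop",
    }
    # Assumption: a browser-like User-Agent makes the request less likely to be refused.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = session.get(API, params=params, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()


# Example call (performs a real network request, so it is left commented out):
# page = fetch_page(0)
# print(len(page.get("data", [])))
```

Passing `params` as a dict lets requests handle the URL encoding, which avoids maintaining the long percent-encoded string by hand.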