Zhihu Crawler

Latest recommended article published 2024-04-07 09:36:35

笑傲code

Article link: https://blog.csdn.net/weixin_41472455/article/details/80480328

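import requests
from pyquery import PyQuery as pq


def getHtml(url):
    """Fetch a page and return its decoded HTML, or None if the request fails."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    try:
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        # Report the error instead of returning the exception object,
        # which would otherwise be passed to pyquery as if it were HTML.
        print('Request failed:', e)
        return None


def parseHtml(html):
    """Append the question, author and answer of each explore feed item to explore2.txt."""
    doc = pq(html)
    # Descendant selector: .feed-item elements inside the .explore-tab container
    # (".explore-tab.feed-item" would only match one element carrying both classes).
    items = doc('.explore-tab .feed-item').items()
    # Open the output file once rather than once per item.
    with open('explore2.txt', 'a', encoding='utf-8') as f:
        for item in items:
            question = item.find('h2').text()
            author = item.find('.author-link-line').text()
            # .html() returns the markup inside .content; parse it again so .text()
            # yields plain text. Guard against a missing .content element.
            content_html = item.find('.content').html()
            answer = pq(content_html).text() if content_html else ''
            f.write('\n'.join([question, author, answer]))
            f.write('\n' + '=' * 50 + '\n')


def main():
    url = 'https://www.zhihu.com/explore'
    html = getHtml(url)
    if html:
        parseHtml(html)


if __name__ == '__main__':
    main()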
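The script appends each scraped item to explore2.txt as three newline-joined fields (question, author, answer), followed by a line of 50 '=' characters. The helpers below, `format_record` and `parse_records`, are hypothetical additions that do not appear in the original post. They sketch how that layout could be read back into tuples, assuming the question and the author each occupy a single line while the answer may span several. They use only the standard library, so they run without network access or pyquery.

```python
# Hypothetical helpers (not part of the original post): they mirror the record
# layout that parseHtml appends to explore2.txt and read it back.
from typing import List, Tuple

SEPARATOR = '=' * 50


def format_record(question: str, author: str, answer: str) -> str:
    """Build one record exactly as parseHtml writes it."""
    return '\n'.join([question, author, answer]) + '\n' + SEPARATOR + '\n'


def parse_records(text: str) -> List[Tuple[str, str, str]]:
    """Split file contents back into (question, author, answer) tuples.

    Assumes the question and author each fit on one line; everything after
    the second line, up to the separator, is treated as the answer.
    """
    records = []
    for chunk in text.split('\n' + SEPARATOR + '\n'):
        if not chunk.strip():
            continue  # skip the empty tail after the last separator
        lines = chunk.split('\n')
        question = lines[0]
        author = lines[1] if len(lines) > 1 else ''
        answer = '\n'.join(lines[2:])
        records.append((question, author, answer))
    return records


if __name__ == '__main__':
    sample = (format_record('Q1', 'Alice', 'line one\nline two')
              + format_record('Q2', 'Bob', 'short'))
    for q, a, ans in parse_records(sample):
        print(q, '|', a, '|', ans.replace('\n', ' / '))
```

To read the real output, pass the file contents in, for example `parse_records(open('explore2.txt', encoding='utf-8').read())`. Splitting on the full separator line, rather than on single newlines, keeps multi-line answers intact.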
