Innovation Training [2]: Crawling Zhihu

ayy洋

Article link: https://blog.csdn.net/weixin_43710646/article/details/115302060

What was crawled

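Most Zhihu pages require a login before they can be crawled, and Selenium sometimes cannot access them at all, so I started with a single page: the topic index for Shandong University. The data collected for each entry is the link, the topic category, and the specific topic.
The page is: https://www.zhihu.com/topic/19864829/index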

Tools used

Python 3.7
Selenium
ChromeDriver

Code

import time
from selenium import webdriver
import warnings
warnings.filterwarnings("ignore")

url='https://www.zhihu.com/topic/19864829/index'
driver=webdriver.Chrome()
driver.get(url)
time.sleep(5)
# print("page loaded")


# Open the output file in append mode and write the header row
f=open("zhihu_urls.csv",'a',encoding='utf-8')
ls=['link','title','topic']
f.write(",".join(ls)+"\n")

# Note: the find_element_by_* helpers are the Selenium 3 API; Selenium 4.3+ removed them in favor of find_element(By.CLASS_NAME, ...)
for topicmodule in driver.find_element_by_class_name("TopicIndex-contentMain").find_elements_by_class_name("TopicIndexModule"):
    # topic category
    topic=topicmodule.find_element_by_class_name("TopicIndexModule-title").text

    for item in topicmodule.find_elements_by_class_name("TopicIndexModule-item"):
        topic_info=[]
        href=item.find_element_by_tag_name("a").get_attribute("href")  # link
        title=item.find_element_by_tag_name("a").text  # specific topic

        topic_info.append(href)
        topic_info.append(title)
        topic_info.append(topic)
        print(topic_info)

        f.write(",".join(topic_info) + "\n")

f.close()
driver.quit()
print("Finished crawling Zhihu topics")
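
One caveat with joining the fields by hand: a topic title that itself contains a comma would shift the columns, and opening the file in append mode writes the header again on every run. A minimal sketch of one alternative, using Python's standard csv module, is shown here. The helper name `append_rows` and the English header names are hypothetical and not part of the original script.

```python
import csv
import os

def append_rows(path, rows, header=("link", "title", "topic")):
    """Append rows to a CSV file, writing the header only if the file is new or empty.

    `append_rows` and the default header names are illustrative assumptions.
    """
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings itself
    with open(path, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        # Fields that contain commas or quotes are quoted automatically
        writer.writerows(rows)
```

With this helper, the loop could collect each `topic_info` list into a `rows` list and call `append_rows("zhihu_urls.csv", rows)` once at the end instead of calling `f.write` for every item.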

Results

CSV contents:
[image]
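
To inspect the file without opening it in a spreadsheet, the rows can be read back and grouped by category. This is only a sketch: the function name `group_by_category` is hypothetical, and it assumes the three-column layout the script writes (link, specific topic, category), a single header row at the top, and no commas inside the fields.

```python
import csv
from collections import defaultdict

def group_by_category(path):
    """Return {category: [(title, link), ...]} from the three-column CSV.

    Hypothetical helper; assumes one header row followed by
    link, title, category rows.
    """
    groups = defaultdict(list)
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) != 3:
                continue  # skip blank or malformed lines
            link, title, category = row
            groups[category].append((title, link))
    return dict(groups)
```

Calling `group_by_category("zhihu_urls.csv")` after a run gives a quick check of how many specific topics were collected under each category.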
