Getting Started with Web Crawlers: Task04

weixin_41948788

Article link: https://blog.csdn.net/weixin_41948788/article/details/105800162

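Use Selenium to build a crawler for Tencent News hot topics.

Save the results to a CSV file in which each row has the form: index (starting from 1), title, link.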

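import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to chromedriver. The raw string (r'...') keeps the Windows backslashes
# from being read as escape sequences. Selenium 4 takes the path through a
# Service object; the old executable_path argument was deprecated and later removed.
service = Service(r'C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get("https://news.qq.com")

# The news list is loaded via AJAX as the page scrolls, so scroll down
# 200 px at a time, pausing 2 s at each step to let new items render.
for i in range(1, 100):
    time.sleep(2)
    driver.execute_script("window.scrollTo(window.scrollX, %d);" % (i * 200))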
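The fixed loop above always performs 99 scrolls with a 2 s pause (about 198 s), no matter how long the page actually is. A common alternative is to keep scrolling to the bottom until the document height stops growing. The sketch below shows that idea with a hypothetical helper, `scroll_until_stable`, which takes two callables instead of a driver so the logic can be tried without a browser; `FakePage` is a made-up stand-in for a real page. With Selenium you would probably pass `lambda: driver.execute_script("return document.body.scrollHeight")` and `lambda y: driver.execute_script("window.scrollTo(0, arguments[0]);", y)`, but whether this stopping rule suits news.qq.com's loading behavior is an assumption.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to: Callable[[int], None],
    pause: float = 2.0,
    max_rounds: int = 50,
) -> int:
    """Hypothetical helper: scroll to the bottom repeatedly until the page
    height stops growing (or max_rounds is reached). Returns the rounds used."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to(last_height)      # jump to the current bottom of the page
        time.sleep(pause)           # give the AJAX request time to append items
        new_height = get_height()
        if new_height == last_height:
            return rounds           # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds


class FakePage:
    """Hypothetical stand-in for a browser page: each scroll to the bottom
    loads one more 1000 px batch of items, up to 3 batches in total."""

    def __init__(self) -> None:
        self.batches = 1

    def height(self) -> int:
        return self.batches * 1000

    def scroll_to(self, y: int) -> None:
        if y >= self.height() and self.batches < 3:
            self.batches += 1


page = FakePage()
rounds = scroll_until_stable(page.height, page.scroll_to, pause=0)
print(rounds)  # 3: two rounds load new batches, the third sees no growth
```

The `max_rounds` cap keeps the loop from running forever on a page with truly infinite scrolling.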

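from bs4 import BeautifulSoup

html = driver.page_source
driver.quit()  # the rendered HTML has been captured, so the browser can be closed
# page_source is HTML, so parse it with an HTML parser rather than "xml"
bsObj = BeautifulSoup(html, "html.parser")

# The news items are the <li> elements inside the element right after div.jx-tit
jxtits = bsObj.find_all("div", {"class": "jx-tit"})[0].find_next_sibling().find_all("li")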

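print("index", ",", "title", ",", "url")
result = []
for jxtit in jxtits:
    # Title: alt text of the first <img>; items whose image has not been
    # lazy-loaded fall back to the text of the placeholder div
    img = jxtit.find("img")
    if img is not None and img.get("alt"):
        text = img["alt"]
    else:
        placeholder = jxtit.find("div", {"class": "lazyload-placeholder"})
        text = placeholder.text if placeholder is not None else ""
    # Link: href of the first <a>; print and skip items without one,
    # so a missing or stale url is never written to the results
    link = jxtit.find("a")
    if link is None or not link.get("href"):
        print(jxtit)
        continue
    url = link["href"]
    index = len(result) + 1  # number the saved rows from 1 with no gaps
    print(index, ",", text, ",", url)
    result.append([index, text, url])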
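The extraction rules in the loop above (title from the image's `alt`, falling back to the `lazyload-placeholder` text; link from the first `<a>`) can also be sketched with the standard library's `html.parser`, which makes it easy to try them on saved HTML without a browser or BeautifulSoup. `NewsListParser`, `parse_news_items`, and the sample markup are hypothetical illustrations, not the real structure of news.qq.com, and nested `div`s inside the placeholder are not handled.

```python
from html.parser import HTMLParser


class NewsListParser(HTMLParser):
    """Hypothetical parser: collect (title, url) from each <li>.
    title = alt of the first <img>, else text of div.lazyload-placeholder;
    url = href of the first <a>. Items without a link are skipped."""

    def __init__(self) -> None:
        super().__init__()
        self.items = []               # finished (title, url) pairs
        self._li = None               # state of the <li> being parsed, or None
        self._in_placeholder = False  # inside div.lazyload-placeholder?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._li = {"alt": None, "placeholder": "", "href": None}
        elif self._li is None:
            return                    # ignore tags outside any <li>
        elif tag == "img" and self._li["alt"] is None:
            self._li["alt"] = attrs.get("alt")
        elif tag == "a" and self._li["href"] is None:
            self._li["href"] = attrs.get("href")
        elif tag == "div" and "lazyload-placeholder" in (attrs.get("class") or "").split():
            self._in_placeholder = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_placeholder = False
        elif tag == "li" and self._li is not None:
            li = self._li
            title = li["alt"] or li["placeholder"].strip()
            if li["href"]:
                self.items.append((title, li["href"]))
            self._li = None

    def handle_data(self, data):
        if self._in_placeholder and self._li is not None:
            self._li["placeholder"] += data


def parse_news_items(html: str):
    parser = NewsListParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<ul>
  <li><a href="https://example.com/a"><img alt="Title A" src="a.jpg"></a></li>
  <li><a href="https://example.com/b"><div class="lazyload-placeholder">Title B</div></a></li>
  <li><span>no link</span></li>
</ul>
"""

for idx, (title, url) in enumerate(parse_news_items(SAMPLE), start=1):
    print(idx, title, url)
```

Numbering with `enumerate(..., start=1)` after skipping link-less items keeps the indices contiguous, matching the "starting from 1" requirement.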


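import pandas as pd

name = ['序号', '标题', '链接']  # column names: index, title, link
df = pd.DataFrame(columns=name, data=result)
# File name means "Tencent News hot topics". utf-8-sig writes a BOM
# so that Excel displays the Chinese text correctly.
df.to_csv('腾讯新闻热点.csv', index=False, encoding='utf-8-sig')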
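pandas is used above; if you would rather avoid that dependency, the standard-library `csv` module can produce the same index, title, link layout and takes care of quoting titles that contain commas or quotes, which the comma-separated `print` in the loop does not. The helper name `write_rows` and the sample rows are hypothetical; to write the real file you would open it with `open('腾讯新闻热点.csv', 'w', newline='', encoding='utf-8-sig')` and pass that handle instead of the in-memory buffer.

```python
import csv
import io


def write_rows(rows, fh) -> None:
    """Hypothetical helper: write a header plus (index, title, link) rows as CSV."""
    writer = csv.writer(fh)
    writer.writerow(["序号", "标题", "链接"])  # index, title, link
    writer.writerows(rows)


buf = io.StringIO()
write_rows([[1, "Title A", "https://example.com/a"],
            [2, 'A "quoted", title', "https://example.com/b"]], buf)
print(buf.getvalue())
# The second title is written as "A ""quoted"", title": the field is quoted
# because it contains a comma, and the inner quotes are doubled.
```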

The scraped results are shown in the figure below:
[Figure: screenshot of the scraped results]
