Python Web Crawler - Datawhale Team Study Task 4

Latest recommended article published on 2024-07-21 23:25:11

yxyibb

Category: Crawlers. Tags: python

Article link: https://blog.csdn.net/u012835414/article/details/105802688

Tencent Hot News

Use Selenium to drive the browser, then locate the relevant tags in the page's HTML.

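from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from bs4 import BeautifulSoup

# Selenium 4 takes the driver path through a Service object;
# the executable_path keyword was removed in later 4.x releases.
browser = webdriver.Chrome(service=Service('/home/yx/Documents/DW/spider/env/chromedriver'))
browser.get('https://news.qq.com')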

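# Scroll down in 200-pixel steps so that lazily loaded items get rendered.
for i in range(1, 100):
    time.sleep(0.5)
    browser.execute_script("window.scrollTo(window.scrollX, %d);" % (i * 200))

# Grab the rendered HTML and parse it with BeautifulSoup (lxml parser).
html = browser.page_source
bsObj = BeautifulSoup(html, 'lxml')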

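# The featured list is the sibling element right after the "jx-tit" heading div.
items = bsObj.find_all("div", {"class": "jx-tit"})[0].find_next_sibling().find_all("li")
print("index", ",", "title", ",", "url")
for i, item in enumerate(items):
    try:
        # Prefer the alt text of the item's image as the title.
        text = item.find_all("img")[0]["alt"]
    except (IndexError, KeyError):
        # Otherwise fall back to the text of the lazy-load placeholder div.
        text = item.find_all("div", {"class": "lazyload-placeholder"})[0].text
    try:
        url = item.find_all("a")[0]["href"]
    except (IndexError, KeyError):
        # No link found: dump the item for inspection and skip it.
        print(item)
        continue
    print(i + 1, ",", text, ",", url)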
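The parsing step can also be tried without launching a browser. Below is a minimal sketch, not the original script: it wraps the same lookup (the "jx-tit" div, its next sibling, then each li) in a function and writes the rows with the csv module, so titles that contain commas stay in one column. The names extract_items, to_csv and SAMPLE_HTML are made up for illustration, the sample markup is only an assumption about what the page looks like, and the built-in "html.parser" is used in place of lxml to avoid an extra dependency.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the script relies on:
# a "jx-tit" heading div followed directly by a list of <li> items.
SAMPLE_HTML = (
    '<div class="jx-tit">Featured</div><ul>'
    '<li><a href="https://example.com/a"><img alt="First story" src="a.jpg"></a></li>'
    '<li><a href="https://example.com/b">'
    '<div class="lazyload-placeholder">Second, story</div></a></li>'
    '<li><span>No link here</span></li>'
    '</ul>'
)


def extract_items(html):
    """Return (title, url) pairs from the list that follows the "jx-tit" div."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("div", class_="jx-tit")
    if heading is None:
        return []
    # Passing True matches any tag, so text nodes between siblings are skipped.
    container = heading.find_next_sibling(True)
    if container is None:
        return []

    rows = []
    for li in container.find_all("li"):
        link = li.find("a", href=True)
        if link is None:
            continue  # skip items that carry no link
        img = li.find("img", alt=True)
        placeholder = li.find("div", class_="lazyload-placeholder")
        if img is not None:
            title = img["alt"]
        elif placeholder is not None:
            title = placeholder.get_text(strip=True)
        else:
            title = link.get_text(strip=True)
        rows.append((title, link["href"]))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with an index column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["index", "title", "url"])
    for index, (title, url) in enumerate(rows, start=1):
        writer.writerow([index, title, url])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_items(SAMPLE_HTML)))
```

With a live session, the same function could be called as extract_items(browser.page_source). Returning an empty list when the heading is missing is a design choice for this sketch; the original script would raise an IndexError in that case.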
