Scraping Baidu News with selenium

Latest recommended article published on 2024-07-01 11:04:11

lizhaozhaozhaoxuan

Category: Web crawlers

Article link: https://blog.csdn.net/lizhaozhaozhaoxuan/article/details/80550645

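This code shows how to use selenium to simulate browser behavior: it opens the Baidu News home page, enters a keyword and runs a search, then walks through the result pages, scraping the title, publication time, and page content of the first 20 news items on each page and saving them to a JSON file.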
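The flow described above can be outlined without a browser. The following is only a minimal sketch under stated assumptions, not the author's code: the function name `collect_news`, the field names `title`, `time`, and `content`, and the stand-in data are all hypothetical. It shows one way to keep the first 20 results of every result page.

```python
from typing import Dict, Iterable, List

PER_PAGE = 20  # the summary says the first 20 results of each page are kept


def collect_news(
    pages: Iterable[List[Dict[str, str]]],
    per_page: int = PER_PAGE,
) -> List[Dict[str, str]]:
    """Keep the first `per_page` results from every result page.

    `pages` is any iterable of result lists; in a real crawler each list
    would come from one page of search results (hypothetical interface).
    """
    records: List[Dict[str, str]] = []
    for results in pages:
        for item in results[:per_page]:
            records.append({
                "title": item.get("title", ""),
                "time": item.get("time", ""),
                "content": item.get("content", ""),
            })
    return records


# Stand-in data instead of a live browser session: a page with 25 results
# followed by a page with 5 results.
fake_pages = [
    [{"title": f"page1-{i}", "time": "t1", "content": "body"} for i in range(25)],
    [{"title": f"page2-{i}", "time": "t2", "content": "body"} for i in range(5)],
]
records = collect_news(fake_pages)
print(len(records))  # 20 from the first page plus 5 from the second
```

Passing the pages in as an iterable keeps the truncation logic separate from the browser automation, so it can be checked with plain lists.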

# -*- coding: UTF-8 -*-
from selenium.common.exceptions import TimeoutException, NoSuchElementException, WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from bs4 import BeautifulSoup
import urllib.request
import io
import json
import os
import requests
import sys
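# Use selenium to simulate browser behavior: launch Chrome, open the news home page, type the keyword, click, then look for the next page
# import chardet
import re


def test_sel(keyword):
    driver = webdriver.Chrome()
    link = 'http://news.baidu.com/?tn=news'
    driver.get(link)
    try:
        # Wait up to 30 seconds for the element with id "ww" to be present
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.ID, "ww"))
        )

    except TimeoutException:
        print('Failed to load the page')
        # Stop here: the steps below cannot work if the page never loaded
        driver.quit()
        return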
    try:
        element = driver.find_element
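The listing above breaks off at this point in the source, so the rest of the author's code is not shown. As a rough illustration of the last step the summary mentions, saving the scraped items as a JSON file, here is a minimal sketch. The helper names `save_news` and `load_news`, the field names, and the file name `news.json` are assumptions made for the example, not taken from the original.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_news(records: List[Dict[str, str]], path: str) -> None:
    """Write the records to `path` as UTF-8 encoded JSON."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_news(path: str) -> List[Dict[str, str]]:
    """Read the records back, mainly to check what was written."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical sample record; a real run would fill these from the browser.
sample = [{"title": "示例标题", "time": "2018-06-01", "content": "示例正文"}]

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "news.json")
    save_news(sample, out_path)
    restored = load_news(out_path)

print(restored == sample)
```

Opening the file with `encoding="utf-8"` explicitly avoids depending on the platform default encoding, which matters for Chinese text on Windows.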
