Crawler Example 1: Scraping the Top 50 Entries of the Baidu Hot Search Chart (风云榜) and Emailing Them to Yourself

Latest recommended article published on 2023-07-18 12:18:47

VIP article by 南巷的花猫


Category: Python crawlers. Tags: crawler examples

Article link: https://blog.csdn.net/qq_42662411/article/details/103457495


1 - Use the requests library and XPath to fetch fields from the Baidu hot search chart, such as the title and the hot search URL

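import requests
from lxml import etree

url = 'http://top.baidu.com/buzz?b=1&fr=topindex'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Referer': 'http://top.baidu.com/'
}

# Fetch the chart page; set the encoding to GBK before reading r.text
r = requests.get(url, headers=header, timeout=10)
r.encoding = 'gbk'
#print(r.text)
selector = etree.HTML(r.text)

# First-pass query: the first <a> in each keyword cell (each node is queried again later)
eles = selector.xpath('//td[@class="keyword"]/a[1]')
#print(len(eles))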
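
The selector can be tried offline before hitting the live page. The sketch below runs the same XPath against a small hand-written HTML string; the markup, the `example.com` links, and the names `SAMPLE_HTML` and `sample_eles` are assumptions made for illustration, and the real page structure may differ.

```python
from lxml import etree

# Hypothetical markup that imitates the chart's table cells; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr><td class="keyword">
    <a href="http://example.com/topic-a">Topic A</a>
    <a href="http://example.com/news-a">news</a>
  </td></tr>
  <tr><td class="keyword">
    <a href="http://example.com/topic-b">Topic B</a>
  </td></tr>
  <tr><td class="last">
    <a href="http://example.com/other">not a keyword cell</a>
  </td></tr>
</table>
"""

selector = etree.HTML(SAMPLE_HTML)

# First pass: a[1] keeps only the first <a> child of each keyword cell.
sample_eles = selector.xpath('//td[@class="keyword"]/a[1]')

# Second pass: paths starting with "./" are evaluated relative to each matched node.
titles = [ele.xpath('./text()')[0] for ele in sample_eles]
links = [ele.xpath('./@href')[0] for ele in sample_eles]

print(titles)
print(links)
```

The second anchor in the first cell and the anchor in the non-keyword cell are both left out, which is the behaviour the `td[@class="keyword"]/a[1]` expression is relied on for.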

2 - Run a second-pass query on each node to pull out the field values we need, one by one

from datetime import datetime

ls = []
for ele in eles:
    # Hot search title
    title = ele.xpath('./text()')[0]
    #print(title)
    # Hot search link (named link so it does not overwrite the page url above)
    link = ele.xpath('./@href')[0]
    #print(link)
    crawled_time = datetime.now()
    item = {
        'title': title,
        'url': link,
        'crawled_time': str(crawled_time),
    }
    ls.append(item)

#print(ls)
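
Indexing an XPath result with `[0]` raises `IndexError` when a node has no text or no `href`. A minimal defensive variant of the loop, assuming it is acceptable to skip such nodes, might look like the following; `extract_items` and the sample markup are hypothetical and not part of the original listing.

```python
from datetime import datetime
from lxml import etree


def extract_items(nodes):
    """Build one dict per <a> node, skipping nodes that lack text or an href."""
    items = []
    for node in nodes:
        texts = node.xpath('./text()')
        hrefs = node.xpath('./@href')
        if not texts or not hrefs:
            continue  # an empty result list cannot be indexed with [0]
        items.append({
            'title': texts[0].strip(),
            'url': hrefs[0],
            'crawled_time': str(datetime.now()),
        })
    return items


# Hypothetical sample: the second anchor has no text and the third has no href.
doc = etree.HTML(
    '<table><tr>'
    '<td class="keyword"><a href="http://example.com/a"> Topic A </a></td>'
    '<td class="keyword"><a href="http://example.com/b"></a></td>'
    '<td class="keyword"><a>Topic C</a></td>'
    '</tr></table>'
)
items = extract_items(doc.xpath('//td[@class="keyword"]/a[1]'))
print(items)
```

Only the first anchor survives here, since the other two are missing one of the two fields.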

data_email =''
for index , email_ls in
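
The listing stops at the start of the loop over `ls`, so the author's email code is not shown. As a rough sketch of what the remaining step could look like, the example below formats a list of items into a numbered text body and wraps it in a message using the standard library. The function names, the subject line, the addresses, and the SMTP host, port, and credentials are all assumptions, not the author's values.

```python
import smtplib
from email.message import EmailMessage


def build_body(items):
    """Render the scraped items as numbered plain-text lines."""
    lines = []
    for index, item in enumerate(items, start=1):
        lines.append(f"{index}. {item['title']}\n   {item['url']}")
    return "\n".join(lines)


def build_message(items, sender, recipient):
    """Wrap the rendered body in a plain-text email message."""
    msg = EmailMessage()
    msg['Subject'] = 'Baidu hot search list'  # hypothetical subject line
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content(build_body(items))
    return msg


def send_message(msg, host, port, user, password):
    """Send over SMTP with implicit TLS; every argument is a placeholder supplied by the caller."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)


# Hypothetical items in the same shape as the dicts collected in ls.
sample = [
    {'title': 'Topic A', 'url': 'http://example.com/a'},
    {'title': 'Topic B', 'url': 'http://example.com/b'},
]
message = build_message(sample, 'me@example.com', 'me@example.com')
print(message.get_content())
```

`send_message` is defined but not called, because it needs a real mail server and account. Many providers require an app-specific authorization code in place of the account password.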
