BeautifulSoup
Mr雪候鸟
Scraping the Douban Top 250 and Its Comments
    # -*- coding:utf-8 -*-
    # author: MrLuo
    import requests
    from bs4 import BeautifulSoup
    from lxml import etree
    import random

    headers = [
        {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0'},
        {'User-A…
Original · 2020-12-28 18:03:22 · 292 views · 1 comment
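The excerpt imports `random` next to a list of header dicts, presumably to rotate the User-Agent between requests. A minimal sketch of that idea follows. The names `HEADERS_POOL` and `pick_headers` are hypothetical, and the second User-Agent is a made-up placeholder because the original list is cut off.

```python
import random

# Illustrative pool of request headers. The first entry comes from the excerpt;
# the second is a made-up placeholder because the original list is truncated.
HEADERS_POOL = [
    {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0"},
    {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"},
]


def pick_headers():
    """Return a copy of one randomly chosen header dict from the pool."""
    return dict(random.choice(HEADERS_POOL))
```

The result can be passed straight to a request, for example `requests.get(url, headers=pick_headers())`, so each call may present a different User-Agent.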
Using BeautifulSoup to Scrape href URLs Under a div Tag With a Specific Class
Test cases:
1. The select method
    for item in soup.select('div[class="f-l intern-detail__job"] p a'):
        detail_url = item.get('href')
        print(detail_url)
2. The find_all method
    for items in soup.find_all('div', class_='f-l intern-detail__job'):
        item = items.…
Original · 2020-12-22 22:48:52 · 7367 views · 1 comment
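Both approaches can be tried against a small inline document. The sketch below assumes markup shaped like the selectors in the excerpt; `SAMPLE_HTML` and the example.com URLs are invented for illustration. The body of the `find_all` loop is an assumption too, since the original is cut off after `items.`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the page the excerpt targets.
SAMPLE_HTML = """
<div class="f-l intern-detail__job">
  <p><a href="https://example.com/job/1">Job one</a></p>
</div>
<div class="f-l intern-detail__job">
  <p><a href="https://example.com/job/2">Job two</a></p>
</div>
<div class="other"><p><a href="https://example.com/skip">Skip</a></p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1. select: the attribute selector compares against the full class string.
urls_select = [
    a.get("href")
    for a in soup.select('div[class="f-l intern-detail__job"] p a')
]

# 2. find_all: a multi-class string passed to class_ matches the exact
#    class attribute value, so the order of the class names matters.
urls_find_all = []
for div in soup.find_all("div", class_="f-l intern-detail__job"):
    # Assumed continuation: collect every link inside the matched div.
    for a in div.find_all("a", href=True):
        urls_find_all.append(a["href"])

print(urls_select)
print(urls_find_all)
```

Both lists contain the two job URLs and skip the link inside the `other` div. If the class order on the real page can vary, a selector such as `div.f-l.intern-detail__job` is less brittle than the exact-string forms.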
BeautifulSoup Study Notes
    # -*- coding:utf-8 -*-
    import requests
    from bs4 import BeautifulSoup

    # Function that sends a request and returns the HTML source
    def get_html(url):
        # Masquerade as a browser
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3…
Original · 2020-12-22 17:10:33 · 81 views · 0 comments
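Since the excerpt stops inside the header dict, here is one possible way a `get_html` helper of this kind could be completed. It is a sketch, not the author's code. The User-Agent is a placeholder rather than the truncated value above, and the `timeout` parameter and the `extract_links` helper are additions made for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder header; not the truncated value from the excerpt.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}


def get_html(url, timeout=10):
    """Fetch url with a browser-like header and return the HTML text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    # Let requests guess the encoding from the body; useful for non-UTF-8 pages.
    response.encoding = response.apparent_encoding
    return response.text


def extract_links(html):
    """Return the href of every <a> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Keeping the fetch and the parse in separate functions means the parsing step can be exercised on a saved or inline HTML string without touching the network.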