Python web scraping with selenium and bs4: requests + selenium + BeautifulSoup

Latest recommended article published 2021-11-18 11:56:23

weixin_39605278


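Preface:

Environment: 64-bit Windows, Python 3.4

Basic usage of the requests library:

1. Installation: pip install requests

2. Purpose: requests sends HTTP requests (GET, POST and so on) the same way a browser does, so you can use it to fetch data from websites.

3. Common operations: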

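import requests  # import the requests module

r = requests.get("https://api.github.com/events")  # fetch a page

# Set a timeout: stop waiting for a response after the given number of seconds.
# A value as small as 0.001 will almost always raise requests.exceptions.Timeout.
try:
    r2 = requests.get("https://api.github.com/events", timeout=0.001)
except requests.exceptions.Timeout:
    print("request timed out")

# Pass query-string parameters as a dict
payload = {'key1': 'value1', 'key2': 'value2'}
r1 = requests.get("http://httpbin.org/get", params=payload)

print(r1.url)  # print the URL, including the encoded query string
print(r.text)  # response body decoded to text
print(r.encoding)  # encoding used to decode r.text
print(r.content)  # response body as raw bytes
print(r.status_code)  # HTTP status code of the response
print(r.status_code == requests.codes.ok)  # compare against the built-in status code lookup object
print(r.headers)  # response headers, presented as a Python dict
print(r.headers['content-type'])  # header names are case-insensitive, so any capitalization works
print(r.history)  # list of Response objects (the redirects that were followed)
print(type(r))  # type of the response object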
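
The calls above assume that every request succeeds. As a minimal sketch (the helper names `preview_url` and `fetch_text` are hypothetical and not part of requests), the example below wraps `requests.get` with a timeout and `raise_for_status()`, and previews how `params` is encoded into the URL without touching the network:

```python
import requests


def preview_url(base_url, params):
    """Return the URL requests would build for a GET request, without sending it."""
    prepared = requests.Request("GET", base_url, params=params).prepare()
    return prepared.url


def fetch_text(url, timeout=5):
    """Return the response body as text, or None if the request fails.

    Hypothetical helper: it treats timeouts, connection errors and
    4xx/5xx status codes alike, which may be too coarse for real use.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
    except requests.exceptions.RequestException as exc:
        print("request failed:", exc)
        return None
    return response.text


# Offline check: the query string is built from the dict passed as params.
url = preview_url("http://httpbin.org/get", {"key1": "value1", "key2": "value2"})
print(url)
```

Calling `fetch_text("https://api.github.com/events")` would then return either the body or `None`; a real script would probably want to tell the failure cases apart.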

Basic usage of the BeautifulSoup4 library:

1. Installation: pip install BeautifulSoup4

2. Purpose: Beautiful Soup is a Python library for extracting data from HTML or XML files.

3. Common operations:

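import requests
from bs4 import BeautifulSoup

# Sample document from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

ss = BeautifulSoup(html_doc, "html.parser")
print(ss.prettify())  # print the document with standard indentation
print(ss.title)  # <title>The Dormouse's story</title>
print(ss.title.name)  # title
print(ss.title.string)  # The Dormouse's story
print(ss.title.parent.name)  # head
print(ss.p)  # <p class="title"><b>The Dormouse's story</b></p>
print(ss.p['class'])  # ['title']
print(ss.a)  # the first <a> tag (Elsie)
print(ss.find_all("a"))  # list of all three <a> tags
print(ss.find(id="link3"))  # the <a> tag whose id is link3 (Tillie)

for link in ss.find_all("a"):
    print(link.get("href"))  # get the link target of every <a> tag in the document

print(ss.get_text())  # get all the text content of the document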
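
One detail worth knowing when reading attributes: `tag.get(...)` returns `None` when the attribute is missing, whereas `tag[...]` raises `KeyError`. The sketch below collects link text and targets into a list; the helper name `extract_links` and the sample HTML are illustrative assumptions, not taken from a real page:

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every <a> tag; href is None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]


# Illustrative sample: the second <a> tag deliberately has no href attribute.
sample = """
<p class="story">
  <a href="http://example.com/elsie" id="link1">Elsie</a>
  <a id="link2">Lacie</a>
</p>
"""

links = extract_links(sample)
print(links)
```

The `html` argument could equally be the `text` of a response obtained with requests.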

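import requests
from bs4 import BeautifulSoup

# Sample document from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')  # create the BeautifulSoup object
find = soup.find('p')  # find() returns the first <p> tag
print("find's return type is", type(find))  # type of the return value (a Tag)
print("find's content is", find)  # the tag that find() returned
print("find's Tag Name is", find.name)  # name of the tag
print("find's Attribute(class) is", find['class'])  # value of the tag's class attribute

print(find.string)  # text content of the tag

# A comment wrapped in a <b> tag (markup as in the Beautiful Soup documentation)
markup = "<b><!--Hey, buddy. Want to buy a used car?--></b>"
soup1 = BeautifulSoup(markup, "html.parser")
comment = soup1.b.string
print(type(comment))  # the comment's content is a bs4.element.Comment object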
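
A `Comment` is a special kind of `NavigableString`, so it appears alongside ordinary text when you walk a tag's children. The sketch below, on an illustrative snippet of my own, separates comments from visible text; it also shows that `.string` is `None` when a tag has more than one child:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Illustrative markup: one text node and one comment inside the same <p>.
markup = "<p>visible text<!-- hidden note --></p>"
soup = BeautifulSoup(markup, "html.parser")

# <p> has two children, so .string cannot pick one and returns None.
print(soup.p.string)

children = list(soup.p.children)
comments = [node for node in children if isinstance(node, Comment)]
visible = [node for node in children if not isinstance(node, Comment)]

print(comments)
print(visible)
```

Filtering with `isinstance(node, Comment)` like this is one way to keep comment text out of scraped output.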

A quick try:

import requests
import io
import sys

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')  # change the default encoding of standard output

r = requests.get('https://unsplash.com')  # send a GET request to the target URL; returns a Response object

print(r.text)  # r.text is the HTML of the HTTP response
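
Re-wrapping `sys.stdout` matters on a Windows console whose default code page (often GBK on Chinese-language systems) cannot represent every character in a page, in which case `print` raises `UnicodeEncodeError`. GB18030 covers all of Unicode, so it avoids that error. The sketch below illustrates the difference using an in-memory buffer rather than the real console; the helper `can_encode` and the sample string are illustrative assumptions:

```python
import io


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "爬虫 \U0001F600"  # Chinese text plus an emoji that GBK cannot encode

print(can_encode(sample, "gbk"))
print(can_encode(sample, "gb18030"))

# Same wrapping technique as for sys.stdout, applied to an in-memory byte buffer.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="gb18030")
writer.write(sample)
writer.flush()
round_trip = buffer.getvalue().decode("gb18030")
```

Whether the characters then display correctly still depends on the console's own code page and font.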

References:

https://blog.csdn.net/u012662731/article/details/78537432

http://www.cnblogs.com/Albert-Lee/p/6276847.html

https://blog.csdn.net/enohtzvqijxo00atz3y8/article/details/78748531
