Web scraping: requests_html

zk仔's blog

Category column: python_爬虫 (Python web scraping)

Article link: https://blog.csdn.net/weixin_39532362/article/details/98516398


Installation
Basic usage
Generating HTML
JavaScript support

Installation

pip install requests_html

Basic usage

# Imports
from requests_html import HTMLSession, HTML

# Create a session
session = HTMLSession()

# Set request headers (a dict mapping header names to values)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
}

# GET request
url = "https://www.baidu.com"
response = session.get(url=url, headers=headers)

# POST request (a dict passed as data is sent form-encoded)
session.post('http://httpbin.org/post', data={'name': 'zzz', 'passwd': 123})

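# Page content (the raw HTML as a string)
response.html.html

# All links on the page
# As written in the page (may be relative)
for link in response.html.links:
    print(link)
# Resolved to absolute URLs
for link in response.html.absolute_links:
    print(link)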

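# CSS selector; first=True returns a single element (or None) instead of a list
# clean=True would strip <script> and <style> content from the matched elements
ele = response.html.find('#some a', first=True, _encoding='utf-8', clean=False)

# XPath selector (returns a list of elements)
response.html.xpath('//*[@id="some"]/a')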

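# ele is an element returned by find() or xpath()
# Text content of the element
ele.text

# HTML of the element
ele.html

# Element attributes
ele.attrs.get('id')

# Links inside the element
ele.links           # as written in the page (may be relative)
ele.absolute_links  # resolved to absolute URLs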
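
To make the difference between links and absolute_links concrete, here is a minimal sketch that resolves links against a page URL with the standard library's urljoin. The base URL, the sample links, and the helper name to_absolute are all made up for illustration. This only approximates the idea; requests_html's own base-URL handling may differ in details.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used as the base for resolving links
BASE_URL = "https://example.com/docs/index.html"


def to_absolute(base_url, links):
    """Resolve each link against base_url (illustrative helper, not part of requests_html)."""
    return {urljoin(base_url, link) for link in links}


# Links as they might appear in href attributes (made-up values)
raw_links = {"/about", "guide.html", "https://other.example.org/x"}
absolute = to_absolute(BASE_URL, raw_links)

for link in sorted(absolute):
    print(link)
```

Root-relative links are resolved against the host, plain relative links against the page's directory, and links that are already absolute are left unchanged.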

Generating HTML

doc = """
    <!DOCTYPE html>
    <html>
        hello
    </html>
"""
html = HTML(html=doc)

JavaScript support

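res = session.get('http://python-requests.org/')
# Render the page in headless Chromium so that its JavaScript runs
# (Chromium is downloaded the first time render() is called)
res.html.render()
# Parameters of render():
# retries: number of times to retry loading the page
# wait: seconds to wait before loading the page
# scrolldown: number of times to page down
# sleep: seconds to wait after the initial render
# reload: if False, render the HTML already in memory instead of loading it in the browser
# keep_page: if True, keep the browser page available as res.html.page

script = '''
    console.log('script')
'''
# Run a script against the HTML object built from a string above;
# reload=False renders the in-memory document instead of fetching html.url
html.render(script=script, reload=False)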
