Using urllib for Web Scraping

Category: python. Tags: web scraping, urllib, request, parse, headers

Article link: https://blog.csdn.net/weixin_44185953/article/details/85136195

Using request and parse

from urllib import request

# Example: fetch the Baidu home page
# First, request https://www.baidu.com/ directly
html_obj = request.urlopen("https://www.baidu.com/")
# Read the response body and decode it as UTF-8
html_content = html_obj.read().decode("utf-8")
print(html_content)
# The content comes back almost empty, so the request needs to look like it comes from a browser.
# Add a User-Agent (the browser identifier) to the request headers.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
}
# Wrap the URL and headers in a Request object
url = "https://www.baidu.com/"
req = request.Request(url=url, headers=headers)
# Send the request again
html_second = request.urlopen(req)
html_content_second = html_second.read().decode("utf-8")
print(html_content_second)
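
To see what the wrapping step does without sending anything over the network, a minimal sketch can inspect the Request object before it is passed to urlopen. The helper name build_request is hypothetical and not part of the original script; the User-Agent string is the one used above.

```python
from urllib import request

# Hypothetical helper (an assumption for illustration): wrap a URL and a
# User-Agent string in a Request object without sending anything.
def build_request(url, user_agent):
    headers = {"User-Agent": user_agent}
    return request.Request(url=url, headers=headers)

ua = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36")
req = build_request("https://www.baidu.com/", ua)

print(req.full_url)       # the URL the request will be sent to
print(req.get_method())   # GET, because no data was attached
# urllib stores header names in capitalized form, so the key is "User-agent"
print(req.get_header("User-agent"))
```

Checking the object this way makes it easy to confirm that the header is really attached before debugging an empty response.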

Submitting a query and scraping the result

from urllib import request, parse

base_url = "https://tieba.baidu.com/f?"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
}

kw = input("Enter a keyword: ")

# The data to submit
data = {
    "ie": "utf-8",
    "kw": kw,
    "fr": "search",
}

# Use parse.urlencode() to convert the data into a query string
data_str = parse.urlencode(data)
# This is a GET submission, so the query string is appended directly to the URL
url = base_url + data_str

req = request.Request(url=url, headers=headers)
html = request.urlopen(req).read().decode("utf-8")
# Build the output file name from the keyword
file_name = "%s.html" % (kw)
# Write the HTML to the file
with open(file_name, "w", encoding="utf-8") as f:
    f.write(html)
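
The encoding step is easier to follow when the resulting URL is printed. The sketch below builds the same GET URL for a sample keyword and parses it back, without making a request. The helper name build_search_url and the sample keyword are assumptions for illustration.

```python
from urllib import parse

base_url = "https://tieba.baidu.com/f?"

# Hypothetical helper (an assumption for illustration): build the GET URL
# from a keyword using the same three parameters as the script above.
def build_search_url(kw):
    data = {"ie": "utf-8", "kw": kw, "fr": "search"}
    # urlencode percent-encodes non-ASCII values as UTF-8 by default
    return base_url + parse.urlencode(data)

url = build_search_url("爬虫")
print(url)

# Round trip: split off the query string and decode it back into a dict
query = parse.urlsplit(url).query
params = parse.parse_qs(query)
print(params["kw"])
```

Because urlencode handles the percent-encoding, a keyword containing Chinese characters, spaces, or an ampersand can be passed in without escaping it by hand.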

That covers the basic use of urllib. More updates will follow.

�
