Python Crawler Notes: Using urllib

Latest recommended article published 2024-04-12 20:53:36

weixin_39674414

Tags: python crawler, urllib, data processing

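urllib has four submodules: request, error, parse, and robotparser.

request sends requests and retrieves responses.

error defines the exceptions raised by request and is typically used to catch them.

parse parses and manipulates URLs.

robotparser processes robots.txt files.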
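As a rough offline sketch of what the parse, robotparser, and error submodules each provide, the snippet below touches them without making a network request. The robots.txt rules, the URL paths, and the "my-crawler" user agent are made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into its components.
parts = urllib.parse.urlparse("http://blog.youhaiqun.mom/path/page?word=hello")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# urllib.robotparser: evaluate robots.txt rules supplied as lines of text.
# These rules and the "my-crawler" user agent are hypothetical.
robots = urllib.robotparser.RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])
blocked = robots.can_fetch("my-crawler", "http://blog.youhaiqun.mom/private/a.html")
allowed = robots.can_fetch("my-crawler", "http://blog.youhaiqun.mom/index.html")
print(blocked, allowed)

# urllib.error: HTTPError is a subclass of URLError, so an
# "except urllib.error.URLError" clause also catches HTTP status errors.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
```

In a real crawler the rules would normally be loaded with set_url() and read() instead of being passed to parse() by hand.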

The urllib.request module

import urllib.request

response = urllib.request.urlopen("http://blog.youhaiqun.mom")

print(response.read().decode('utf-8'))

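response is an HTTPResponse object. Its main methods are read(), getheader(name), getheaders(), and fileno().

Its main attributes are status, msg, reason, closed, and debuglevel.

For example, response.status gives the HTTP status code, and response.read() returns the response body as bytes.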
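To try these methods and attributes without depending on an external site, the following sketch starts a throwaway HTTP server on localhost and inspects the response it returns. The HelloHandler class and its response body are assumptions made purely for this example.

```python
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short UTF-8 body."""

    def do_GET(self):
        body = "hello urllib".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Ignore proxy settings from the environment so the request reaches
# the local test server directly.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({}))
)

with urllib.request.urlopen(url, timeout=5) as response:
    status = response.status                           # numeric status code
    reason = response.reason                           # reason phrase
    content_type = response.getheader("Content-Type")  # one header by name
    all_headers = response.getheaders()                # list of (name, value)
    text = response.read().decode("utf-8")             # body: bytes -> str

print(status, reason)
print(content_type)
print(text)
print(response.closed)  # the with-block has closed the response

server.shutdown()
server.server_close()
```

Using the response as a context manager closes the connection automatically, which matters in a crawler that opens many URLs.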

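The urllib.request.urlopen() function

urllib.request.urlopen(url, data, timeout, cafile, capath, cadefault, context)

urlopen opens the address given by url. data is optional extra data to send with the request and must be of type bytes; when data is supplied, the request is sent as a POST instead of a GET. Note that cafile, capath, and cadefault were removed in Python 3.13; use context to configure TLS instead.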

The urllib.parse.urlencode() function

urllib.parse.urlencode({'word':'hello'})

It converts a dictionary into a URL-encoded query string, here 'word=hello'.
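A short sketch of what urlencode produces for a few inputs may help here. The extra keys and values ("page", "hello world", "你好") are illustrative and do not come from the original notes.

```python
import urllib.parse

# A simple dictionary becomes key=value.
query = urllib.parse.urlencode({"word": "hello"})
print(query)  # word=hello

# Several pairs are joined with '&'; spaces become '+' and
# non-string values are converted with str().
query2 = urllib.parse.urlencode({"word": "hello world", "page": 2})
print(query2)  # word=hello+world&page=2

# Non-ASCII text is percent-encoded as UTF-8 by default.
query3 = urllib.parse.urlencode({"word": "你好"})
print(query3)  # word=%E4%BD%A0%E5%A5%BD

# parse_qs goes the other way; every value comes back as a list of strings.
decoded = urllib.parse.parse_qs(query2)
print(decoded)  # {'word': ['hello world'], 'page': ['2']}

# For a GET request the encoded string is appended to the URL after '?'.
get_url = "http://blog.youhaiqun.mom/?" + query
print(get_url)
```

The result is a str, so it has to be encoded to bytes before it can be passed as the data argument of urlopen.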

Using the two functions together

data = {'word': 'hello'}

data = bytes(urllib.parse.urlencode(data), encoding='utf-8')

response = urllib.request.urlopen('http://blog.youhaiqun.mom', data, timeout=9)
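To see that passing data really turns the request into a POST, the sketch below sends the same encoded form to a local echo server in place of the real site. The EchoHandler class is a hypothetical helper written only for this demonstration.

```python
import http.server
import threading
import urllib.parse
import urllib.request


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that echoes the HTTP method and request body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        reply = self.command.encode("ascii") + b" " + body
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Ignore proxy settings from the environment so the request reaches
# the local test server directly.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({}))
)

# urlencode returns str; urlopen requires bytes for data.
data = bytes(urllib.parse.urlencode({"word": "hello"}), encoding="utf-8")

# Because data is given, urlopen issues a POST instead of a GET.
with urllib.request.urlopen(url, data, timeout=9) as response:
    echoed = response.read().decode("utf-8")

print(echoed)  # POST word=hello

server.shutdown()
server.server_close()
```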

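The urllib.request.Request class

When a request needs custom headers, use urllib.request.Request; calling urllib.request.urlopen() with a plain URL string only lets you pass extra parameters through data.

request = urllib.request.Request(url, data, headers, method='POST')

Here headers is a dictionary of header names and values, and method is an HTTP method string such as 'GET' or 'POST'.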

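Note: the line above does not send anything to url yet. It only constructs a Request object that holds the headers, data, and so on. The request is actually sent by the following statement:

response = urllib.request.urlopen(request)

print(response.read().decode('utf-8'))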

Headers can also be added with add_header():

request = urllib.request.Request(url, data, method='POST')

request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
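Since constructing a Request sends nothing, the object can be inspected offline. The sketch below builds one both ways and prints what it holds; the Referer header is an illustrative addition that the original notes do not mention.

```python
import urllib.parse
import urllib.request

url = "http://blog.youhaiqun.mom"
data = urllib.parse.urlencode({"word": "hello"}).encode("utf-8")
headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}

# Headers passed to the constructor.
request = urllib.request.Request(url, data=data, headers=headers, method="POST")

# Headers added afterwards (the Referer value is hypothetical).
request.add_header("Referer", "http://blog.youhaiqun.mom/")

print(request.full_url)
print(request.get_method())  # POST
print(request.data)          # b'word=hello'

# Request stores header names via str.capitalize(), so the lookup
# key is "User-agent", not "User-Agent".
print(request.get_header("User-agent"))
print(request.get_header("User-Agent"))  # None
print(request.header_items())

# Without data or an explicit method, the request defaults to GET.
plain = urllib.request.Request(url)
print(plain.get_method())  # GET
```

Nothing here touches the network; only urllib.request.urlopen(request) would send the request.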

Advanced features of urllib.request.Request

Handling cookies and proxies
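The notes stop at this heading, so the following is only a guess at the approach meant here: a minimal sketch that combines urllib.request.HTTPCookieProcessor and urllib.request.ProxyHandler in one opener. The proxy address 127.0.0.1:8080 is a placeholder, and the actual request is left commented out because it needs a reachable proxy.

```python
import http.cookiejar
import urllib.request

# Cookies: a CookieJar stores cookies set by responses and sends
# them back on later requests made through the same opener.
cookie_jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)

# Proxies: map each URL scheme to a proxy URL.
# The address below is hypothetical; substitute a real proxy.
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
})

# build_opener combines the handlers into one OpenerDirector.
opener = urllib.request.build_opener(cookie_handler, proxy_handler)

# Default headers sent with every request made through this opener.
opener.addheaders = [
    ("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"),
]

print(type(opener).__name__)  # OpenerDirector
print(len(cookie_jar))        # 0, nothing has been fetched yet

# With a working proxy, the opener is used like urlopen:
# response = opener.open("http://blog.youhaiqun.mom", timeout=9)
# for cookie in cookie_jar:
#     print(cookie.name, cookie.value)
```

Calling urllib.request.install_opener(opener) would make plain urlopen() calls use the same cookie and proxy settings.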
