The urllib module

1. Open a URL and read the returned data

#!/usr/bin/env python

from urllib import request

url = 'https://www.jd.com'
res = request.urlopen(url)       # returns an http.client.HTTPResponse object
data = res.read()                # the response body as bytes
print(data.decode('utf-8'))      # decode the bytes into a str
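
urlopen() returns an http.client.HTTPResponse object, so besides read() you can also inspect the status code and the response headers. A minimal sketch using the same URL as above (the header names shown are just examples):

#!/usr/bin/env python

from urllib import request

url = 'https://www.jd.com'
res = request.urlopen(url)

print(res.status)                        # HTTP status code, e.g. 200
print(res.getheader('Content-Type'))     # one specific response header
for name, value in res.getheaders():     # all response headers as (name, value) pairs
    print(name, value)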

 

2. Encoding POST data

#!/usr/bin/env python

from urllib import request
from urllib import parse

url = 'http://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}
newpayload = parse.urlencode(payload).encode('utf-8')  # POST data must be bytes or an iterable of bytes, not str, so it has to be encoded with encode()
print(type(newpayload))                                # encode() turns a str into bytes
req = request.urlopen(url, data=newpayload)
res = req.read()
print(res.decode('utf-8'))                             # decode() turns bytes back into a str

--------------------------------------------------------------------------------------->
<class 'bytes'>
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.6"
  },
  "json": null,
  "origin": "183.48.35.148",
  "url": "http://httpbin.org/post"
}
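
urlencode() is just as useful for GET requests: keep the result as a str and append it to the URL as a query string. A minimal sketch against httpbin's /get endpoint (an assumption; any echo service works):

#!/usr/bin/env python

from urllib import request
from urllib import parse

params = parse.urlencode({'key1': 'value1', 'key2': 'value2'})   # 'key1=value1&key2=value2' as str
url = 'http://httpbin.org/get?' + params
res = request.urlopen(url)
print(res.read().decode('utf-8'))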

 

3. Constructing request headers
#!/usr/bin/env python

from urllib import request

url = 'https://www.qiushibaike.com/'
ua = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"}

req = request.Request(url, headers=ua)
res = request.urlopen(req)
print(res.read().decode('utf-8'))
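
Headers can also be added to an existing Request object one at a time with add_header(), which is handy when they are built up step by step. A minimal sketch with the same URL and User-Agent (the extra Accept-Language header is just an example):

#!/usr/bin/env python

from urllib import request

url = 'https://www.qiushibaike.com/'
req = request.Request(url)
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0')
req.add_header('Accept-Language', 'zh-CN,zh;q=0.9')     # any additional header is added the same way

res = request.urlopen(req)
print(res.read().decode('utf-8'))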

 

4. Downloading files

#!/usr/bin/env python

from urllib import request
url = 'https://www.baidu.com/img/bd_logo1.png'
request.urlretrieve(url, filename='baidu.png')
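
urlretrieve() also accepts a reporthook callback that is called with the block number, block size and total size, which can be used to show download progress. A minimal sketch (the progress() function is just an illustrative name):

#!/usr/bin/env python

from urllib import request

def progress(block_num, block_size, total_size):
    # called repeatedly during the download; total_size is -1 if the server
    # sends no Content-Length header
    if total_size > 0:
        done = min(block_num * block_size, total_size)
        print('%.1f%%' % (done * 100 / total_size))

url = 'https://www.baidu.com/img/bd_logo1.png'
request.urlretrieve(url, filename='baidu.png', reporthook=progress)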

 

You can also write the download yourself:
from urllib import request

url = 'https://www.baidu.com/img/bd_logo1.png'

res = request.urlopen(url)
data = res.read()                    # the image as bytes
with open('1.png', 'wb') as fd:      # plain open() in binary mode is enough; codecs is not needed
    fd.write(data)
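
For larger files it may be better not to read the whole response into memory at once; the body can be read in chunks instead. A minimal sketch (the 8192-byte chunk size is an arbitrary choice):

#!/usr/bin/env python

from urllib import request

url = 'https://www.baidu.com/img/bd_logo1.png'

res = request.urlopen(url)
with open('2.png', 'wb') as fd:
    while True:
        chunk = res.read(8192)        # read 8 KB at a time instead of the whole body
        if not chunk:                 # read() returns b'' at end of stream
            break
        fd.write(chunk)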

 

5. Proxies

The default opener (urlopen() is built on a default opener instance) only lets you pass GET/POST data and a timeout when accessing a URL; it cannot go through a proxy, nor can it carry cookies across requests.
So you have to build your own opener with proxy support:
#!/usr/bin/env python

from urllib import request

url = 'http://2017.ip138.com/ic.asp'
print(request.urlopen(url).read().decode('gb2312'))

dic = {'http': '125.88.177.128:3128'}
proxy = request.ProxyHandler(dic)             # create a handler with proxy support
opener = request.build_opener(proxy)          # build an opener from the handler
req = opener.open(url)                        # use the opener to access the URL
res = req.read().decode('gb2312')
print(res)

----------------------------------------------------------->
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=gb2312">
<title> 您的IP地址 </title>
</head>
<body style="margin:0px"><center>您的IP是:[113.68.17.83] 来自:广东省广州市 电信</center></body></html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=gb2312">
<title> 您的IP地址 </title>
</head>
<body style="margin:0px"><center>您的IP是:[119.129.229.185] 来自:广东省广州市 电信</center></body></html>
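
If every later request should go through the proxy, the custom opener can be installed as the global default with install_opener(); after that a plain urlopen() call uses it too. A minimal sketch with the same proxy dict:

#!/usr/bin/env python

from urllib import request

proxy = request.ProxyHandler({'http': '125.88.177.128:3128'})
opener = request.build_opener(proxy)
request.install_opener(opener)        # make this opener the default for urlopen()

print(request.urlopen('http://2017.ip138.com/ic.asp').read().decode('gb2312'))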

 

6. Building an opener that carries cookies

In Python 2, the cookie object is created with the cookielib module;
in Python 3, it is created with the http.cookiejar module.
#!/usr/bin/env python

import http.cookiejar
from urllib import request

url = 'https://www.github.com'
cookie = http.cookiejar.CookieJar()                  # create the cookie object
print(cookie)
handler = request.HTTPCookieProcessor(cookie)        # create a handler that stores cookies
opener = request.build_opener(handler)               # build an opener from the handler
req = opener.open(url)                               # after the request the cookie jar is populated
print(cookie)

---------------------------------------------------------->
<CookieJar[]>
<CookieJar[<Cookie logged_in=no for .github.com/>, <Cookie _gh_sess=eyJzZXNzaW9uX2lkIjoiYzJkNzE0NTA1OWQ4ZDc5MDA1NjM4NWI1ZDIwYjkxNTgiLCJsYXN0X3JlYWRfZnJvbV9yZXBsaWNhcyI6MTUxNjU0MDEzNzE0MCwiX2NzcmZfdG9rZW4iOiJOdTVMYXZjT0xMNDRYS1JmOTh3MFZNQnI0c3ZPOTNCWkFWRUZnN21yUFZ3PSJ9--c47838621e21bba5d72c7050345b40a9c513335a for github.com/>]>
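
A CookieJar is iterable, so instead of printing the whole jar you can look at each cookie individually. A minimal self-contained sketch of the same request:

#!/usr/bin/env python

import http.cookiejar
from urllib import request

cookie = http.cookiejar.CookieJar()
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
opener.open('https://www.github.com')

for item in cookie:                   # one Cookie object per entry
    print(item.name, '=', item.value)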

 

7. Saving cookie information to a file

MozillaCookieJar() inherits from FileCookieJar(), whose constructor takes a filename parameter, i.e. the file the cookies are stored in; it is able to capture cookies because FileCookieJar() in turn inherits from CookieJar().

#!/usr/bin/env python

import http.cookiejar
from urllib import request

url = 'https://www.github.com'
filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)      # pass the filename when instantiating to get the cookie object
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
res = opener.open(url)
cookie.save()
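
By default save() skips session cookies (those marked to be discarded) and expired ones; passing ignore_discard=True and ignore_expires=True writes them to the file as well, which is usually what you want when re-using a session later. A hedged variation of the save call above:

#!/usr/bin/env python

import http.cookiejar
from urllib import request

filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
opener.open('https://www.github.com')

# also keep session cookies and already-expired cookies in the file
cookie.save(ignore_discard=True, ignore_expires=True)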

 

8. Loading cookies from a file and accessing a URL

#!/usr/bin/env python

import http.cookiejar
from urllib import request

filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
cookie.load(filename)               # load the previously saved cookies from the file
url = 'https://www.github.com'
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
res = opener.open(url).read()
print(res.decode('utf-8'))
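
All of these urlopen()/opener.open() calls can raise urllib.error.HTTPError (the server answered with an error status) or urllib.error.URLError (the server could not be reached), so in real scripts it is worth wrapping them. A minimal sketch (the /404 path is hypothetical, only used to provoke an error):

#!/usr/bin/env python

from urllib import request, error

url = 'https://www.github.com/404'    # hypothetical path, just for demonstration
try:
    res = request.urlopen(url, timeout=10)
    print(res.status)
except error.HTTPError as e:          # got a response, but with a 4xx/5xx status
    print('HTTP error:', e.code, e.reason)
except error.URLError as e:           # could not reach the server at all
    print('URL error:', e.reason)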

 

Reposted from: https://www.cnblogs.com/tobeone/p/8364843.html
