Python urllib module

Reposted on 2015-11-18 22:36:46

urllib provides a set of functions for working with URLs.

Get

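The request module of urllib makes it easy to fetch the content of a URL: it sends a GET request to the given page and returns the HTTP response. The function to use is urlopen. Its argument is either a URL string or a Request object, and for HTTP and HTTPS URLs it returns an HTTPResponse object.
For example, to fetch the Douban URL https://api.douban.com/v2/book/2129650 and print the response: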

from urllib import request

url = 'https://api.douban.com/v2/book/2129650'
# urlopen accepts a URL string or a Request object and returns an HTTPResponse
with request.urlopen(url) as f:
    data = f.read()
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', data.decode('utf-8'))
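The point that urlopen takes either a URL string or a Request object can be checked without touching the network. The sketch below is only an illustration and rests on one assumption that is not in the original article: it uses a data: URL, which carries its own payload, so that it runs offline. The object returned for a data: URL is a generic file-like response rather than an HTTPResponse, but the two calling forms behave the same way for HTTP URLs.

```python
from urllib import request

# Assumption for the demo: a data: URL, so nothing is sent over the network.
url = 'data:text/plain;charset=utf-8,hello%20urllib'

# Form 1: pass the URL as a plain string.
with request.urlopen(url) as f:
    from_string = f.read()

# Form 2: wrap the same URL in a Request object first.
with request.urlopen(request.Request(url)) as f:
    from_request = f.read()
    content_type = f.headers['Content-Type']

print(from_string.decode('utf-8'))
print(from_string == from_request)
print(content_type)
```

Both forms return the same bytes. The Request form becomes useful once headers or a request body have to be attached.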

The HTTPResponse object is described below (quoted from the http.client documentation):
An HTTPResponse instance wraps the HTTP response from the server. It provides access to the response headers and the entity body. The response is an iterable object and can be used in a with statement.

HTTPResponse.read([amt])

Reads and returns the response body, or up to the next amt bytes.

HTTPResponse.readinto(b)

Reads up to the next len(b) bytes of the response body into the buffer b. Returns the number of bytes read.

New in version 3.3.

HTTPResponse.getheader(name, default=None)

Return the value of the header name, or default if there is no header matching name. If there is more than one header with the name name, return all of the values joined by ', '. If 'default' is any iterable other than a single string, its elements are similarly returned joined by commas.

HTTPResponse.getheaders()

Return a list of (header, value) tuples.

HTTPResponse.fileno()

Return the fileno of the underlying socket.

HTTPResponse.msg

A http.client.HTTPMessage instance containing the response headers. http.client.HTTPMessage is a subclass of email.message.Message.

HTTPResponse.version

HTTP protocol version used by server. 10 for HTTP/1.0, 11 for HTTP/1.1.

HTTPResponse.status

Status code returned by server.

HTTPResponse.reason

Reason phrase returned by server.

HTTPResponse.debuglevel

A debugging hook. If debuglevel is greater than zero, messages will be printed to stdout as the response is read and parsed.

HTTPResponse.closed

Is True if the stream is closed.
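To see several of these attributes and methods together, here is a minimal sketch that starts a throwaway HTTP server on a free local port and inspects the response it returns. The handler class HelloHandler, the function inspect_response and the response body are hypothetical names and values made up for this illustration. The opener is built with an empty ProxyHandler so that a system proxy, if one is configured, does not intercept the loopback request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small UTF-8 body."""

    def do_GET(self):
        body = 'hello, urllib'.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def inspect_response():
    """Fetch one page from a local server and collect HTTPResponse fields."""
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An opener without proxies, so the request goes straight to 127.0.0.1.
    opener = request.build_opener(request.ProxyHandler({}))
    try:
        url = 'http://127.0.0.1:%d/' % server.server_address[1]
        with opener.open(url) as f:
            info = {
                'status': f.status,
                'reason': f.reason,
                'version': f.version,
                'content_type': f.getheader('Content-Type'),
                'missing': f.getheader('X-Not-There', 'n/a'),  # default is returned
                'first5': f.read(5).decode('utf-8'),           # read(amt): at most 5 bytes
                'rest': f.read().decode('utf-8'),              # read(): whatever is left
            }
        info['closed'] = f.closed  # the with block has closed the response
        return info
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    for key, value in inspect_response().items():
        print('%s: %s' % (key, value))
```

Calling read(5) and then read() shows that the body is consumed as a stream: the second call returns only the bytes the first one left behind.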

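To make a GET request look as if it came from a browser, use a Request object: by adding HTTP headers to the Request, the request can pose as a browser. The User-Agent header is the one that identifies the browser.

Further reading: "About Request" and "What attributes and methods does a Request object have".

For example, to request the Python home page while posing as Firefox: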

from urllib import request

url='https://www.python.org/'
req=request.Request(url)
req.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
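A Request object can be inspected before it is sent, which is a convenient way to confirm that a header was set as intended. The following is a small sketch using only methods of urllib.request.Request. Nothing is sent over the network. Note that add_header stores the header name with only its first letter capitalized, so lookups have to use that spelling.

```python
from urllib import request

req = request.Request('https://www.python.org/')
req.add_header('User-Agent',
               'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11')

# add_header() normalizes the name to "User-agent" internally.
print(req.has_header('User-agent'))   # lookup with the stored spelling
print(req.has_header('User-Agent'))   # lookup with the original spelling
print(req.get_header('User-agent'))
print(req.get_method())               # no data attached, so this is a GET
print(req.header_items())
```

The same headers can also be supplied up front with the headers argument, as in request.Request(url, headers={'User-Agent': '...'}).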

Simulating a Weibo login with GET:

from urllib import request

print('Login to weibo.cn...')

url='https://passport.weibo.cn/sso/login?username=xxxxxx&password=xxxxxx'
print(url)

req=request.Request(url)
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
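Writing the query string by hand works for simple values, but it breaks as soon as a value contains characters such as &, = or a space. One way to avoid that is to build the query with parse.urlencode. The sketch below does this offline. The email address and password are placeholder values invented for the illustration.

```python
from urllib import parse

base = 'https://passport.weibo.cn/sso/login'
# Placeholder credentials; the password deliberately contains &, = and a space.
params = {'username': 'user@example.com', 'password': 'p&ss w=rd'}

query = parse.urlencode(params)   # percent-encodes each key and value
url = base + '?' + query
print(url)

# Decoding the query again recovers the original values.
print(parse.parse_qs(query))
```

The resulting url can be passed to request.Request exactly like the hand-written one.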

Post

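To send a request as POST, just pass the data parameter as bytes.

Here we simulate a Weibo login: first read the login email and password, then pass them encoded as username=xxx&password=xxx, the format used by the weibo.cn login page: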

from urllib import request,parse

print('Login to weibo.cn...')
url='https://passport.weibo.cn/sso/login'
email=input('Email: ')
password=input('Password: ')
login_data=parse.urlencode([
    ('username',email),
    ('password',password),
    ('entry','mweibo'),
    ('client_id',''),
    ('savestate','1'),
    ('ec',''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])
req=request.Request(url)
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

with request.urlopen(req,data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
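Whether urllib sends a GET or a POST depends on whether data is attached to the request. The short sketch below shows this without sending anything: it builds two Request objects for the same URL and asks each for its method. The form values are placeholders chosen for the illustration.

```python
from urllib import parse, request

url = 'https://passport.weibo.cn/sso/login'

# urlencode() returns a str; the request body must be bytes, hence encode().
form = parse.urlencode([('username', 'user@example.com'), ('savestate', '1')])
body = form.encode('utf-8')

get_req = request.Request(url)              # no data: GET
post_req = request.Request(url, data=body)  # data attached: POST

print(get_req.get_method())
print(post_req.get_method())
print(post_req.data)
```

Attaching the body to the Request in this way is equivalent to passing data= to urlopen as in the login example.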

A brief analysis of the HTTP protocol
HTTP request methods: GET and POST compared
HTTP (Baidu Baike)
The HTTP protocol in detail
