Python Study Notes 1

xueliuhui

Article link: https://blog.csdn.net/xueliuhui/article/details/78355688

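Introduction to HTTP proxies

A proxy server sits between the browser and the web server. Normally the browser sends its request directly to the web server; with a proxy in place, the browser sends the request to the proxy server, which then fetches the requested resource from the web server and passes it back.

An HTTP proxy is essentially a web application and is not fundamentally different from any other web application. When it receives a request, it works out the target host from the host name in the Host header together with the request URL on the GET/POST request line, builds a new HTTP request, forwards the request data, and relays the response it receives back to the client.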
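To make the "work out the target host" step concrete, here is a minimal sketch of how a plain-HTTP proxy might do it. The function name `resolve_target` is made up for this note, the default port 80 assumes plain HTTP, and IPv6 literals in the Host header are not handled; it is an illustration, not how any particular proxy is implemented.

```python
from urllib.parse import urlsplit


def resolve_target(raw_request: bytes):
    """Return (host, port, path) that a plain-HTTP proxy would forward to."""
    # Only the header section matters here; drop any body.
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)

    # Collect headers into a dict with lower-cased names.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    parts = urlsplit(target)
    if parts.scheme and parts.hostname:
        # The request line carries a full URL, as clients send to a proxy.
        host = parts.hostname
        port = parts.port or 80
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
    else:
        # The request line carries only a path: fall back to the Host header.
        host, _, port_text = headers["host"].partition(":")
        port = int(port_text) if port_text else 80
        path = target
    return host, port, path


print(resolve_target(b"GET http://example.com:8080/a?b=1 HTTP/1.1\r\nHost: example.com:8080\r\n\r\n"))
print(resolve_target(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))
```

A real proxy would then open a connection to that host and port, send the rewritten request, and stream the response back to the client.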

Setting up a proxy

import urllib.request

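Create a proxy handler. The dictionary maps a URL scheme to a proxy address (replace the placeholder with the real address, in the form http://host:port). An 'https' entry is included because the URL opened further down uses https, and the handler only applies a proxy to the schemes listed here.

proxy_handler = urllib.request.ProxyHandler({'http': 'proxy server address', 'https': 'proxy server address'})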

Create an opener. Python uses an opener whenever it opens a URL. urllib.request.urlopen() itself uses a default opener; here we build a custom one so that we can specify the handler.

opener = urllib.request.build_opener(proxy_handler)

Use this opener to open the target URL

r = opener.open('https://www.zhihu.com/#signin')  # (1)
print(r.read())

Note: step (1) can also be written as follows. install_opener installs the opener as the global default, so every later call to urllib.request.urlopen() goes through it.

urllib.request.install_opener(opener)

response = urllib.request.urlopen('https://www.zhihu.com/#signin')
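The scheme-to-proxy mapping can be checked without touching the network. The sketch below relies on the documented behaviour that a ProxyHandler gets one `<scheme>_open` method per scheme in its dictionary. The address `http://127.0.0.1:8888` is a hypothetical local proxy chosen only for illustration.

```python
import urllib.request

PROXY = "http://127.0.0.1:8888"  # hypothetical proxy address, not a real service

# Proxy configured for both schemes.
both = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
# Proxy configured for plain http only.
http_only = urllib.request.ProxyHandler({"http": PROXY})
# An empty dictionary means "use no proxy at all".
no_proxy = urllib.request.ProxyHandler({})

for label, handler in [("both", both), ("http_only", http_only), ("no_proxy", no_proxy)]:
    print(label, hasattr(handler, "http_open"), hasattr(handler, "https_open"))

# The handler is passed to build_opener exactly as in the notes above.
opener = urllib.request.build_opener(both)
```

With the http-only handler, an https URL would be fetched directly rather than through the proxy, which is easy to miss when testing.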

Using cookies

A cookie is data that a website stores on the user's local machine in order to identify the user and track the session.

Getting cookies and printing them
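On the wire, a cookie arrives as the value of a Set-Cookie response header. As a small illustration, the standard library's `http.cookies.SimpleCookie` can parse such a value; the header string below is invented for this example.

```python
from http.cookies import SimpleCookie

# Invented example of what a server might send in a Set-Cookie header.
header_value = "sessionid=abc123; Path=/; HttpOnly"

cookie = SimpleCookie()
cookie.load(header_value)

morsel = cookie["sessionid"]
print(morsel.value)        # the cookie value
print(morsel["path"])      # the Path attribute
print(morsel["httponly"])  # the HttpOnly flag
```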

import http.cookiejar
import urllib.request
# Create a CookieJar instance to hold the cookies
cookie = http.cookiejar.CookieJar()
# Use urllib.request's HTTPCookieProcessor to create a cookie handler
handler = urllib.request.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib.request.build_opener(handler)
# opener.open works like urllib.request.urlopen and also accepts a Request object
response = opener.open('http://www.baidu.com')
for item in cookie:
    print('Name = ' + item.name)
    print('Value = ' + item.value)

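Note: in Python 3 the cookie jar classes live in http.cookiejar (Python 2 called the module cookielib).

Saving cookies to a file

CookieJar is the base class and FileCookieJar derives from it. MozillaCookieJar and LWPCookieJar are both subclasses of FileCookieJar and implement the actual methods for saving cookies to a file; they differ only in the file format they follow.

Saving cookies with MozillaCookieJar.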
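What the cookie handler does for each request can be reproduced offline: it calls `extract_cookies` on every response and `add_cookie_header` on every request. The sketch below feeds the jar a stand-in response instead of a real one. `FakeResponse`, the `sid` cookie, and the example.com URLs are all made up for illustration.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Stand-in for an HTTP response; CookieJar only needs info()."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            # Assigning the same header name again appends another header.
            self._headers["Set-Cookie"] = value

    def info(self):
        return self._headers


jar = http.cookiejar.CookieJar()

# Step 1: the "server" sets a cookie in response to the first request.
first_request = urllib.request.Request("http://example.com/")
jar.extract_cookies(FakeResponse(["sid=abc123; Path=/"]), first_request)
names = [c.name for c in jar]
print(names)

# Step 2: the jar attaches the cookie to a later request for the same site.
second_request = urllib.request.Request("http://example.com/profile")
jar.add_cookie_header(second_request)
cookie_header = second_request.get_header("Cookie")
print(cookie_header)
```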
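The class relationships can be confirmed directly in the interpreter. This short check also shows that FileCookieJar itself does not implement save, which is why one of the two concrete subclasses is needed; the file name passed to save is a dummy and no file is written.

```python
import http.cookiejar as cj

# Print the inheritance chain of each concrete class.
for cls in (cj.MozillaCookieJar, cj.LWPCookieJar):
    print([base.__name__ for base in cls.__mro__])

# FileCookieJar leaves save() unimplemented.
try:
    cj.FileCookieJar().save("unused.txt")
    base_can_save = True
except NotImplementedError:
    base_can_save = False
print(base_can_save)
```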

import http.cookiejar, urllib.request
filename = "cookie.txt"
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

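ignore_discard means a cookie is saved even if it is marked to be discarded (for example a session cookie); ignore_expires means a cookie is saved even if it has already expired. Regardless of these flags, save() overwrites the file if it already exists. Here we set both to True.

Saving cookies with LWPCookieJar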
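The effect of the two flags is easiest to see with hand-made cookies. The sketch below puts three cookies in a jar and reports which ones end up in the file for each flag combination. The helpers `build_cookie` and `saved_names` and the cookie names are invented for this example, and the cookies are built by hand rather than received from a server.

```python
import http.cookiejar
import os
import tempfile
import time


def build_cookie(name, expires, discard):
    """Hypothetical helper: a minimal cookie for example.com."""
    return http.cookiejar.Cookie(
        version=0, name=name, value="v", port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=expires,
        discard=discard, comment=None, comment_url=None, rest={},
    )


def saved_names(jar, **save_flags):
    """Save the jar to a temporary file and return the names found in it."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        jar.save(path, **save_flags)
        reloaded = http.cookiejar.MozillaCookieJar()
        # Load everything that is in the file, whatever its state.
        reloaded.load(path, ignore_discard=True, ignore_expires=True)
        return sorted(c.name for c in reloaded)
    finally:
        os.remove(path)


now = int(time.time())
jar = http.cookiejar.MozillaCookieJar()
jar.set_cookie(build_cookie("persistent", now + 3600, discard=False))
jar.set_cookie(build_cookie("session", None, discard=True))
jar.set_cookie(build_cookie("expired", now - 3600, discard=False))

print(saved_names(jar))
print(saved_names(jar, ignore_discard=True))
print(saved_names(jar, ignore_expires=True))
print(saved_names(jar, ignore_discard=True, ignore_expires=True))
```

Login cookies are often session cookies, which is why the notes pass ignore_discard=True when saving.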

import http.cookiejar, urllib.request
filename = 'cookie.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)
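The two classes write different file formats, which is visible from the first line of the file. The sketch below saves an empty jar of each kind to a temporary file and prints the header line; `first_line_of_saved_file` is a helper made up for this note.

```python
import http.cookiejar
import os
import tempfile


def first_line_of_saved_file(jar_class):
    """Save an empty jar of the given class and return the file's first line."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        jar = jar_class(path)
        jar.save()  # uses the file name given to the constructor
        with open(path, encoding="utf-8") as f:
            return f.readline().rstrip("\n")
    finally:
        os.remove(path)


mozilla_header = first_line_of_saved_file(http.cookiejar.MozillaCookieJar)
lwp_header = first_line_of_saved_file(http.cookiejar.LWPCookieJar)
print(mozilla_header)
print(lwp_header)
```

The Mozilla variant is the Netscape cookies.txt layout with tab-separated fields, while the LWP variant writes Set-Cookie3 lines.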

Reading cookies from the file

import http.cookiejar, urllib.request
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))

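This uses the cookie file saved earlier to access the target site as a logged-in user.

First create an http.cookiejar.LWPCookieJar instance, then load the cookie file into it, and finally build your own opener that carries those cookies.
Note: the jar class used for loading must match the one used for saving. If the cookies were saved with MozillaCookieJar, load them with MozillaCookieJar as well.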
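What happens when the classes do not match can be tried out safely with a temporary file. In this sketch a file written by MozillaCookieJar is loaded once with LWPCookieJar and once with MozillaCookieJar; `can_load` is a helper invented for this example.

```python
import http.cookiejar
import os
import tempfile


def can_load(jar_class, path):
    """Return True if jar_class can load the file, False on a format error."""
    try:
        jar_class().load(path, ignore_discard=True, ignore_expires=True)
        return True
    except http.cookiejar.LoadError:
        return False


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Write a Netscape-format file (an empty jar still writes the header).
    http.cookiejar.MozillaCookieJar(path).save()
    lwp_ok = can_load(http.cookiejar.LWPCookieJar, path)
    mozilla_ok = can_load(http.cookiejar.MozillaCookieJar, path)
finally:
    os.remove(path)

print("LWPCookieJar can load it:", lwp_ok)
print("MozillaCookieJar can load it:", mozilla_ok)
```

Catching LoadError like this is one way to fail with a clear message when the cookie file comes from a different tool.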
