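1. To simulate a browser sending a GET request, use a Request object. Adding HTTP headers to the Request lets the request pass itself off as coming from a browser.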
from urllib import request
req = request.Request("http://www.bnaid.com")
# The header name is 'User-Agent' (hyphen, not underscore).
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print("Status:", f.status, f.reason)
    # Print every response header, then the decoded body.
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print("Data:", f.read().decode('utf-8'))
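The header step can be checked without touching the network. The sketch below is only an illustration: the helper name build_browser_request and the URL http://www.example.com/ are made up for this note. It builds the Request and inspects it before anything is sent.

```python
from urllib import request

# User-Agent string copied from the example above.
USER_AGENT = ('Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) '
              'AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 '
              'Mobile/10A5376e Safari/8536.25')


def build_browser_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: return a GET Request that carries a browser User-Agent."""
    req = request.Request(url)
    req.add_header('User-Agent', user_agent)
    return req


# Placeholder URL; nothing is sent until request.urlopen(req) is called.
req = build_browser_request('http://www.example.com/')
print(req.get_method())      # a Request without data is a GET
print(req.header_items())    # urllib stores the header name as 'User-agent'
```

Inspecting the Request first makes it easy to spot a misspelled header name such as 'User_Agent' before the server sees it.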
2. To send a POST request, just pass the request body as bytes through the data parameter.
from urllib import request, parse
print('Login to weibo.cn...')
email = input('Email: ')
passwd = input('Password: ')
login_data = parse.urlencode([
    ('username', email),
    ('password', passwd),
    ('entry', 'mweibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])
req = request.Request('https://passport.weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')
with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
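What "data as bytes" means can be seen without logging in anywhere. In this sketch the helper name build_post_request and the address user@example.com are assumptions made for illustration. The form fields are URL-encoded, converted to bytes, and attached to the Request, which is enough for urllib to treat it as a POST.

```python
from urllib import parse, request


def build_post_request(url, fields):
    """Hypothetical helper: URL-encode the form fields and attach them as a bytes body."""
    body = parse.urlencode(fields).encode('utf-8')  # str -> bytes
    return request.Request(url, data=body)


# Placeholder credentials; the Request is only built here, never sent.
req = build_post_request(
    'https://passport.weibo.cn/sso/login',
    [('username', 'user@example.com'), ('entry', 'mweibo')],
)
print(req.get_method())  # a Request that carries data defaults to POST
print(req.data)          # the encoded body, with '@' escaped as %40
```

Passing a str instead of bytes as data raises a TypeError when the request is opened, which is why the encode('utf-8') step matters.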
3. For more complex control, such as accessing a site through a proxy, use ProxyHandler.
from urllib import request, parse
# print('Login to weibo.cn...')
# email = input('Email: ')
# passwd = input('Password: ')
login_data = parse.urlencode([
    # ('username', email),
    # ('password', passwd),
    ('entry', 'mweibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'http://www.douban.com/')
])
#
req = request.Request('http://www.douban.com/')
# req.add_header('Origin', 'https://passport.weibo.cn')
# req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
# req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')
# Route HTTP traffic through a proxy. The address is kept from the original notes;
# replace it with the host and port of a real proxy server.
proxy_handler = request.ProxyHandler({'http': 'http://www.douban.com/'})
proxy_auth_handler = request.ProxyBasicAuthHandler()
# Arguments are realm, host, username, password (all placeholders here).
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
# Build an opener from both handlers; a plain request.urlopen() call would not use them.
opener = request.build_opener(proxy_handler, proxy_auth_handler)
with opener.open(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
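The proxy setup can also be wrapped in a small function. This is a minimal sketch under assumptions: the helper name build_proxy_opener, the proxy address http://127.0.0.1:3128/, and the credentials are all placeholders. It only builds the opener and does not connect to anything.

```python
from urllib import request


def build_proxy_opener(proxy_url, username=None, password=None):
    """Hypothetical helper: return an opener that sends HTTP and HTTPS traffic via proxy_url."""
    handlers = [request.ProxyHandler({'http': proxy_url, 'https': proxy_url})]
    if username is not None:
        # The default-realm password manager accepts None as "any realm".
        manager = request.HTTPPasswordMgrWithDefaultRealm()
        auth = request.ProxyBasicAuthHandler(manager)
        auth.add_password(None, proxy_url, username, password)
        handlers.append(auth)
    return request.build_opener(*handlers)


# Placeholder proxy address and credentials.
opener = build_proxy_opener('http://127.0.0.1:3128/', 'username', 'password')
print([type(h).__name__ for h in opener.handlers])
```

Requests then go through opener.open(url) instead of request.urlopen(url). Calling request.install_opener(opener) makes urlopen use the same opener globally.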
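4. XML is more complex than JSON and is used less on the web than it used to be. There are two ways to process XML: DOM and SAX. DOM reads the entire XML document into memory, so it uses more memory and parses slowly, but its advantage is that you can traverse any node of the tree. SAX is a streaming model that parses as it reads, so it uses little memory. In most cases, prefer SAX.
When parsing XML with SAX in Python, you usually care about only three events: start_element, end_element, and char_data.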
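The three events can be seen in action with the standard library's expat parser. This is a minimal sketch: the class EventCollector, the function parse_events, and the sample XML are made up for illustration. The handlers just record each event in order.

```python
from xml.parsers.expat import ParserCreate


class EventCollector:
    """Hypothetical handler: records the three SAX-style events as tuples."""

    def __init__(self):
        self.events = []

    def start_element(self, name, attrs):
        self.events.append(('start', name, attrs))

    def end_element(self, name):
        self.events.append(('end', name))

    def char_data(self, text):
        # Skip whitespace-only text between elements.
        if text.strip():
            self.events.append(('text', text))


def parse_events(xml_text):
    """Parse xml_text in streaming mode and return the recorded events."""
    handler = EventCollector()
    parser = ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one call
    parser.StartElementHandler = handler.start_element
    parser.EndElementHandler = handler.end_element
    parser.CharacterDataHandler = handler.char_data
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return handler.events


events = parse_events('<ol><li><a href="/python">Python</a></li></ol>')
for event in events:
    print(event)
```

Because the parser only calls back as it reads, nothing beyond what the handler chooses to keep stays in memory, which is the trade-off against DOM described above.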
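5. GitHub command notes
git config -l: show the current Git configuration in detail
View configuration at different levels
View user information
Set your own user information
Create a Git repository
Clone a remote repository to your own machine
Check whether a file's status has changed
Add a file to the staging area
Check whether it has been committed to the repository
Remove the file from the staging area, then check the status again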
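The notes above list the steps without the commands. As a rough sketch, the same sequence can be driven from Python in a throwaway directory. It assumes git is installed and on PATH. The helper names git and demo_workflow, the file notes.txt, and the identity "Example User" are placeholders, and the commands are common choices for each step rather than necessarily the ones originally used. Cloning is left out because it needs a remote URL.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside cwd and return its standard output as text."""
    result = subprocess.run(
        ['git', *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo_workflow():
    """Hypothetical walk-through: init, set identity, add, then unstage a file."""
    states = {}
    with tempfile.TemporaryDirectory() as repo:
        git('init', cwd=repo)                                   # create a repository
        git('config', 'user.name', 'Example User', cwd=repo)    # set user information
        git('config', 'user.email', 'user@example.com', cwd=repo)
        states['name'] = git('config', 'user.name', cwd=repo).strip()

        Path(repo, 'notes.txt').write_text('hello\n')
        states['untracked'] = git('status', '--short', cwd=repo).strip()

        git('add', 'notes.txt', cwd=repo)                       # add to the staging area
        states['staged'] = git('status', '--short', cwd=repo).strip()

        git('rm', '--cached', 'notes.txt', cwd=repo)            # unstage, keep the file on disk
        states['unstaged'] = git('status', '--short', cwd=repo).strip()
    return states
```

Calling demo_workflow() returns the short status text captured after each step, so the effect of git add and git rm --cached can be compared side by side.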