Add a browser request header, download a web page, and print its HTML
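from urllib import request
import sys
import io
# Re-wrap stdout as gb18030 so Chinese text prints without UnicodeEncodeError (e.g. on a GBK Windows console)
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
# Build the request and attach a browser-like User-Agent header
req = request.Request("http://www.baidu.com")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36")
# Send the request and print the decoded HTML
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))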
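To check what the request will carry before sending it, the Request object can be inspected locally. The following is a minimal sketch that makes no network call; the variable name UA is only an illustration. One detail worth knowing: urllib stores header names through str.capitalize(), so "User-Agent" is kept under the key "User-agent".

```python
from urllib import request

# UA is an illustrative name; the value is the User-Agent string used above
UA = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36")

req = request.Request("http://www.baidu.com")
req.add_header("User-Agent", UA)

# Header names are normalized with str.capitalize(), so look them up as "User-agent"
print(req.header_items())
print(req.has_header("User-agent"))   # True
print(req.has_header("User-Agent"))   # False: the stored key is "User-agent"

# With no data attached, the request will be sent as a GET
print(req.get_method())
```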
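POST requests
1. Import the library: from urllib import parse
2. Use urlencode to build the POST data from (key, value) pairs
postData = parse.urlencode([
    ("key1", "val1"),
    ("key2", "val2"),
    ("keyN", "valN")
])
3. Send the POST request with postData (the data argument must be bytes)
urlopen(req, data=postData.encode('utf-8'))
4. Get the response and its status
resp = urlopen(req, data=postData.encode('utf-8'))
print(resp.status)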
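The four steps above can be tried end to end without touching a real site. The sketch below is an assumption-laden demo, not part of the original notes: it starts a throwaway local server (the EchoHandler class is hypothetical) that replies to a POST with the body it received, then sends urlencoded form data to it and reads the status.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse
from urllib.request import Request, urlopen


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies to a POST with the body it received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Step 2: build the form body from (key, value) pairs
post_data = parse.urlencode([
    ("SearchDate", "2017/02/12"),
    ("SearchTime", "10:30"),
])

# Step 3: passing data= turns the request into a POST; data must be bytes
req = Request("http://127.0.0.1:%d/" % server.server_port)
with urlopen(req, data=post_data.encode("utf-8")) as resp:
    # Step 4: read the status code and the body
    status = resp.status
    echoed = resp.read().decode("utf-8")

server.shutdown()
server.server_close()

print(status, echoed)
# → 200 SearchDate=2017%2F02%2F12&SearchTime=10%3A30
```

Note that urlencode percent-encodes "/" and ":" in the values, which is why the echoed body differs from the raw field values.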
Test data (request headers captured from the browser):
Host:www.thsrc.com.tw
Origin:http://www.thsrc.com.tw
Referer:http://www.thsrc.com.tw/index.html?force=1
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36
Form Data
POST data:
StartStation:2f940836-cedc-41ef-8e28-c2336ac8fe68
EndStation:977abb69-413a-4ccf-a109-0272c24fd490
SearchDate:2017/02/12
SearchTime:10:30
SearchWay:DepartureInMandarin
RestTime:
EarlyOrLater:
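Copying each field by hand into urlencode is error-prone, so the captured "key:value" lines can be converted automatically. This is a rough sketch; parse_form_lines is a hypothetical helper, not something urllib provides. It splits each line at the first colon only, because values such as 10:30 contain colons themselves.

```python
from urllib import parse

# Form Data as copied from the browser's developer tools, one "key:value" per line
captured = """\
StartStation:2f940836-cedc-41ef-8e28-c2336ac8fe68
EndStation:977abb69-413a-4ccf-a109-0272c24fd490
SearchDate:2017/02/12
SearchTime:10:30
SearchWay:DepartureInMandarin
RestTime:
EarlyOrLater:
"""


def parse_form_lines(text):
    """Hypothetical helper: turn "key:value" lines into (key, value) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # partition splits at the first colon only, so "10:30" stays intact
        key, _, value = line.partition(":")
        pairs.append((key.strip(), value.strip()))
    return pairs


fields = parse_form_lines(captured)
post_data = parse.urlencode(fields)
print(post_data)
```

Empty fields such as RestTime are kept as pairs with an empty value, so they are encoded as "RestTime=" just as the browser sent them.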
Crawler code
from urllib.request import Request, urlopen
from urllib import parse
import sys
import io
# Re-wrap stdout as gb18030 so Chinese text prints without UnicodeEncodeError (e.g. on a GBK Windows console)
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
req = Request("http://www.thsrc.com.tw/tw/TimeTable/SearchResult")
# Form fields copied from the captured POST data; the empty RestTime and EarlyOrLater fields are left out
postData = parse.urlencode([
    ("StartStation", "2f940836-cedc-41ef-8e28-c2336ac8fe68"),
    ("EndStation", "977abb69-413a-4ccf-a109-0272c24fd490"),
    ("SearchDate", "2017/02/12"),
    ("SearchTime", "10:30"),
    ("SearchWay", "DepartureInMandarin")
])
# Mimic the browser's request headers
req.add_header("Origin", "http://www.thsrc.com.tw")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36")
# Passing data= makes urlopen send a POST; the body must be bytes
resp = urlopen(req, data=postData.encode('utf-8'))
print(resp.read().decode('utf-8'))
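The crawler above assumes the request succeeds and that the page is UTF-8. A more defensive variant might look like the sketch below; fetch_text is a hypothetical helper, and the demo deliberately connects to a closed local port rather than the real site, so it only illustrates the failure path.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_text(req, data=None, timeout=10):
    """Hypothetical helper: return (status, text); status is None if no HTTP reply arrived."""
    try:
        with urlopen(req, data=data, timeout=timeout) as resp:
            # Prefer the charset declared in Content-Type, fall back to UTF-8
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.status, resp.read().decode(charset, errors="replace")
    except HTTPError as err:
        # The server answered with an error status (4xx/5xx); the body is still readable
        return err.code, err.read().decode("utf-8", errors="replace")
    except URLError as err:
        # No HTTP response at all, e.g. DNS failure or a refused connection
        return None, str(err.reason)


# Demo without touching the real site: grab a free local port, close it, then connect to it
with socket.socket() as probe:
    probe.bind(("127.0.0.1", 0))
    free_port = probe.getsockname()[1]

status, text = fetch_text(Request("http://127.0.0.1:%d/" % free_port), timeout=3)
print(status, text)
```

HTTPError is a subclass of URLError, so it has to be caught first; otherwise the status code of a 4xx or 5xx reply would be lost.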