The urllib.request module
- Build a simple GET request, as a browser would, to fetch a page
import urllib.request

# httpbin.org/get echoes back the details of a GET request
url = 'https://httpbin.org/get'
response = urllib.request.urlopen(url)
print(response.read().decode('utf-8'))
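A GET request usually carries its parameters in the URL's query string. The following is a minimal sketch of building such a URL with urllib.parse.urlencode. The `params` dictionary and its `page` key are made up for illustration, and nothing is sent over the network.

```python
import urllib.parse

base_url = 'https://httpbin.org/get'
# Hypothetical parameters, chosen only for illustration
params = {'name': 'liu', 'page': 1}

# urlencode joins the pairs as key=value&key=value and escapes
# characters that are not safe inside a URL
query = urllib.parse.urlencode(params)
full_url = base_url + '?' + query
print(full_url)

# By default a space in a value is encoded as '+'
spaced = urllib.parse.urlencode({'q': 'a b'})
print(spaced)
```

The resulting `full_url` can be passed to urllib.request.urlopen exactly like the plain URL above.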
- Build a POST request, as a browser would, to fetch a page
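import socket
import urllib.error
import urllib.parse
import urllib.request

url = 'https://httpbin.org/post'
form = {'name': 'liu'}
# urlopen expects the request body as bytes, so encode the form first
query = urllib.parse.urlencode(form)
data = bytes(query, encoding='utf-8')
try:
    # Passing data makes urlopen send a POST instead of a GET
    response = urllib.request.urlopen(url, data=data, timeout=2)
    print(response.read().decode('utf-8'))
except urllib.error.URLError as e:
    # A timeout while connecting arrives wrapped in URLError
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
except socket.timeout:
    # A timeout while waiting for the response is raised directly
    print('TIME OUT')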
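A timeout from urlopen can surface in two forms: wrapped in a URLError (typically when the connection cannot be opened in time) or as a bare socket.timeout (typically while waiting for or reading the response). The sketch below shows one way to treat both the same. `is_timeout` is a hypothetical helper written for this example, not part of urllib, and the exceptions are constructed by hand so the code runs without a network.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc looks like a timeout raised by urlopen.

    Hypothetical helper for illustration, not part of urllib.
    """
    # Raised directly, e.g. while waiting for the response
    if isinstance(exc, socket.timeout):
        return True
    # Wrapped in URLError, e.g. while opening the connection
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return False


# Hand-built exceptions standing in for real failures
direct = is_timeout(socket.timeout('timed out'))
wrapped = is_timeout(urllib.error.URLError(socket.timeout('timed out')))
other = is_timeout(urllib.error.URLError('connection refused'))
print(direct, wrapped, other)
```

In real code the helper would be called inside an `except OSError as e:` block around urlopen, since both exception types derive from OSError.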
- Use the Request class to build a request more flexibly
import urllib.parse
import urllib.request

url = 'https://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Host': 'httpbin.org'
}
form = {'name': 'liu'}
query = urllib.parse.urlencode(form)
data = bytes(query, encoding='utf-8')
# Request bundles the URL, body, headers and HTTP method into one object
request = urllib.request.Request(url, data=data, headers=headers, method='POST')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
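A Request object can be inspected before it is sent, which helps when checking that the method, body and headers are what you expect. This is a small sketch that only builds the objects and never calls urlopen. The `Accept` header value is an assumption added for illustration.

```python
import urllib.parse
import urllib.request

url = 'https://httpbin.org/post'
data = urllib.parse.urlencode({'name': 'liu'}).encode('utf-8')
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

post_request = urllib.request.Request(url, data=data, headers=headers, method='POST')
print(post_request.get_method())
print(post_request.data)

# Request stores header names capitalized, so look up 'User-agent'
print(post_request.get_header('User-agent'))

# add_header attaches a header after construction (illustrative value)
post_request.add_header('Accept', 'application/json')
print(post_request.has_header('Accept'))

# Without data or an explicit method, the request defaults to GET
get_request = urllib.request.Request('https://httpbin.org/get')
print(get_request.get_method())
```

Passing either object to urllib.request.urlopen sends it as configured.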