urllib:
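url = "http://www.csdn.net"
ff = urllib.urlopen(url)  (url must be an absolute URL, including the scheme)
Request headers cannot be modified.
ff.geturl()  returns the URL that was actually retrieved (string)
ff.headers   the response headers (a message object; str(ff.headers) gives the header text)
ff.info()    returns the same response headers object as ff.headers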
urllib2:
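ff = urllib2.urlopen(url)  (same as urllib; url must be an absolute URL)
Alternative (request headers can be modified; some sites block requests that do not look like they come from a browser, so a crawler may need to set headers such as User-Agent):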
reqheaders={'':'','':''}
request = urllib2.Request(url,headers=reqheaders)
respon = urllib2.urlopen(request)
respon.info() / respon.headers / respon.geturl()  (same as urllib; headers is an attribute, not a method)
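A minimal sketch of the header-modifying form above. The header values are hypothetical placeholders, not something from these notes, and the `fetch` helper is only an illustration. The import fallback is there because Python 3 merged urllib2 into `urllib.request`; the rest is the same on both versions. Nothing is sent over the network until `fetch` is called.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3: urllib2 became urllib.request

url = "http://www.csdn.net"

# Hypothetical header values, for illustration only.
reqheaders = {
    "User-Agent": "Mozilla/5.0 (compatible; example-client)",
    "Accept-Language": "en",
}
request = urllib2.Request(url, headers=reqheaders)

# Request stores header names via str.capitalize(),
# so "User-Agent" is looked up as "User-agent".
user_agent = request.get_header("User-agent")
print(request.get_full_url())
print(user_agent)


def fetch(req):
    """Send the request and return (final URL, headers, body).

    Hypothetical helper; call it only when network access is available.
    """
    respon = urllib2.urlopen(req)
    try:
        return respon.geturl(), respon.info(), respon.read()
    finally:
        respon.close()
```

Checking the stored headers on the Request object before sending is a cheap way to confirm what the server will see.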
httplib:
conn = httplib.HTTPConnection(host)  (host is a host name such as "www.csdn.net", not a full URL with a scheme; redirects are not followed)
conn = httplib.HTTPSConnection(host)  (same, over HTTPS)
conn.request('GET', '/')
res = conn.getresponse()  gets the server's response
header = res.getheaders()  gets the response headers (list of (name, value) tuples)
res.getheader(name)  looks up a single header by name
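A small sketch of what "redirects are not followed" means in practice. To avoid depending on a real site, it assumes a throwaway local server (the `RedirectHandler` class, the `/final` path, and the `get` helper are all made up for this example). httplib hands back the 302 as-is, and the caller has to read the Location header and issue a second request. The import fallback covers Python 3, where httplib is named `http.client`.

```python
import threading

try:
    import httplib                                    # Python 2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:
    import http.client as httplib                     # Python 3
    from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Throwaway local server: '/' redirects to '/final'."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()
host, port = server.server_address


def get(path):
    """One GET over a fresh connection; returns (status, Location, body)."""
    conn = httplib.HTTPConnection(host, port)
    conn.request("GET", path)
    res = conn.getresponse()
    result = (res.status, res.getheader("Location"), res.read())
    conn.close()
    return result


# First request stops at the redirect; follow it by hand.
status, location, body = get("/")
print("%s -> %s" % (status, location))
final_status, _, final_body = get(location)
print("%s %r" % (final_status, final_body))

server.shutdown()
server.server_close()
```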
httplib2:
http = httplib2.Http()
#http.request(uri, method, body, headers, redirections, connection_type)
response, content = http.request(url2)
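response is a dict of the response headers; content is the response body (string)
No need to distinguish between HTTP and HTTPS; redirects are followed automatically through to the final page.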