Web Crawler Basics

Latest recommended article published 2024-07-05 23:20:04

almost_Mr

Views: 409

Tags: web crawler

Article link: https://blog.csdn.net/almost_Mr/article/details/54347455

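A simple crawler

Official documentation
HTTP is built on a request/response model: the client sends a request and the server returns a response. urllib2 represents your HTTP request with a Request object. In its simplest form, you create a Request object with the URL you want to fetch, then call urlopen with that Request object; it returns a response object for the request. The response behaves like a file object, so you can call .read() on it.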

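# Python 2 (urllib2 does not exist in Python 3)
import urllib2

# urlopen also accepts a plain URL string instead of a Request object
response = urllib2.urlopen("https://www.douban.com/")
print response.read()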

Or:
import urllib2

req = urllib2.Request('http://www.douban.com/')  # create the Request object
response = urllib2.urlopen(req)                  # pass the Request to urlopen; it returns the response object
print response.read()
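
The code above is Python 2; in Python 3, urllib2 was merged into urllib.request. As a minimal sketch of the same Request/urlopen flow under Python 3 (the helper name fetch and the 10-second timeout are illustrative assumptions, not part of the original post):

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the raw response body of `url` as bytes.

    `fetch` and the default timeout are hypothetical choices made
    for this sketch.
    """
    req = urllib.request.Request(url)  # describe the request
    # urlopen sends the request and returns a file-like response object
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()
```

Calling fetch("https://www.douban.com/") would request that page. Note that read() returns bytes in Python 3, so the result needs decoding (for example with .decode("utf-8")) before it can be handled as text.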

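Disguising the crawler

By default urllib2 identifies itself with a User-Agent such as Python-urllib/2.7, which some sites block; sending a browser's User-Agent makes the request look like it came from a browser. To copy a real one, open the browser's Inspect panel and go to Network > XHR > Request Headers > User-Agent.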
import urllib2

url = 'http://www.douban.com/'
# User-Agent string copied from the browser's request headers
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
headers = {"User-Agent": user_agent}

req = urllib2.Request(url, headers=headers)  # attach the custom headers to the request
response = urllib2.urlopen(req)
print response.read()
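
For Python 3, a rough equivalent can be sketched with urllib.request. The helper build_request below is a hypothetical name introduced for illustration; it only builds the request and prints the header it carries, without sending anything over the network:

```python
import urllib.request

# Example browser User-Agent string, the same one used in the code above
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/55.0.2883.95 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a Request carrying a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.douban.com/")
# Request stores header names capitalized, so the key is "User-agent"
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would then send the request with this header in place of the default Python-urllib one.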
