Web Scraping
Basic scraping exercise
import urllib.request

response = urllib.request.urlopen("http://www.baidu.com")
print(response.read().decode("utf-8"))
Timeout handling

import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen("http://www.baidu.com", timeout=0.1)
except urllib.error.URLError as e:
    print("Timed out!")  # printed if the request takes longer than 0.1 s
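Note that `URLError` covers more than timeouts (DNS failures, refused connections, and so on). To confirm the failure really is a timeout, the error's `reason` attribute can be inspected; a minimal sketch, assuming the same URL and 0.1 s timeout:

```python
import socket
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen("http://www.baidu.com", timeout=0.1)
except urllib.error.URLError as e:
    # URLError wraps the underlying cause in .reason;
    # for a timeout it is a socket.timeout instance
    if isinstance(e.reason, socket.timeout):
        print("Timed out!")
    else:
        print("Other network error:", e.reason)
```

(Since Python 3.10, `socket.timeout` is an alias of the built-in `TimeoutError`, so the `isinstance` check works on both old and new versions.)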
Fooling the server
HTTP status 418 means the server is refusing the crawler
import urllib.request

url = "http://httpbin.org/get"
response = urllib.request.urlopen(url)
print(response.read().decode("utf-8"))
In the returned JSON, the User-Agent header plainly identifies us as Python. To avoid that, we can send the form data via POST and set User-Agent to the value a real browser sends.
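When a site does block the default Python user agent, `urlopen` raises `urllib.error.HTTPError`, which carries the status code, so a 418 can be detected explicitly. A hedged sketch (the `fetch` helper is mine, not from the notes):

```python
import urllib.error
import urllib.request

def fetch(url):
    """Return (status, body), or (status, None) when the server refuses us."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # 418 "I'm a teapot" is how some sites reject obvious crawlers
        return e.code, None
```

`HTTPError` is a subclass of `URLError`, so if both are caught, the `HTTPError` branch must come first.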
import urllib.parse
import urllib.request

url = "http://httpbin.org/post"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."
}
# the data payload appears in the "form" field of the response
data = bytes(urllib.parse.urlencode({"name": "123"}), encoding="utf-8")
# build the request object
req = urllib.request.Request(url=url, data=data, headers=headers, method="POST")
response = urllib.request.urlopen(req)
print(response.read().decode("utf-8"))
The response now shows the browser User-Agent.
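The same `Request` object works for GET as well: with no `data` argument the method defaults to GET, so the browser User-Agent can be attached without a form body. A minimal sketch against the same httpbin endpoint (no request is actually sent here):

```python
import urllib.request

url = "http://httpbin.org/get"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# with no data argument, Request defaults to the GET method
req = urllib.request.Request(url, headers=headers)
print(req.get_method())              # GET
print(req.get_header("User-agent"))  # the browser string set above
```

Passing `req` to `urllib.request.urlopen` would then issue the GET with the spoofed header, exactly as in the POST example.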