03 Python Crawler: Timeouts, GET and POST Requests

冰彡棒


Article link: https://blog.csdn.net/a877415861/article/details/79005161


I. Timeout Settings

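import urllib.request

# If the site keeps raising timeout exceptions, increase the timeout value.
for i in range(1, 100):  # loops 99 times
    try:
        # timeout is given in seconds; here the request gives up after 1 s
        file = urllib.request.urlopen("http://yum.iqianyue.com", timeout=1)
        data = file.read()
        print(len(data))
    except Exception as e:
        print("Exception occurred-->" + str(e))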
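The loop above handles every exception the same way. Below is a minimal sketch, assuming you want to tell a timeout apart from other failures. The function name `fetch_length` and its `opener` parameter are hypothetical, introduced here only so the logic can be tried without a real network call; they are not part of the original article.

```python
import socket
import urllib.error
import urllib.request


def fetch_length(url, timeout=1.0, opener=urllib.request.urlopen):
    """Return the length of the response body, or None on a timeout.

    `opener` is a hypothetical hook (it defaults to urlopen) so that a
    stand-in can replace the real network call.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return len(response.read())
    except socket.timeout:
        # Typically raised when reading the response exceeds the timeout.
        return None
    except urllib.error.URLError as e:
        # A timeout while connecting is usually wrapped in URLError.
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # any other failure is passed on to the caller
```

A call such as `fetch_length("http://yum.iqianyue.com", timeout=5)` would then return `None` only for timeouts, while other errors still surface as exceptions.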

II. GET Requests

import urllib.request

keywd = 'hello'
url = 'http://www.baidu.com/s?wd=' + keywd

req = urllib.request.Request(url)  # build a Request object
data = urllib.request.urlopen(req).read()  # open the Request and read the response body

# Save the raw bytes; the with-block closes the file automatically
with open("/home/zyb/crawler/myweb/part4/4.html", "wb") as fhandle:
    fhandle.write(data)

Note: this code still needs improvement. When the keyword is Chinese, it fails with UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
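To see where this error likely comes from, here is a small sketch. It assumes that the HTTP layer encodes the request line as ASCII before sending it; the `request_line` string is an illustration written by hand, not something captured from urllib.

```python
# Assumption: this string imitates the request line sent for the keyword "有道".
request_line = "GET /s?wd=有道 HTTP/1.1"

try:
    request_line.encode("ascii")
    error = None
except UnicodeEncodeError as e:
    error = e

if error is not None:
    # error.start is inclusive and error.end is exclusive
    print(error.start, error.end - 1)
```

The two Chinese characters sit at indexes 10 and 11 of that string, which is consistent with the positions reported in the error message.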

Improved version

import urllib.parse
import urllib.request

url = 'http://www.baidu.com/s?wd='
key = "有道"

key_code = urllib.parse.quote(key)  # percent-encode the keyword part
url_all = url + key_code

req = urllib.request.Request(url_all)  # build a Request object
data = urllib.request.urlopen(req).read()  # open the Request and read the response body

with open("/home/zyb/crawler/myweb/part4/5.html", "wb") as fhandle:
    fhandle.write(data)

Notes:
1. The request must be a GET request.
2. Build the Request object with the URL as its argument.
3. Open the Request object with urlopen().
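When a GET request carries several parameters, the query string can also be built from a dict. The following is a minimal sketch, assuming the same Baidu search URL as above; `urllib.parse.urlencode` percent-encodes each value, so a Chinese keyword needs no separate `quote` call. The `params` dict is illustrative.

```python
import urllib.parse

base_url = "http://www.baidu.com/s"
params = {"wd": "有道"}  # illustrative; add more key/value pairs as needed

# urlencode() joins the pairs with "&" and percent-encodes keys and values
query = urllib.parse.urlencode(params)
url_all = base_url + "?" + query
print(url_all)
```

The resulting `url_all` can be passed to `urllib.request.Request` exactly as in the examples above.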

III. POST Requests

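We use the site www.iqianyue.com as an example.
Crawling approach:
1. Set the URL.
2. Build the form data and encode it with urllib.parse.urlencode.
3. Create a Request object whose arguments are the URL and the data to send.
4. Use add_header() to add header information so that the request looks like it comes from a browser.
5. Open the Request object with urllib.request.urlopen() to send the data.
6. Do any follow-up processing.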

import urllib.parse
import urllib.request

url = "http://www.iqianyue.com/mypost/"

# urlencode() returns a str; the request body must be bytes, so encode it as UTF-8
postdata = urllib.parse.urlencode({
    'name': "zhouyanbing",
    'pass': "zyb1121"
}).encode('utf-8')

req = urllib.request.Request(url, postdata)  # passing data makes this a POST request
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")

data = urllib.request.urlopen(req).read()

with open("/home/zyb/crawler/myweb/part4/6.html", "wb") as fhandle:
    fhandle.write(data)
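Before sending a request, it can help to check what the Request object will actually carry. The sketch below builds the same kind of POST request without opening it; the form values and the shortened User-Agent string are placeholders, not the ones from the article.

```python
import urllib.parse
import urllib.request

url = "http://www.iqianyue.com/mypost/"
form = {"name": "example_user", "pass": "example_pass"}  # placeholder values

postdata = urllib.parse.urlencode(form).encode("utf-8")
req = urllib.request.Request(url, postdata)
req.add_header("User-Agent", "Mozilla/5.0")  # shortened placeholder header

# Nothing is sent here; these calls only inspect the Request object.
print(req.get_method())  # the method urlopen() would use
print(req.data)          # the encoded form body as bytes
```

Because `data` is supplied, `get_method()` reports POST; a Request built from the URL alone reports GET, which is the difference between this section and the previous one.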
