Python 3 Study Notes 4 (the requests Library)


yiyang14


This article is reposted from http://blog.csdn.net/gyq1998/article/details/78583841

In the previous article I covered the seven main methods of the requests library.

In this one I walk through a few practical examples:

1. Scraping product information from JD.com

The page can be fetched without any changes to the request headers.

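import requests

url = 'http://item.jd.com/2967929.html'
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()              # raise an exception for 4xx/5xx status codes
    r.encoding = r.apparent_encoding  # use the encoding guessed from the body
    print(r.text[:1000])              # print only the first 1000 characters
except requests.RequestException:
    print("Request failed")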
 
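The same get / raise_for_status / apparent_encoding steps show up in every example here, so they can be folded into one function. What follows is only a sketch: `fetch_text` is a hypothetical name (it is not part of requests), and returning None on failure is just one possible design choice.

```python
import requests


def fetch_text(url, timeout=30):
    """Return the decoded page text, or None if the request fails.

    fetch_text is a hypothetical helper name, not a requests API.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # encoding guessed from the body
        return r.text
    except requests.RequestException:
        # Base class for connection errors, timeouts, bad status codes
        # and malformed URLs raised by requests
        return None


# This URL has no scheme, so requests rejects it before sending anything
# and the helper returns None.
print(fetch_text("item.jd.com/2967929.html"))
```

Catching `requests.RequestException` instead of using a bare `except:` keeps unrelated problems, such as a typo in a variable name, from being reported as a failed request.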

2. Scraping product information from Amazon

This site restricts access by crawlers, so we need to disguise our request as one sent by a browser.

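import requests

url = 'http://www.amazon.cn/gp/product/B01M8L5Z3Y'
try:
    # The header name is "User-Agent" (with a hyphen); "user_agent" would not
    # replace the default python-requests identifier.
    kv = {'User-Agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv, timeout=30)  # override the request headers
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])  # print a slice of the page
except requests.RequestException:
    print("Request failed")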
 
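The exact header name matters for this trick. The standard field is `User-Agent`; a key such as `user_agent` is sent as a separate, unrelated header, and the default `python-requests/...` identifier still goes out with the request. One way to check this without contacting the server is to prepare the request and inspect its headers. This is a sketch: the helper name `headers_that_would_be_sent` is made up for illustration.

```python
import requests

URL = "http://www.amazon.cn/gp/product/B01M8L5Z3Y"


def headers_that_would_be_sent(extra_headers):
    """Prepare a GET request (without sending it) and return its final headers.

    Hypothetical helper; Session.prepare_request merges the session's
    default headers with the ones passed in.
    """
    session = requests.Session()
    request = requests.Request("GET", URL, headers=extra_headers)
    return session.prepare_request(request).headers


with_correct_key = headers_that_would_be_sent({"User-Agent": "Mozilla/5.0"})
with_wrong_key = headers_that_would_be_sent({"user_agent": "Mozilla/5.0"})

print(with_correct_key["User-Agent"])  # Mozilla/5.0
print(with_wrong_key["User-Agent"])    # still the default python-requests/<version>
```

After a real request, the same information is available from `r.request.headers`.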

3. Submitting a search keyword to Baidu

Baidu's keyword search interface:
https://www.baidu.com/s?wd=keyword

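import requests

keyword = 'python'
try:
    kv = {'wd': keyword}  # sent as the query string ?wd=python
    r = requests.get('https://www.baidu.com/s', params=kv, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(len(r.text))  # length of the returned page
except requests.RequestException:
    print("Request failed")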
 
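To see how `params` ends up in the URL, the request can be prepared without being sent. This is a small sketch; `build_search_url` is a hypothetical helper name, and the second keyword is only there to show how a space is encoded.

```python
import requests


def build_search_url(keyword):
    """Return the URL requests would use for a Baidu search (nothing is sent).

    Hypothetical helper built on requests.Request(...).prepare().
    """
    req = requests.Request("GET", "https://www.baidu.com/s", params={"wd": keyword})
    return req.prepare().url


print(build_search_url("python"))           # https://www.baidu.com/s?wd=python
print(build_search_url("python requests"))  # https://www.baidu.com/s?wd=python+requests
```

After a real request, `r.request.url` holds the URL that was actually sent, which is handy for confirming that the keyword was submitted as intended.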

4. Downloading an image from the web

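import os

import requests

url = "http://baishi.baidu.com/watch/02167966440907275567.html"  # image URL
root = "E:/pic/"
path = root + url.split("/")[-1]  # name the file after the last URL segment
try:
    os.makedirs(root, exist_ok=True)  # create the directory if it is missing
    if not os.path.exists(path):      # download only if the file is not there yet
        r = requests.get(url, timeout=30)
        r.raise_for_status()          # do not save an error page as the file
        with open(path, "wb") as f:   # binary mode; closed automatically
            f.write(r.content)
        print("File downloaded successfully")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Download failed")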
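
Two details of the download step are worth sketching separately: taking the file name from the URL path, so that a query string does not end up in the name, and writing the body in chunks rather than holding it all in memory. The code below is an illustrative variant, not the original author's method; `local_path_for`, `download`, and the example.com address in the usage note are assumptions made up for the example.

```python
import os
from urllib.parse import urlsplit

import requests


def local_path_for(url, root):
    """Build the target path from the last segment of the URL path.

    Hypothetical helper; the query string and fragment are ignored.
    """
    name = os.path.basename(urlsplit(url).path)
    return os.path.join(root, name)


def download(url, root, chunk_size=8192):
    """Save url under root unless the file already exists; return the path."""
    path = local_path_for(url, root)
    os.makedirs(root, exist_ok=True)
    if os.path.exists(path):
        return path
    # stream=True defers reading the body until iter_content is consumed
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

For instance, `local_path_for("http://example.com/img/cat.jpg?size=large", "pic")` yields a path ending in `cat.jpg`, whereas splitting the whole URL on `/` would keep `?size=large` in the file name.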