Python 3: Using requests and BeautifulSoup to Scrape Some Information

Quincy379

Article link: https://blog.csdn.net/qq_33733970/article/details/77822282

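import os
import uuid
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

imgPath = r'D:\Users\Quincy_C\PycharmProjects\S6\bs模块\汽车图片'
# Create the output folder once, before the loop, so every image (including the first) is saved
os.makedirs(imgPath, exist_ok=True)

response = requests.get(url='http://www.autohome.com.cn/news/')
# Decode with the encoding detected from the body so Chinese text is not garbled
response.encoding = response.apparent_encoding
bs = BeautifulSoup(response.text, features='html.parser')
bs_obj = bs.find(id="auto-channel-lazyload-article")
li_list = bs_obj.find_all('li')
for i in li_list:
    a = i.find('a')
    if a is None:
        continue
    h3 = a.find('h3')
    img = a.find('img')
    # Skip list items without a title, an image, or an image address
    if h3 is None or img is None or not img.attrs.get('src'):
        continue
    txt = h3.text
    # src may be relative or protocol-relative ("//..."); resolve it to a full URL
    src = urljoin(response.url, img.attrs.get('src'))
    print(txt, src)
    # requests.get(url).content returns the response body as bytes
    imgContent = requests.get(src).content
    # Random, unique file name for each image
    imgUrl = str(uuid.uuid4()) + '.jpg'
    with open(os.path.join(imgPath, imgUrl), 'wb') as f:
        f.write(imgContent)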
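The parsing part of the script can be tried without touching the network. The sketch below wraps the same find/find_all lookups in a hypothetical helper, `extract_items`, and runs it on a made-up HTML string. The sample markup (including the `img.example.com` host) is an assumption that only imitates the structure the script expects (a container with id `auto-channel-lazyload-article` holding `li > a > h3` and `img`); it is not copied from the real page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_items(html, base_url):
    """Hypothetical helper: return (title, absolute image URL) pairs
    using the same lookups as the script above."""
    bs = BeautifulSoup(html, features='html.parser')
    container = bs.find(id='auto-channel-lazyload-article')
    if container is None:  # wrong page, or the layout has changed
        return []
    items = []
    for li in container.find_all('li'):
        a = li.find('a')
        if a is None:
            continue
        h3 = a.find('h3')
        img = a.find('img')
        if h3 is None or img is None or not img.attrs.get('src'):
            continue
        # urljoin turns relative or protocol-relative src values into full URLs
        items.append((h3.text.strip(), urljoin(base_url, img.attrs.get('src'))))
    return items


# Made-up markup that only mimics the expected structure
SAMPLE_HTML = '''
<ul id="auto-channel-lazyload-article">
  <li><a href="/news/1.html"><img src="//img.example.com/a.jpg"><h3>First</h3></a></li>
  <li class="ad">no link here</li>
  <li><a href="/news/2.html"><img src="/b.jpg"><h3> Second </h3></a></li>
</ul>
'''

for title, img_url in extract_items(SAMPLE_HTML, 'http://www.autohome.com.cn/news/'):
    print(title, img_url)
```

Keeping the parsing in a function that takes an HTML string makes it easy to check the selectors against a saved copy of the page before downloading anything.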

To save the images into a specific folder, you can do this:

with open(os.path.join(imgPath, imgUrl), 'wb') as f:
    f.write(imgContent)

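Or change into the folder first, so that a bare file name passed to open(imgUrl, 'wb') is written there: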

os.chdir(imgPath)

Either way works. I had done this before but forgot, so I'm writing it down here!
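The two approaches differ in what they change: os.path.join only builds a full path and leaves the working directory alone, while os.chdir changes the current directory for the whole process, so every later relative path is affected. A small self-contained comparison, assuming a temporary folder in place of the author's D:\ path and placeholder bytes in place of a real image:

```python
import os
import tempfile
import uuid

img_content = b'placeholder image bytes'  # stands in for requests.get(...).content

with tempfile.TemporaryDirectory() as imgPath:
    # Option 1: join folder and file name; the working directory is untouched
    name1 = str(uuid.uuid4()) + '.jpg'
    with open(os.path.join(imgPath, name1), 'wb') as f:
        f.write(img_content)

    # Option 2: change into the folder, then open a bare file name.
    # os.chdir affects the whole process, so remember the old directory and restore it.
    old_cwd = os.getcwd()
    os.chdir(imgPath)
    try:
        name2 = str(uuid.uuid4()) + '.jpg'
        with open(name2, 'wb') as f:
            f.write(img_content)
    finally:
        os.chdir(old_cwd)

    saved = sorted(os.listdir(imgPath))
    print(len(saved), 'files saved')
```

In a longer script, os.path.join is usually the safer choice, since it does not silently change where other relative paths point.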
Summary:

requests

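requests.get('url', headers=headers) sends a request

response.encoding = response.apparent_encoding sets the encoding to the one detected from the response body

requests.get('url').text gets the page content as text

requests.get('url').content gets the raw bytes, e.g. of an image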
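response.encoding only decides which codec requests uses to turn .content (bytes) into .text (str). The plain-Python sketch below shows why the encoding line matters. It assumes, for illustration only, a page served as GBK with no charset in its Content-Type header; in that case requests falls back to ISO-8859-1 for text/* responses.

```python
# Bytes as a server might send them for a GBK-encoded page (GBK is an assumption)
raw = '汽车之家 新闻'.encode('gbk')

# What response.text would contain if requests fell back to ISO-8859-1
wrong = raw.decode('iso-8859-1')

# What response.text contains once response.encoding matches the real charset,
# which is what response.encoding = response.apparent_encoding aims for
right = raw.decode('gbk')

# Nothing is lost: .content still holds the original bytes, so re-decoding recovers the text
fixed = wrong.encode('iso-8859-1').decode('gbk')

print(repr(wrong))
print(right)
```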

BeautifulSoup

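bs = BeautifulSoup(requests.get('url').text, features='html.parser')

bs.find('div', id='') returns the first matching tag

bs.find_all('div', id='') returns a list of all matching tags

bs.find_all('div', class_='') (note the trailing underscore: class is a reserved word in Python)

a.attrs gets a dictionary of the tag's attributes

a.attrs.get('') gets the value of a specific attribute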
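The calls listed above can be exercised on a small HTML snippet; the markup, ids, and class names below are made up for illustration. It shows that find returns a single tag (or None), find_all returns a list, class has to be written class_, and attrs.get returns None instead of raising when an attribute is missing.

```python
from bs4 import BeautifulSoup

html = '''
<div id="main">
  <div class="item"><a href="/a" title="first">A</a></div>
  <div class="item"><a href="/b">B</a></div>
  <div class="footer">end</div>
</div>
'''

bs = BeautifulSoup(html, features='html.parser')

main = bs.find('div', id='main')            # first matching tag, or None
items = bs.find_all('div', class_='item')   # list of all matching tags
missing = bs.find('div', id='nope')         # None when nothing matches

a = items[0].find('a')
print(a.attrs)               # dictionary of all attributes
print(a.attrs.get('href'))   # value of one attribute
print(a.attrs.get('alt'))    # None for a missing attribute, no KeyError
```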
