Python Web Scraping in Practice: Downloading Images from a Website (Part 1)

Latest recommended article published on 2023-03-01 02:00:12

柳絮吹成雪

Category: Python web scraping. Article tags: Python web scraping

Article link: https://blog.csdn.net/baixueprincess/article/details/109814986


Preface

This part uses requests to fetch the page and Beautiful Soup to parse out the image URLs.

Steps

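1. Open the developer tools (F12), switch to the Network tab, and click the "Load more…" button. The URL requested by that click appears in the Network panel.
Now we can request that page with requests: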

import requests
# The cookies and headers values are omitted here
cookies = {}
headers = {}
params = {'page': '2'}

# This is a GET request; to send query parameters with requests.get, pass them as params=<dict>
response = requests.get('https://github.com/topics', headers=headers, params=params, cookies=cookies)

print(response.text)
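To make the role of `params` concrete, here is a minimal sketch of how a parameter dictionary ends up in the request URL. It uses only the standard library; `build_url` is a hypothetical helper written for illustration, not part of requests, which performs an equivalent encoding internally when you pass `params=`.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a query string (hypothetical helper)."""
    if not params:
        return base_url
    return base_url + "?" + urlencode(params)


# URLs for the first three pages of the topics listing
page_urls = [
    build_url("https://github.com/topics", {"page": str(n)})
    for n in range(1, 4)
]
for url in page_urls:
    print(url)
```

Changing the value of `page` is therefore enough to walk through the pages that the "Load more…" button would otherwise load one at a time.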

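2. Click the small arrow (the element picker) in the developer tools, then click one of the images on the page to see its address in the HTML.
Parse the response with the BeautifulSoup module to extract the image URLs: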

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.content, "lxml")
# Each topic is an <li> inside the <ul class="list-style-none"> list
pngs = soup.find("ul", {"class": "list-style-none"}).find_all("li", {"class": "py-4 border-bottom"})
print(len(pngs))
for each in pngs:
    # The topic image is the <img> tag inside the list item
    png_tag = each.find("img", {"class": "rounded-1 mr-3"})
    if not png_tag:
        png_url = ""
    else:
        png_url = png_tag.get("src")
        print(png_url)
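A `src` attribute is not guaranteed to be an absolute URL, and some list items may have no image at all. The following is a minimal sketch, assuming you have collected the raw `src` values into a list; `normalize_srcs` and the sample values are hypothetical and only illustrate cleaning the values before downloading.

```python
from urllib.parse import urljoin


def normalize_srcs(page_url, srcs):
    """Turn raw src values into absolute URLs, dropping empty ones."""
    absolute = []
    for src in srcs:
        if not src:
            # Skip list items that had no <img> tag or an empty src
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        absolute.append(urljoin(page_url, src))
    return absolute


# Made-up sample values for illustration
sample = ["https://example.com/a/python.png", "/images/go.png", "", None]
print(normalize_srcs("https://github.com/topics?page=2", sample))
```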

3. Once you have the image URL, you can save the image locally.
Here I save each image under its original file name.

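import urllib.request
# Use the last segment of the URL as the file name
filename = png_url.split('/')[-1]
print(filename)
# The target directory must already exist, otherwise urlretrieve fails
urllib.request.urlretrieve(png_url, 'E://images/'+filename)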
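Taking the last segment of the URL works for simple addresses, but it would keep a query string such as `?v=2` in the file name, and the download fails if the target directory is missing. Below is a minimal sketch of one way to handle both cases with the standard library; `local_path_for` and the example URL are hypothetical, not part of the original script.

```python
import posixpath
import tempfile
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_path_for(url, target_dir):
    """Build a local file path from the last path segment of url."""
    # Only look at the path component, so query strings are ignored
    name = posixpath.basename(unquote(urlparse(url).path))
    if not name:
        raise ValueError("URL has no file name: " + url)
    directory = Path(target_dir)
    # Create the directory if it does not exist yet
    directory.mkdir(parents=True, exist_ok=True)
    return directory / name


demo_dir = Path(tempfile.gettempdir()) / "images"
print(local_path_for("https://example.com/topics/python/python.png?v=2", demo_dir))
```

The returned path can be passed to `urllib.request.urlretrieve` as the second argument (converted with `str()` if needed).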

4. The complete code is as follows:

import requests
from bs4 import BeautifulSoup
import urllib.request

def main():
    cookies = {}
    headers = {}
    params = {'page': '2'}

    response = requests.get('https://github.com/topics', headers=headers, params=params, cookies=cookies)

    soup = BeautifulSoup(response.content, "lxml")
    pngs = soup.find("ul", {"class": "list-style-none"}).find_all("li", {"class": "py-4 border-bottom"})
    print(len(pngs))
    for each in pngs:
        png_tag = each.find("img", {"class": "rounded-1 mr-3"})
        if not png_tag:
            # No image in this list item; skip it
            continue
        png_url = png_tag.get("src")
        print(png_url)
        filename = png_url.split('/')[-1]
        print(filename)
        # The target directory must already exist
        urllib.request.urlretrieve(png_url, 'E://images/'+filename)
        # Alternative: download with requests instead of urllib
        # img_response = requests.get(png_url)
        # with open('E://images/'+filename, 'wb') as fd:
        #     fd.write(img_response.content)
        #     print(filename + " download success")

if __name__ == '__main__':
    main()
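In the script above, a single failed download raises an exception and stops the whole loop. As a hedged variation, the sketch below wraps each download in a try/except so the remaining images are still fetched. `download_all` is a hypothetical helper, and the choice to collect failures in a list rather than retry is an assumption made for brevity.

```python
import urllib.request
from pathlib import Path


def download_all(urls, target_dir):
    """Download each URL into target_dir; return (saved, failed) lists."""
    directory = Path(target_dir)
    directory.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        # Same naming rule as the script: last segment of the URL
        filename = url.split('/')[-1]
        try:
            urllib.request.urlretrieve(url, str(directory / filename))
            saved.append(filename)
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are subclasses of OSError;
            # ValueError is raised for malformed URLs
            print("failed:", url, exc)
            failed.append(url)
    return saved, failed
```

Inside `main()`, you could collect every non-empty `png_url` into a list and call `download_all(urls, 'E://images/')` once after the loop.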
