[Python Web Scraping] Image Scraping

木叶清风666

Article link: https://blog.csdn.net/qq_38734327/article/details/132586638

Image scraping

- Requirements analysis
- Python implementation

Requirements analysis

Scrape the images listed at https://pic.netbian.com/4kfengjing/ and save them to local disk.

Python implementation

Fetch the pages to scrape

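import requests

def get_htmls(pages=range(2, 5)):
    """Fetch the listing pages to scrape and return their HTML as a list of strings."""
    pages_list = []
    for page in pages:
        url = f"https://pic.netbian.com/4kfengjing/index_{page}.html"
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
        # Decode as GBK so the Chinese text in the page is not garbled
        response.encoding = 'gbk'
        pages_list.append(response.text)
    return pages_list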
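
The URL pattern above only covers pages numbered 2 and up. As a minimal sketch, the helper below builds the list of listing-page URLs separately from the download step, so it can be checked without touching the network. `build_page_urls` and `BASE_URL` are hypothetical names, and the handling of page 1 is an assumption (that the first page is served at the category root), not something the code above confirms.

```python
BASE_URL = "https://pic.netbian.com/4kfengjing/"

def build_page_urls(first_page, last_page):
    """Return listing-page URLs for first_page..last_page, inclusive."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    urls = []
    for page in range(first_page, last_page + 1):
        if page == 1:
            # Assumption: the first page has no index_N suffix
            urls.append(BASE_URL)
        else:
            # Same pattern as the f-string in get_htmls
            urls.append(f"{BASE_URL}index_{page}.html")
    return urls

for url in build_page_urls(2, 4):
    print(url)
```

Keeping URL construction in its own function makes it easy to change the page range or the pattern in one place.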

Extract all images and download them

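import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def get_picturs(htmls):
    """Extract every image from the listing pages and download it."""
    save_dir = './practice05'
    os.makedirs(save_dir, exist_ok=True)  # create the output folder if it is missing
    for html in htmls:
        soup = BeautifulSoup(html, 'html.parser')
        # The images sit inside <div id="main"> -> <div class="slist"> -> <ul class="clearfix">
        pic_li = soup.find('div', id='main').find('div', class_='slist').find('ul', class_='clearfix')
        image_path = pic_li.find_all('img')
        for file in image_path:
            # Use the alt text as the file name, replacing spaces with underscores
            pic_name = os.path.join(save_dir, file['alt'].replace(" ", '_') + '.jpg')
            # Resolve src against the site root; urljoin avoids a doubled slash
            src = urljoin("https://pic.netbian.com/", file['src'])

            response = requests.get(src, timeout=10)
            response.raise_for_status()

            with open(pic_name, 'wb') as f:
                f.write(response.content)
            print("Image downloaded and saved as: {}".format(pic_name))

htmls = get_htmls(pages=range(2, 5))
get_picturs(htmls)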
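
The file name is taken directly from the `alt` attribute, which may contain characters that are not valid in file names, and `src` may be either relative or absolute. A minimal sketch of two helpers that handle both cases with the standard library only; `safe_filename`, `absolute_url`, and the example inputs are hypothetical and not part of the original script.

```python
import re
from urllib.parse import urljoin

SITE_ROOT = "https://pic.netbian.com/"

def safe_filename(alt_text, ext=".jpg"):
    """Turn an image's alt text into a file name usable on common file systems."""
    # Collapse each run of whitespace into one underscore
    name = re.sub(r"\s+", "_", alt_text.strip())
    # Drop characters that Windows or POSIX reject in file names
    name = re.sub(r'[\\/:*?"<>|]', "", name)
    # Fall back to a placeholder if nothing usable is left
    return (name or "untitled") + ext

def absolute_url(src):
    """Resolve a src attribute (relative or absolute) against the site root."""
    return urljoin(SITE_ROOT, src)

print(safe_filename("sunset over: lake / hills"))
print(absolute_url("/uploads/allimg/example.jpg"))  # made-up path for illustration
```

In the download loop, these could stand in for the `replace(" ", "_")` call and the URL construction, assuming the same `./practice05` output folder.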