Web Scraping Basics: Downloading Images from Baidu Tieba

提笔惊蚂蚁

Published 2024-10-09 05:00:00


Article link: https://blog.csdn.net/JR521314/article/details/142769942


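Workflow for scraping images from a Baidu Tieba thread:

1. Fetch the page's HTML source with requests
2. Parse the source with lxml and extract the image URLs
3. Loop over the image URLs and download each image
4. Write each image to its own file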
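Step 2 can be tried offline before touching the network. The sketch below runs the same kind of XPath query against a small hand-written HTML string. `SAMPLE_HTML`, the `example.com` URLs, and the helper name `extract_image_urls` are made up for illustration; only the `BDE_Image` class name comes from this post.

```python
from lxml import etree

# Hypothetical stand-in for a downloaded page: two post images and one avatar.
SAMPLE_HTML = """
<html><body>
  <img class="BDE_Image" src="https://example.com/a.jpg">
  <img class="avatar" src="https://example.com/avatar.png">
  <img class="BDE_Image" src="https://example.com/b.jpg">
</body></html>
"""


def extract_image_urls(html):
    """Return the src of every <img> whose class attribute is exactly 'BDE_Image'."""
    selector = etree.HTML(html)
    # @class="BDE_Image" is an exact string comparison on the whole attribute,
    # so an element with class="BDE_Image other" would not match.
    return selector.xpath('//img[@class="BDE_Image"]/@src')


urls = extract_image_urls(SAMPLE_HTML)
print(urls)
```

The avatar image is filtered out because its class differs, which is why the XPath predicate is used instead of selecting every `<img>` on the page.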

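import requests
from lxml import etree

# 1. Fetch the page source with requests
index_url = 'https://tieba.baidu.com/p/5475267611'
html = requests.get(index_url).text

# 2. Parse the source with lxml and extract the image URLs
selector = etree.HTML(html)
image_urls = selector.xpath('//img[@class="BDE_Image"]/@src')

# 3. Loop over the image URLs and download each image
for offset, image_url in enumerate(image_urls):
    image_content = requests.get(image_url).content
    # 4. Write each image to its own file: 0.jpg, 1.jpg, ...
    with open('{}.jpg'.format(offset), 'wb') as f:
        f.write(image_content)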
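The loop above stops at the first failed download and names every file `.jpg` regardless of its real type. One possible way to soften both points is sketched below. It is not part of the original script: `filename_for`, `save_images`, `fake_fetch`, and the `example.com` URLs are hypothetical names introduced here, and the download step is passed in as a `fetch` callable so the sketch runs without network access.

```python
import os
import tempfile
from urllib.parse import urlparse


def filename_for(index, image_url, default_ext=".jpg"):
    """Build a name like '0.jpg', reusing the extension found in the URL path if any."""
    ext = os.path.splitext(urlparse(image_url).path)[1]
    return "{}{}".format(index, ext or default_ext)


def save_images(image_urls, out_dir, fetch):
    """Call fetch(url) -> bytes for each URL and write the result into out_dir.

    Returns the paths that were written. A URL whose fetch raises is skipped.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, image_url in enumerate(image_urls):
        try:
            content = fetch(image_url)
        except Exception as exc:  # broad on purpose: fetch is supplied by the caller
            print("skipping {}: {}".format(image_url, exc))
            continue
        path = os.path.join(out_dir, filename_for(index, image_url))
        with open(path, "wb") as f:
            f.write(content)
        saved.append(path)
    return saved


def fake_fetch(url):
    """Stand-in downloader for the demo: fails for URLs containing 'broken'."""
    if "broken" in url:
        raise IOError("simulated failure")
    return b"fake image bytes"


demo_urls = [
    "https://example.com/pic/a.jpg?size=big",
    "https://example.com/broken.png",
    "https://example.com/c.png",
]
saved_paths = save_images(demo_urls, tempfile.mkdtemp(), fake_fetch)
print([os.path.basename(p) for p in saved_paths])
```

For a real run, `fetch` could be something like `lambda url: requests.get(url, timeout=10).content`, and `image_urls` would be the list produced by the XPath query.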