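Baidu Tieba image scraping workflow:
1. Use requests to fetch the page's HTML source
2. Parse the source with lxml and extract the image URLs
3. Loop over the image URLs and download each image
4. Write each image to an image file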
import requests
from lxml import etree

# 1. Fetch the page source
index_url = 'https://tieba.baidu.com/p/5475267611'
responses = requests.get(index_url).text

# 2. Parse the source and extract the image URLs
selector = etree.HTML(responses)
image_urls = selector.xpath('//img[@class="BDE_Image"]/@src')

# 3. Loop over the image URLs and download each image
for offset, image_url in enumerate(image_urls):
    image_content = requests.get(image_url).content
    # 4. Write each image to its own file: 0.jpg, 1.jpg, ...
    with open('{}.jpg'.format(offset), 'wb') as f:
        f.write(image_content)
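The loop above saves every image with a .jpg extension, whatever the actual file type is. As a minimal sketch, assuming the image URLs end in a real file extension, the file name could be derived from the URL instead. The helper name image_filename and its default_ext parameter are hypothetical and not part of the original script.

```python
import os
from urllib.parse import urlparse


def image_filename(image_url, index, default_ext=".jpg"):
    """Build a file name such as '0.png' from an image URL and its index.

    Hypothetical helper: assumes the URL path ends in a usable extension
    and falls back to default_ext when it has none.
    """
    # Take only the path part, so query strings like '?x=1' are ignored.
    path = urlparse(image_url).path
    # splitext returns (root, ext); ext is '' when the path has no extension.
    ext = os.path.splitext(path)[1].lower()
    return "{}{}".format(index, ext or default_ext)
```

Inside the download loop, open(image_filename(image_url, offset), 'wb') would then replace open('{}.jpg'.format(offset), 'wb').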