[Study Notes] A Beginner's Notes on Learning Python Web Scraping

weixin_45849285

Tags: python

Article link: https://blog.csdn.net/weixin_45849285/article/details/104854949

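Lessons from scraping images from a website with Requests and BeautifulSoup
This is the first website-scraping program I have written. It is not very concise, and parts of it do not follow good conventions; I hope to keep pushing myself to write better code. I am also keeping it as a reference for my own future study.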

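import os

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below needs the lxml package installed

SAVE_DIR = './abcd'
BASE_URL = 'https://www.mzitu.com/'
# Browser-like headers; the same Referer is sent with the page and image requests.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    'Referer': BASE_URL,
}

os.makedirs(SAVE_DIR, exist_ok=True)  # exist_ok expects a bool, not the string 'True'


def get_img(page):
    """Download every image found on one listing page into SAVE_DIR."""
    response = requests.get(f'{BASE_URL}page/{page}', headers=HEADERS, timeout=10)
    response.encoding = 'UTF-8'
    # response.text is already decoded, so no encoding argument is passed to BeautifulSoup.
    soup = BeautifulSoup(response.text, 'lxml')
    for img in soup.find_all('img'):
        # The image URL is read from the data-original attribute; skip tags without it.
        img_url = img.get('data-original')
        if img_url is None:
            continue
        r = requests.get(img_url, headers=HEADERS, timeout=10)
        img_name = img_url.split('/')[-1]
        print(img_name)
        with open(os.path.join(SAVE_DIR, img_name), 'wb') as fd:
            # Write the body in small chunks rather than all at once.
            for chunk in r.iter_content(256):
                fd.write(chunk)


for page in range(129, 150):
    get_img(page)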
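
The script above takes the file name from the image URL with split('/')[-1]. That works for plain URLs, but it would keep any query string in the name and would return an empty name for a URL ending in a slash. Below is a minimal sketch of a sturdier alternative using only the standard library. The function name filename_from_url, the default name image.jpg, and the example.com URLs are hypothetical and chosen only for illustration; they are not part of the original program.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="image.jpg"):
    """Return the last path segment of a URL, ignoring query string and fragment.

    Hypothetical helper: falls back to `default` when the URL has no usable
    file name (for example, when the path ends in a slash).
    """
    path = urlsplit(url).path                  # drops ?query and #fragment
    name = posixpath.basename(unquote(path))   # decode %xx escapes, keep the last segment
    if name in ("", ".", ".."):                # nothing safe to use as a file name
        return default
    return name


if __name__ == "__main__":
    # Illustrative URLs only.
    print(filename_from_url("https://example.com/2020/03/a01.jpg?size=large"))
    print(filename_from_url("https://example.com/gallery/"))
```

In get_img, the line that computes the image name could call this helper instead of splitting the URL by hand, assuming the helper is defined earlier in the same file.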