Python web crawling (1. Comics)

1. Fetching a single image from a web page:
2. The code to fetch it:

import requests

# Download a single image and write it to disk.
r = requests.get('https://img.wallpapersafari.com/desktop/1536/864/5/53/uyvkzZ.jpeg')
with open('图片.jpeg', 'wb') as f:
    f.write(r.content)   # the with block closes the file automatically, no f.close() needed
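
If the saved file turns out to be empty or unreadable, it is worth first checking that the request actually succeeded. A minimal sketch (same URL as above; raise_for_status() raises an exception on any 4xx/5xx response):

import requests

r = requests.get('https://img.wallpapersafari.com/desktop/1536/864/5/53/uyvkzZ.jpeg')
r.raise_for_status()   # stops here with requests.HTTPError if the server refused the request
with open('图片.jpeg', 'wb') as f:
    f.write(r.content)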

3. Note: sometimes the downloaded image cannot be opened.
This happens because the site has anti-crawler checks; the request needs a User-Agent header added to it:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/99.0.4844.84 Safari/537.36 OPR/85.0.4341.60'}  # get this string by typing window.navigator.userAgent into the browser console (F12 → Console)

import requests

# Send a browser User-Agent so the site does not reject the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/99.0.4844.84 Safari/537.36 OPR/85.0.4341.60'}
r = requests.get('https://get.wallhere.com/photo/space-galaxy-planet-fantasy-'
                 'wallpaper-desktop-landscape-surreal-1579551.png',
                 headers=headers)
with open('图片.png', 'wb') as f:
    f.write(r.content)
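
Every request from here on needs these same headers, so a small convenience (not in the original code) is to set them once on a requests.Session and reuse it:

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                                      'Chrome/99.0.4844.84 Safari/537.36 OPR/85.0.4341.60'})
# Every request made through this session now carries the User-Agent automatically.
r = session.get('https://get.wallhere.com/photo/space-galaxy-planet-fantasy-'
                'wallpaper-desktop-landscape-surreal-1579551.png')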

The downloaded image: (screenshot)

4. Downloading every image on a page:

Parse the page, then fetch each image. The code:

import requests, re, os
from bs4 import BeautifulSoup


def get_content(target):
    # Pretend to be a normal browser (counters the site's anti-crawler check):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/99.0.4844.84 Safari/537.36 OPR/85.0.4341.60'}

    # Fetch the wallpaper's own page:
    r = requests.get(url=target, headers=headers)

    # Parse the HTML and locate the link to the full-size image:
    textwrap = BeautifulSoup(r.content, 'lxml')
    pictures = textwrap.find('div', class_='hub-photomodal')
    pictures = pictures.find_all('a')
    for picture in pictures:
        # Return the address where the full-size image is stored:
        return picture.get('href')


if __name__ == '__main__':
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/99.0.4844.84 Safari/537.36 OPR/85.0.4341.60'}
    server = 'https://wallhere.com'
    r = requests.get(server, headers=headers)
    r.encoding = 'utf-8'
    text = BeautifulSoup(r.text, 'lxml')
    picture_urls = text.find('div', class_='hub-mediagrid hub-fleximages hub-loadphoto')
    picture_urls = picture_urls.find_all('a')
    os.makedirs('图片', exist_ok=True)   # make sure the output folder exists
    for url in picture_urls:
        urls = url.get('href')
        # Keep only the links that lead to wallpaper pages:
        url_img = re.findall('wallpaper', urls)
        try:
            if url_img[0] == 'wallpaper':
                url = server + urls
            else:
                continue
        except IndexError:
            continue
        img_url = get_content(url)                       # address of the full-size image
        picture_r = requests.get(img_url, headers=headers)
        with open(os.path.join('图片', img_url.strip().split('-')[-1]), 'wb') as file:
            file.write(picture_r.content)
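
A loop like this issues one request per thumbnail plus one per full-size image, so it is worth being gentle with the server. One option (not part of the original script; polite_get and delay are names introduced here) is a small wrapper that pauses after each request:

import time
import requests

def polite_get(url, headers, delay=1.0):
    # Fetch a URL, then pause briefly so consecutive calls are spaced out.
    r = requests.get(url, headers=headers)
    time.sleep(delay)   # wait before the caller issues the next request
    return r

Using polite_get(img_url, headers) in place of requests.get(img_url, headers=headers) inside the loop spaces the downloads out by roughly a second each.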

5. Downloading a comic:
Inspect the page elements:

Check how the site protects its images from scraping: open view-source:https://www.dmzj.com/view/yaoshenji/41917.html (F12, or right-click → View Source). The images are protected against hotlinking, so the download request in the code below carries a Referer header pointing back to the chapter page.
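
A quick way to confirm that the Referer header is what matters is to request one image with and without it (the image URL below is a hypothetical one copied from the chapter page; the exact status codes depend on the site):

import requests

img_url = 'http://www.txydd.com/example.jpg'   # hypothetical image address taken from the chapter page
referer = {'Referer': 'http://www.txydd.com/chapter/10967/358966.html'}

print(requests.get(img_url).status_code)                    # typically rejected (e.g. 403) without the Referer
print(requests.get(img_url, headers=referer).status_code)   # accepted once the Referer is sent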

import requests, os
from bs4 import BeautifulSoup

# Address of the chapter page:
url = 'http://www.txydd.com/chapter/10967/358966.html'
download_header = {'Referer': 'http://www.txydd.com/chapter/10967/358966.html'}  # counters the site's anti-hotlinking check
r = requests.get(url, headers=download_header)
r.encoding = 'utf-8'
bs = BeautifulSoup(r.text, 'lxml')
chapters = bs.find('ol', id='j_chapter_list')
chapters = chapters.find_all('img')

os.makedirs('虫师', exist_ok=True)   # folder the comic is saved into
for chapter in chapters:
    img_url = chapter.get('src')     # image address; some sites put it in a data-* attribute instead
    chapter_content = requests.get(img_url, headers=download_header).content
    with open(os.path.join('虫师', img_url.strip().split('_')[-1]), 'wb') as f:
        f.write(chapter_content)
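
The script above downloads a single chapter. To grab the whole comic, the same steps can be looped over every chapter URL; collecting those URLs from the comic's index page is not shown here, so chapter_urls below is a hypothetical placeholder list:

import requests, os
from bs4 import BeautifulSoup

chapter_urls = [
    'http://www.txydd.com/chapter/10967/358966.html',   # hypothetical: fill this from the comic's index page
]
os.makedirs('虫师', exist_ok=True)
for chapter_url in chapter_urls:
    header = {'Referer': chapter_url}                    # each image request refers back to its own chapter page
    r = requests.get(chapter_url, headers=header)
    r.encoding = 'utf-8'
    bs = BeautifulSoup(r.text, 'lxml')
    images = bs.find('ol', id='j_chapter_list').find_all('img')
    for img in images:
        img_url = img.get('src')                         # may need a data-* attribute instead if images are lazy-loaded
        with open(os.path.join('虫师', img_url.strip().split('_')[-1]), 'wb') as f:
            f.write(requests.get(img_url, headers=header).content)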



         