Life is short, I use Python ----- scraping images

Life is short, I'm learning Python!

I've been looking around at new opportunities lately, and a lot of the job descriptions ask for some Python and shell scripting, so I've been studying in my spare time. I'm just getting started and still very much a beginner, but I can already write a small crawler or two, heh.

Let me recommend the site I taught myself from: https://www.liaoxuefeng.com/wiki/1016959663602400, Liao Xuefeng's tutorial. It explains things very simply, and good things are meant to be shared. My first language is Java, and after learning this bit of Python I really do think the saying "Life is short, I use Python!" is spot on.

Most programmers are lazy, and Python will make you even lazier: so much is already packaged up that you just import a package and use it directly, so easy! In this post I'll share a small image-scraping program I wrote myself. The code is rough and the naming is nothing like what I'd use in Java, so please go easy on me.
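
To give a sense of how little boilerplate that takes: with nothing more than requests and BeautifulSoup you can fetch a page and read its title in a handful of lines. This is only a minimal sketch for illustration; it assumes the site is reachable and that the lxml parser is installed.

import requests
from bs4 import BeautifulSoup

resp = requests.get("http://www.shuaia.net/")   # fetch the front page
resp.encoding = 'utf-8'                         # the site serves UTF-8
soup = BeautifulSoup(resp.text, 'lxml')         # parse the HTML
print(soup.title.string if soup.title else "no <title> found")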

       

# -*- coding: UTF-8 -*-
import requests, os, time, random
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

"""
    爬取图片网站的demo http://www.shuaia.net/
"""

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
params = {"tagname": "美女"}  # tag to query on the site ("美女" = "beauty")


def get_pageurl(page, target_urls):
    """Fetch one list page and append (title, detail_url) pairs to target_urls."""
    url = "http://www.shuaia.net/e/tags/index.php?page=%d&line=25&tempid=3" % page
    response = requests.get(url=url, headers=headers, params=params)
    if response.status_code != 200:
        return None
    print(response.url)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'lxml')
    for item in soup.find_all(class_='item-img'):
        # Each list item carries the title in the <img alt> and the detail page in the <a href>
        target_urls.append((item.img.get('alt'), item.get('href')))
    return target_urls


if __name__ == '__main__':

    page = 0
    while True:
        target_urls = get_pageurl(page, [])
        if not target_urls:
            # Request failed or no more list pages: stop instead of retrying forever.
            break
        print(target_urls)
        page += 1

        for title, detail_url in target_urls:
            print(title)
            # One directory per album, named after its title
            os.makedirs(title, exist_ok=True)
            print("Downloading ->>> " + title)
            response_img = requests.get(detail_url, headers=headers)
            response_img.encoding = 'utf-8'
            img_html = BeautifulSoup(response_img.text, 'lxml')
            # The picture sits inside the 'wr-single-content-list' block;
            # its src is site-relative, so prepend the host.
            content = img_html.find('div', class_='wr-single-content-list')
            img_url = 'http://www.shuaia.net' + content.img.get('src')
            urlretrieve(url=img_url, filename=title + '/' + title + '.jpg')
            print(img_url)
            time.sleep(random.randint(0, 5))

            # Follow-up pages of the same album are named foo_2.html, foo_3.html, ...
            # Keep probing until the server stops answering 200.
            base_url = detail_url[:-len('.html')]
            i = 2
            while True:
                crl_file_url = base_url + '_' + str(i) + '.html'
                crl_response = requests.get(crl_file_url, headers=headers)
                if crl_response.status_code != 200:
                    break
                crl_response.encoding = 'utf-8'
                crl_img_html = BeautifulSoup(crl_response.text, 'lxml')
                crl_content = crl_img_html.find('div', class_='wr-single-content-list')
                crl_img_url = 'http://www.shuaia.net' + crl_content.img.get('src')
                urlretrieve(url=crl_img_url, filename=title + '/' + title + str(i) + ".jpg")
                i += 1
                time.sleep(random.randint(0, 5))
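
The inner while loop leans on one convention of the site: an album's follow-up pages reuse the first page's URL with an _2.html, _3.html, ... suffix, and the server stops answering 200 once the album is exhausted. Pulled out on its own, the idea looks roughly like the helper below (album_page_urls is a hypothetical name, not part of the script above; it fetches each candidate once just to probe it).

import requests

def album_page_urls(first_page_url, headers=None):
    """Yield an album's page URLs in order (foo.html, foo_2.html, foo_3.html, ...),
    stopping at the first suffix the server no longer answers with HTTP 200."""
    yield first_page_url
    base = first_page_url[:-len('.html')]   # strip the trailing ".html"
    n = 2
    while True:
        candidate = '%s_%d.html' % (base, n)
        if requests.get(candidate, headers=headers).status_code != 200:
            break
        yield candidate
        n += 1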

 
