Python Crawler: Downloading Douban Celebrity Photos

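This code takes a celebrity's name from the user, loads the Douban search results with Selenium, and extracts the celebrity's Douban ID from the page source with a regular expression. It then uses that ID to request the celebrity's photo pages, parses the HTML with BeautifulSoup to collect the image links, and downloads the requested number of photos into a local 'photos' directory.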

The script downloads photos of the person whose name you enter.
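The ID lookup comes down to one regular expression over the rendered search page, so it can be tried offline first. This is a minimal sketch, not Douban's actual markup: the HTML fragment and the IDs in it are made up, the helper name `extract_celebrity_id` is hypothetical, and restricting the ID to digits is an assumption.

```python
import re

# Assumption: celebrity IDs are numeric; the pattern mirrors the URL shape
# the script looks for in the search results.
CELEBRITY_LINK = re.compile(r'href="https://movie\.douban\.com/celebrity/(\d+)/')


def extract_celebrity_id(html):
    """Return the first celebrity ID found in html, or None if there is none."""
    match = CELEBRITY_LINK.search(html)
    return match.group(1) if match else None


# Hand-written sample, not real Douban output
sample = (
    '<div><a href="https://movie.douban.com/celebrity/1000001/">A</a>'
    '<a href="https://movie.douban.com/celebrity/1000002/">B</a></div>'
)
print(extract_celebrity_id(sample))               # 1000001
print(extract_celebrity_id("<p>no results</p>"))  # None
```

Returning None instead of exiting leaves it to the caller to decide what to do when nothing matches.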

import os
import re

from selenium import webdriver
import requests
import urllib.request
from bs4 import BeautifulSoup

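def getIDBeyound(name):
    # Build the Douban search URL for the given name
    url = "https://search.douban.com/movie/subject_search?search_text=%s&cat=1002"
    url = url % name
    # Selenium 3 style: the ChromeDriver path is the first positional argument
    # (Selenium 4 expects a Service object instead)
    driver = webdriver.Chrome("chromeD.exe")
    driver.get(url)
    an = driver.page_source
    # quit() closes every window and ends the driver session
    driver.quit()
    idm = re.findall('<a href="https://movie.douban.com/celebrity/(.*?)/', an)
    if not idm:
        # No celebrity link in the results: stop with a message
        raise SystemExit("No matching celebrity found")
    return idm[0]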

def getContent(idm, pag):
    # Fetch one page of the celebrity's photo list and return it parsed
    hd = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36 SLBrowser/7.0.0.4071 SLBChan/21"}
    url = "https://movie.douban.com/celebrity/%s/photos/?type=C&start=%s&sortby=like&size=a&subtype=a"
    url = url % (idm, pag)
    print(url)
    # The timeout keeps the script from hanging on a stalled connection
    page = requests.get(url, headers=hd, timeout=10).content.decode("utf-8")
    soup = BeautifulSoup(page, 'html.parser')
    return soup

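def getItem(soup):
    # Collect the image URL of every thumbnail in the photo grid
    code = []
    photoList = soup.find('ul', attrs={'class': "poster-col3 clearfix"})
    if photoList is None:
        # No photo list on this page (for example, past the last page)
        return code
    for obj in photoList.find_all('li'):
        if obj.img is not None:
            code.append(obj.img['src'])
    return code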

def downloads(index, url):
    # Save one image as photos/<index>.<ext>
    folder = "photos"
    os.makedirs(folder, exist_ok=True)
    ext = url.split(".")[-1]
    # Send a browser-like User-Agent with the download request
    opener = urllib.request.build_opener()
    opener.addheaders = [('user-agent', 'Mozilla/5.0')]
    urllib.request.install_opener(opener)
    path = os.path.join(folder, str(index) + "." + ext)
    urllib.request.urlretrieve(url, path)

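name = input("Enter the name to search for: ")
idm = getIDBeyound(name)

codes = []
# input() returns a string, so convert it before passing it to range()
a = int(input("Enter the number of photos to download: "))
# Each photo page is requested with a start offset that advances by 30
for pag in range(0, a, 30):
    print("start=" + str(pag))
    soup = getContent(idm, pag)
    code = getItem(soup)
    codes.extend(code)
# The last page may return more links than requested, so trim the list
codes = codes[:a]
print(codes)
for i, inV in enumerate(codes):
    print(i)
    downloads(i, inV)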
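
Because the photo pages are fetched in steps of 30, it helps to see which offsets a given request count produces, and to build file names that survive URLs with query strings. The following is a rough sketch under stated assumptions: the page size of 30 is taken from the step used in the loop above, the helper names `page_offsets` and `target_path` are hypothetical, and the `.jpg` fallback is an arbitrary choice for URLs without an extension.

```python
import os
from urllib.parse import urlparse

PAGE_SIZE = 30  # assumption: matches the step in range(0, a, 30)


def page_offsets(count, page_size=PAGE_SIZE):
    """Return the start offsets needed to cover `count` photos."""
    return list(range(0, count, page_size))


def target_path(index, url, folder="photos"):
    """Build a local path, taking the extension from the URL path only."""
    # urlparse(...).path drops any query string, so "?x=1" never ends up
    # in the file extension
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return os.path.join(folder, f"{index}{ext}")


print(page_offsets(70))  # [0, 30, 60]
print(target_path(3, "https://example.com/img/p123.webp?x=1"))
```

Asking for 70 photos therefore means three page requests, and up to 90 links come back before the list is trimmed to the requested count.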