Python Web Scraping in Practice (1): Scraping the Images from a Popular Zhihu Answer

皮小孩ls

Article link: https://blog.csdn.net/qq_44809707/article/details/110956780

Contents

I. Preparation
- 1. View the page source
- 2. Locate the images
II. Python implementation
III. Final result

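I. Preparation

1. View the page source

Open the Zhihu question "平常人可以漂亮到什么程度" ("How good-looking can an ordinary person be?"), then press F12 to open the developer tools.

2. Locate the images

Use the element picker (the arrow icon in the developer tools) to point at one of the images and see where it sits in the page structure.
Each image is wrapped in a figure element that contains an img tag holding the image URL, so we can extract every URL and save the images locally.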
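
The figure/img structure can be tried offline before touching the live page. The following is a minimal sketch that assumes BeautifulSoup (the bs4 package) is installed; the HTML snippet, the example.com URLs, and the helper name extract_figure_images are made up for illustration and are not taken from Zhihu.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the figure/img layout seen in the developer tools.
SAMPLE_HTML = """
<div class="QuestionAnswer-content">
  <figure><img src="https://example.com/a.jpg"></figure>
  <figure><img src="https://example.com/b.jpg"></figure>
  <p>Some text without images.</p>
</div>
"""

def extract_figure_images(html):
    """Return the src of every <img> nested inside a <figure>."""
    soup = BeautifulSoup(html, "html.parser")
    # "figure img" is a CSS descendant selector: img tags anywhere under a figure.
    return [img["src"] for img in soup.select("figure img") if img.get("src")]

image_urls = extract_figure_images(SAMPLE_HTML)
print(image_urls)
```

Testing the selector on a small fixed snippet makes it easier to tell a parsing mistake apart from a change in the live page.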

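II. Python implementation

1. Parse the page

import urllib.request
from bs4 import BeautifulSoup

def get_soup(url):
    # Download the page and parse it with Python's built-in HTML parser
    page = urllib.request.urlopen(url)
    html = page.read()
    soup = BeautifulSoup(html, 'html.parser')
    return soup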
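
Some sites answer requests that carry urllib's default User-Agent with an error such as HTTP 403. If get_soup fails that way, one possible workaround is to send a browser-like header. This is a sketch, not a guaranteed fix: the helper build_request, the DEFAULT_HEADERS value, and the 10-second timeout are assumptions, not part of the original code.

```python
import urllib.request

# Assumed header value; any realistic browser User-Agent string would do.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def build_request(url, headers=None):
    """Build a urllib Request object that carries browser-like headers."""
    return urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)

req = build_request("https://www.zhihu.com/question/50426133/answer/483139994")
# urllib stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))

# The request can then be passed to urlopen in place of the bare URL:
# page = urllib.request.urlopen(req, timeout=10)
```

Building the request separately keeps the network call out of the example, so it can be inspected without contacting the server.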

2. Get the question title

def get_question_title(soup):
    # The question title is the first <h1> with class "QuestionHeader-title"
    title = soup.find_all('h1', class_='QuestionHeader-title')[0].text
    print('Question: {}'.format(title))
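
Indexing the result of find_all with [0] raises an IndexError when nothing matches, which can happen if the page layout changes or a different page is returned. A more forgiving lookup might look like the sketch below; it assumes bs4 is installed, and the helper name first_text and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

def first_text(soup, tag, class_name, default=None):
    """Return the stripped text of the first matching element, or default."""
    node = soup.find(tag, class_=class_name)
    return node.get_text(strip=True) if node is not None else default

# Made-up pages: one with the expected heading, one without it.
with_title = BeautifulSoup(
    '<h1 class="QuestionHeader-title"> Sample question </h1>', "html.parser")
without_title = BeautifulSoup("<p>Please log in</p>", "html.parser")

found = first_text(with_title, "h1", "QuestionHeader-title")
missing = first_text(without_title, "h1", "QuestionHeader-title",
                     default="(title not found)")
print(found)
print(missing)
```

The same helper would work for the answerer lookup, since it only depends on a tag name and a class name.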

3. Get the answerer's information

def get_hot_answer(soup):
    # The first "AuthorInfo-content" block belongs to the author of the answer
    author = soup.find_all('div', class_='AuthorInfo-content')[0]
    author_name = author.find_all('a')[0].text
    print('Answered by: {}'.format(author_name))

4. Save the images locally

import os
import urllib.request

def get_img(soup):
    imglist = []
    # Collect the address of every image inside the answer body
    for item in soup.find_all('div', class_='QuestionAnswer-content'):
        for figure in item.find_all('figure'):
            for img in figure.find_all('img'):
                src = img.get('src')
                if src:
                    imglist.append(src)
    print(imglist)
    # Save the images to the "image" folder, creating it if it does not exist
    path = 'image'
    os.makedirs(path, exist_ok=True)
    idx = 1
    for imgurl in imglist:
        # Skip anything that is not an http(s) link, such as inline data URIs
        if imgurl.startswith('http'):
            # Download the image and save it as image/<idx>.png
            urllib.request.urlretrieve(imgurl, os.path.join(path, '{0}.png'.format(idx)))
            idx = idx + 1
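
Every download is saved with a .png extension regardless of the image's actual format. If keeping the original extension matters, the target file name could be derived from the URL instead. The sketch below uses only the standard library; the helper name local_image_path, the list of accepted extensions, and the example.com URLs are assumptions for illustration.

```python
import os
from urllib.parse import urlparse

def local_image_path(imgurl, idx, folder="image"):
    """Build folder/<idx><ext>, keeping the extension found in the URL path."""
    # urlparse(...).path drops the query string, so "a.jpg?x=1" still yields ".jpg".
    ext = os.path.splitext(urlparse(imgurl).path)[1].lower()
    if ext not in {".jpg", ".jpeg", ".png", ".gif", ".webp"}:
        ext = ".png"  # fall back to the extension used in the original code
    return os.path.join(folder, "{0}{1}".format(idx, ext))

print(local_image_path("https://example.com/pics/v2-abc.jpg?source=1", 1))
print(local_image_path("https://example.com/pics/no-extension", 2))
```

The returned path could be passed as the second argument of urllib.request.urlretrieve. Note that the extension in a URL is only a hint and may not match the real file format.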

5. Complete code

import os
import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.zhihu.com/question/50426133/answer/483139994'

def get_soup(url):
    # Download the page and parse it with Python's built-in HTML parser
    page = urllib.request.urlopen(url)
    html = page.read()
    soup = BeautifulSoup(html, 'html.parser')
    return soup

def get_question_title(soup):
    # The question title is the first <h1> with class "QuestionHeader-title"
    title = soup.find_all('h1', class_='QuestionHeader-title')[0].text
    print('Question: {}'.format(title))

def get_hot_answer(soup):
    # The first "AuthorInfo-content" block belongs to the author of the answer
    author = soup.find_all('div', class_='AuthorInfo-content')[0]
    author_name = author.find_all('a')[0].text
    print('Answered by: {}'.format(author_name))

def get_img(soup):
    imglist = []
    # Collect the address of every image inside the answer body
    for item in soup.find_all('div', class_='QuestionAnswer-content'):
        for figure in item.find_all('figure'):
            for img in figure.find_all('img'):
                src = img.get('src')
                if src:
                    imglist.append(src)
    print(imglist)
    # Save the images to the "image" folder, creating it if it does not exist
    path = 'image'
    os.makedirs(path, exist_ok=True)
    idx = 1
    for imgurl in imglist:
        # Skip anything that is not an http(s) link, such as inline data URIs
        if imgurl.startswith('http'):
            # Download the image and save it as image/<idx>.png
            urllib.request.urlretrieve(imgurl, os.path.join(path, '{0}.png'.format(idx)))
            idx = idx + 1

if __name__ == '__main__':
    soup = get_soup(url)
    get_question_title(soup)
    get_hot_answer(soup)
    get_img(soup)

III. Final result

Here are a few of the larger images!
