My First Crawler (Requests and BeautifulSoup), Second Edition

Latest recommended article published 2024-05-06 19:47:05

王真人

Category: Python basics

Article link: https://blog.csdn.net/u011388209/article/details/104832903

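This article describes improvements to a web crawler built with Requests and BeautifulSoup: the code is refactored into functions for readability, images are now saved into folders and named after their titles, and the crawl range is extended to seven boards. Support for an eighth board is planned for a later update.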

Summary generated automatically by CSDN.

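1. The code has been rewritten as functions, which makes it easier to read.
2. Images are now saved into separate folders and named after their titles.
3. The crawl range is wider than in the first edition: that version could only crawl the first board, while this one crawls seven. The eighth board will be added in the next update (it differs slightly from the others, so it needs an extra check).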
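Point 2 above (one folder per title, images named after the title) could be sketched roughly as follows. This is only an illustration, not the code from this post: the helper names `safe_name` and `image_path`, the `.jpg` extension, and the `<title>_<index>` naming pattern are all assumptions made for the example.

```python
import os
import re
import tempfile


def safe_name(title, fallback="untitled"):
    """Turn a page title into a string that is safe to use as a file or folder name."""
    # Replace characters that common file systems reject, plus line breaks and tabs.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" ._")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or fallback


def image_path(root, title, index, ext=".jpg"):
    """Create root/<title>/ if it is missing and return the path for the index-th image."""
    name = safe_name(title)
    folder = os.path.join(root, name)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, "{0}_{1}{2}".format(name, index, ext))


if __name__ == "__main__":
    # Hypothetical title; a temporary directory stands in for the real download folder.
    root = tempfile.mkdtemp()
    print(image_path(root, 'Spring: "Album" 1/2', 1))
```

Cleaning the title before using it as a name matters because page titles often contain characters such as `/` or `:` that would otherwise be read as path separators or rejected by the file system.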

import requests
from bs4 import BeautifulSoup
import re
import os

m=[1]  # image index within one person's set, starting at 1
o=1  # person index, starting at 1

def SoupUrl(url):  # fetch a page and parse it into a BeautifulSoup object
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'})
    response.raise_for_status()  # raise on 4xx/5xx responses
    response.encoding = response.apparent_encoding  # use the encoding detected from the content
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup

def ZhuantiLst(self):  # collect the topic links; `self` is the URL of the listing page
    n=1  # counter used in the progress message
    ztlst=[]  # list of topic links
    soup=SoupUrl(self).find(name='div',attrs={"id":"container"})
    soup=soup.find_all(name='h3',attrs={'class':"list_title"})
    for x in soup:
        ztlst.append(x.find('a').get('href'))
        print("\r已得到的第{0}个专题链接".format(n)