Python Web Scraping (bs4) - 1

夜影成辉

Category: Python. Tags: python, web scraping

Article link: https://blog.csdn.net/qq_19589789/article/details/52397324

This article is based on the book 《Python网络数据采集》 (Web Scraping with Python).
Language: Python 3
OS: CentOS 7
Python library: Beautiful Soup 4

Installing BeautifulSoup4

See the official documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#id5
for detailed installation instructions.
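Beautiful Soup 4 is published on PyPI under the name `beautifulsoup4` and is commonly installed with `pip install beautifulsoup4`. Once it is installed, a quick check like the one below confirms that the `bs4` package imports and shows which version is in use. This is a minimal sketch and is not part of the original post.

```python
import bs4

# The package is installed as "beautifulsoup4" but imported as "bs4".
installed_version = bs4.__version__
print("Beautiful Soup version:", installed_version)
```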

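1. Using the BeautifulSoup4 library

Creating a BeautifulSoup object

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page, then parse it; naming the parser explicitly avoids a bs4 warning
html = urlopen("your URL")
bsObj = BeautifulSoup(html.read(), "html.parser")

bsObj is a BeautifulSoup object.
It turns the tags of the HTML/XML document that the URL points to into a tree hierarchy, which makes them easy to extract.

bsObj.tagName

Returns the first tag with that name directly (for example, bsObj.h1).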
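To try tag access without a network connection, the parser can be given an HTML string instead of a fetched page. The sketch below assumes a made-up `sample_html` document (it is not from the original post) and uses Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used here instead of a live URL.
sample_html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>First heading</h1>
    <h1>Second heading</h1>
  </body>
</html>
"""

bsObj = BeautifulSoup(sample_html, "html.parser")

# Attribute access returns the first tag with that name in the tree.
first_h1 = bsObj.h1
print(first_h1)             # <h1>First heading</h1>
print(first_h1.get_text())  # First heading

# The access can also be chained to follow the tree explicitly.
same_h1 = bsObj.html.body.h1
print(same_h1 is first_h1)  # True
```

Both forms reach the same tag here. The chained form is more specific about where the tag must sit, while the short form finds the first match anywhere in the document.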

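2. Exceptions

URL errors:
Server does not exist or cannot be reached: urlopen raises URLError
Server exists but the page does not: 404 (an HTTPError)
Tag errors:
Tag does not exist: returns None
Accessing a child of a tag that does not exist: AttributeError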
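The two tag cases can be reproduced offline with a small HTML string. The sketch below is illustrative only; the sample markup and the names `soup`, `missing`, and `raised` are assumptions, not part of the original post. It also shows that in Python 3 `HTTPError` is a subclass of `URLError`, so a handler for `URLError` covers both URL cases.

```python
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

# HTTPError derives from URLError, so "except URLError" also catches HTTP errors.
print(issubclass(HTTPError, URLError))  # True

# Hypothetical page that has a <body> but no <h1>.
soup = BeautifulSoup("<html><body><p>No heading here</p></body></html>", "html.parser")

# Case 1: the tag does not exist, so attribute access returns None.
missing = soup.h1
print(missing)  # None

# Case 2: asking for a child of the missing tag means calling an attribute on None.
try:
    soup.h1.span
    raised = False
except AttributeError:
    raised = True
print(raised)  # True
```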

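Exception-handling template:

from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getTitle(url):
    try:
        html = urlopen(url)
    except (HTTPError, URLError) as e:
        # The server could not be reached, or it returned an HTTP error such as 404
        return None
    try:
        bsObj = BeautifulSoup(html.read(), "html.parser")
        title = bsObj.body.h1
    except AttributeError as e:
        # The page has no <body> tag, so bsObj.body is None
        return None
    return title

# Replace the placeholder with a real URL before running
title = getTitle("your URL")
if title is None:
    print("Title could not be found!")
else:
    print(title)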
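The parsing half of the template can be exercised without a network connection by pulling it into its own function. The sketch below uses a hypothetical helper, `get_title_from_html`, which is not from the original post, and assumes the built-in `html.parser`, which does not add a missing `<body>` tag on its own.

```python
from bs4 import BeautifulSoup

def get_title_from_html(html):
    """Return the first <h1> inside <body>, or None if either tag is missing."""
    bsObj = BeautifulSoup(html, "html.parser")
    try:
        return bsObj.body.h1
    except AttributeError:
        # bsObj.body was None, so looking up .h1 on it failed.
        return None

with_h1 = get_title_from_html("<html><body><h1>Hello</h1></body></html>")
without_h1 = get_title_from_html("<html><body><p>No heading</p></body></html>")
without_body = get_title_from_html("<p>No body tag at all</p>")

print(with_h1)       # <h1>Hello</h1>
print(without_h1)    # None
print(without_body)  # None
```

Both failure cases end in `None`, so the caller only needs the single `is None` check used in the template.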
