Why use Beautiful Soup
In short, Beautiful Soup is a Python library whose main job is extracting data from web pages. The official description is as follows:
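Crawlers often use regular expressions to match the part of a page they are looking for. If a pattern is even slightly off, it can miss the target or, in the worst case, leave the program effectively stuck in a loop. Beautiful Soup is a more robust tool for this job: it makes it easy to extract the content of HTML or XML tags.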
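To make the contrast concrete, here is a minimal sketch that extracts the same link text with a regular expression and with Beautiful Soup. The HTML snippet, the class name "title", and the variable names are made up for this illustration, and the built-in "html.parser" is assumed as the parser.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this example. Note that the two <a> tags
# list their attributes in a different order.
html = """
<div class="post"><a href="/a" class="title">First</a></div>
<div class="post"><a class="title" href="/b">Second</a></div>
"""

# This regex assumes href always comes before class, so it misses the second link.
regex_titles = re.findall(r'<a href="[^"]*" class="title">(.*?)</a>', html)

# Beautiful Soup matches on the parsed tag, so attribute order does not matter.
soup = BeautifulSoup(html, "html.parser")
soup_titles = [a.get_text() for a in soup.find_all("a", class_="title")]

print(regex_titles)  # ['First']
print(soup_titles)   # ['First', 'Second']
```

The regex is not wrong for the first link, which is what makes this kind of bug easy to overlook: it fails quietly as soon as the markup varies.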
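Beautiful Soup provides a few simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolkit that parses a document and hands you the data you want to extract. Because it is simple, a complete application does not take much code.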
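The three operations named above (navigating, searching, modifying) might look like this in practice. This is only a small sketch: the HTML string, the id "intro", and the class "lead" are invented for the example, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical document, invented for this example.
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p id='intro'>Hello <b>world</b></p><p>Bye</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk the tree through tag names and relationships.
title_text = soup.title.string     # text of the <title> tag
parent_name = soup.b.parent.name   # name of the tag that contains <b>

# Searching: find tags by name and attributes.
paragraphs = soup.find_all("p")
intro = soup.find("p", id="intro")

# Modifying: replace the text inside <b> and add an attribute to the paragraph.
soup.b.string = "Soup"
intro["class"] = "lead"

print(title_text, parent_name, len(paragraphs))  # Demo p 2
print(intro.get_text())                          # Hello Soup
```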
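Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You do not need to think about encodings unless the document does not specify one and Beautiful Soup cannot detect it. In that case, you only need to state the original encoding.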
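As a rough illustration of stating the original encoding, the sketch below feeds Beautiful Soup some GBK-encoded bytes that carry no charset declaration and passes the encoding through the from_encoding argument. The sample text and the choice of GBK are assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Bytes encoded as GBK with no charset declaration (sample data for this example).
raw = "<p>你好世界</p>".encode("gbk")

# Tell Beautiful Soup the original encoding explicitly.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

# Inside the tree, text is ordinary Unicode (a Python str).
text = soup.p.get_text()

# On output, the document can be serialized as UTF-8 bytes.
out = soup.encode("utf-8")

print(text)
print(out.decode("utf-8"))
```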
Beautiful Soup sits on top of popular Python parsers such as lxml and html5lib, so you can try out different parsing strategies or trade speed for flexibility.
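One way to take advantage of the interchangeable parsers is to try a preferred parser first and fall back to the one bundled with Python. The helper name make_soup and the preference order are hypothetical choices for this sketch. FeatureNotFound is the exception bs4 raises when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) using the first parser that is installed.

    Hypothetical helper for illustration; the preference order is an assumption.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


# Deliberately sloppy markup: the <p> tag is never closed.
soup, used = make_soup("<p>Unclosed paragraph")
print(used)                # depends on which parsers are installed
print(soup.p.get_text())
```

Which parser ends up being used depends on the environment, and different parsers can build slightly different trees from invalid markup, so it is worth fixing the parser name once a project settles on one.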
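During installation I ran into a problem similar to the one described in the link, so I followed that author's solution.
First run anaconda search -t conda beautifulsoup4, which lists the available versions. My output was as follows: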
(wangli) D:\Anaconda3\envs\wangli> anaconda search -t conda beautifulsoup4
Using Anaconda API: https://api.anaconda.org
Packages:
Name | Version | Package Types | Platforms | Builds
------------------------- | ------ | --------------- | --------------- | ----------
IzODA/beautifulsoup4 | 4.6.0 | conda | zos-z | py37_0, py36_0
NOAA-ORR-ERD/beautifulsoup4 | 4.3.2 | conda | win-64, osx-64 | 0, py34_0, py33_0, py27_0
: Screen-scraping library
ODSP-TEST/beautifulsoup4 | 4.6.0 | conda | zos-z | py37_0, py36_0
RahulJain/beautifulsoup4 | 4.4.1 | conda | win-64 | py27_0
Trentonoliphant/beautifulsoup4 | 4.3.2 | conda | win-32, win-64 | py34_0, py33_0, py26_0, py27_0
: http://www.crummy.com/software/BeautifulSoup/bs4/
aarch64_gbox/beautifulsoup4 | 4.5.3 | conda | linux-aarch64 | py36_0
aetrial/beautifulsoup4 | | conda | linux-64, osx-64 | py35_0, py27_0
akode/beautifulsoup4 | 4.3.2 | conda | osx-64 | py27_0
: Screen-scraping library
alefnula/beautifulsoup4 | 4.1.3 | conda | osx-64 | py34_0
: UNKNOWN
anaconda/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, osx-64, linux-32, win-64 | py37_0, py37_1, py36h6ea3382_0, py36_1, py36_0, py35hb75f182_1, py27h3f86ba9_1, py27_1, py27_0, py27hc287451_1, py27h9416283_1, py27hdc1f29e_0, py35h61fcdcc_1, py36h49b8c8c_1, py38_0, py36h4361f19_1, py27h8bb5803_1, py35h94b83b4_1, py35h50ea147_0, py34_0, py35_0, py36h72d3c9f_1, py36hd4cc5e8_1, py35h442a8c9_1, py35_1
: Python library designed for screen-scraping
anacondams/beautifulsoup4 | 4.5.1 | conda | linux-64, win-64 | py35_0
: Python library designed for screen-scraping
archiarm/beautifulsoup4 | 4.7.0 | conda | linux-aarch64 | py27_1000, py36_1000, py37_1000
: Python library designed for screen-scraping
asmeurer/beautifulsoup4 | 4.2.1 | conda | osx-64 | py26_1, py33_0, py33_1, py27_1, py27_0
: http://www.crummy.com/software/BeautifulSoup/bs4/
auto/beautifulsoup4 | 4.3.2 | conda | linux-64, linux-32, osx-64 | py27_0
: Screen-scraping library
c4aarch64/beautifulsoup4 | 4.6.3 | conda | linux-aarch64 | py37_0
: Python library designed for screen-scraping
c4armv7l/beautifulsoup4 | 4.7.1 | conda | linux-armv7l | py37_1001
: Python library designed for screen-scraping
cdat-forge/beautifulsoup4 | 4.8.1 | conda | linux-64, osx-64 | py27_0
: Python library designed for screen-scraping
conda-forge/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, linux-aarch64, osx-64, win-64 | py37_0, py36_1001, py36_1000, py37_1000, py37_1001, py27_0, py36_0, py38_0, py36h9f0ad1d_1, py37hc8dfbb8_1, py36hc560c46_1, py27_1001, py27_1000, py38h32f6830_1, py35_0
: Python library designed for screen-scraping
conner_org/beautifulsoup4 | 4.7.1 | conda | linux-64 | py27_1
: Python library designed for screen-scraping
daf/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
: Screen-scraping library
draikes/beautifulsoup4 | 4.4.1 | conda | win-64 | py27_0
: UNKNOWN
ericmjl/beautifulsoup4 | 4.4.0 | conda | linux-64, osx-64 | py34_0
: Screen-scraping library
free/beautifulsoup4 | 4.6.0 | conda | linux-ppc64le, linux-64, win-32, osx-64, linux-32, win-64 | py36_0, py34_0, py35_0, py27_0
: Python library designed for screen-scraping
iilab/beautifulsoup4 | 4.3.2 | conda | linux-64, osx-64 | py34_0
: Screen-scraping library
ijstokes/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
jetson-tx2/beautifulsoup4 | 4.6.0 | conda | noarch | py_0
: Python library designed for screen-scraping
jjhelmus/beautifulsoup4 | | conda | linux-aarch64 | py37_0
: Python library designed for screen-scraping
jmatsushita/beautifulsoup4 | 4.3.2 | conda | linux-64 | py34_0
: Screen-scraping library
josh/beautifulsoup4 | 4.3.2 | conda | win-64 | py34_0
: Screen-scraping library
main/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, osx-64, linux-32, win-64 | py37_0, py37_1, py36h6ea3382_0, py36_1, py27hdc1f29e_0, py35hb75f182_1, py27h3f86ba9_1, py27_1, py27_0, py27hc287451_1, py27h9416283_1, py36_0, py35h61fcdcc_1, py36h49b8c8c_1, py38_0, py36h4361f19_1, py27h8bb5803_1, py35h94b83b4_1, py35h50ea147_0, py35_0, py36h72d3c9f_1, py36hd4cc5e8_1, py35h442a8c9_1, py35_1
: Python library designed for screen-scraping
manmadescience/beautifulsoup4 | 4.3.2 | conda | win-64 | py34_0
: Screen-scraping library
moghimis/beautifulsoup4 | 4.3.2 | conda | linux-32 | py27_0
: Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree
moustik/beautifulsoup4 | 4.5.0 | conda | linux-64 | py27_0
ngould/beautifulsoup4 | 4.2.1 | conda | osx-64 | py27_0
: http://www.crummy.com/software/BeautifulSoup/bs4/
pdrops/beautifulsoup4 | 4.3.2 | conda | osx-64 | py27_0
: Screen-scraping library
prkrekel/beautifulsoup4 | 4.3.2 | conda | win-64 | py27_0
: Screen-scraping library
prometeia/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, linux-aarch64, win-64, osx-64 | py37_0, py36_0, py38_0, py27_0
: Python library designed for screen-scraping
rmcgibbo/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
: Screen-scraping library
rodgomesc/pip-beautifulsoup4 | 4.5.3 | conda | noarch | 0
: Screen-scraping library Built for Android and iOS apps using enaml-native.
rogerramos/beautifulsoup4 | 4.6.0 | conda | linux-64 | py27_3
: Screen-scraping library
rpi/beautifulsoup4 | 4.6.3 | conda | linux-armv6l, linux-armv7l, noarch | py27_1, py27_0, py36_1, py36_0, py_0, py35_0, py35_1
: Python library designed for screen-scraping
rpi64/beautifulsoup4 | 4.6.3 | conda | linux-aarch64 | py36_0
: Python library designed for screen-scraping
rsmulktis/beautifulsoup4 | 4.5.3 | conda | linux-armv7l | py34_0
: Screen-scraping library
sayth/beautifulsoup4 | 4.4.0 | conda | win-64 | py34_3
: This is a simple meta-package
sundarv/beautifulsoup4 | 4.5.3 | conda | win-64 | py36_0
: Python library designed for screen-scraping
sunpy/beautifulsoup4 | 4.8.2 | conda | linux-ppc64le, linux-64, win-32, linux-aarch64, osx-64, win-64 | py37_0, py36_1001, py36_1000, py37_1000, py37_1001, py27_0, py36_0, py38_0, py36h9f0ad1d_1, py37hc8dfbb8_1, py36hc560c46_1, py27_1001, py27_1000, py38h32f6830_1, py35_0
: Python library designed for screen-scraping
syllabs_admin/beautifulsoup4 | 4.6.0 | conda | linux-64 | py27h490011d_0
tbalaburkina/beautifulsoup4 | 4.6.0 | conda | zos-z | py37_0
test_org_002/beautifulsoup4 | 4.5.3 | conda | [] | py36_0, py27_0, py35_0, py34_0
travis/beautifulsoup4 | 4.3.2 | conda | linux-64, osx-64 | py27_0
: Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
ulmo/beautifulsoup4 | 4.3.2 | conda | linux-64, win-32, osx-64, linux-32, win-64 | py27_0
: Screen-scraping library
wakari1/beautifulsoup4 | 4.3.2 | conda | linux-64 | py27_0
ziebel/beautifulsoup4 | 4.4.0 | conda | linux-64 | py34_0
: Screen-scraping library
zoeith/beautifulsoup4 | 4.3.2 | conda | osx-64 | py27_0
: Screen-scraping library
Found 54 packages
Run 'anaconda show <USER/PACKAGE>' to get installation details
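I chose conda-forge/beautifulsoup4 and entered the following on the command line:
conda install -c https://conda.anaconda.org/conda-forge beautifulsoup4
Note that conda-forge and beautifulsoup4 are separated by a space, not by "/". The shorter form conda install -c conda-forge beautifulsoup4 names the same channel.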
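Once the install finishes, a quick check from Python in the same conda environment can confirm that the package is importable. This is just a sanity-check sketch; the test string is made up, and the version printed will depend on what conda installed.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version; the value depends on the environment.
print(bs4.__version__)

# Parse a tiny document with the parser that ships with Python.
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.get_text())  # ok
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different environment than the one running Python.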