Linux environment
1. Installation
Method 1:
Extract: tar -xzvf beautifulsoup4-4.2.0.tar.gz
Install: change into the extracted directory, then run
python setup.py build
sudo python setup.py install
Method 2 (quick install)
(Ubuntu) sudo apt-get install python-bs4
or
pip install beautifulsoup4
or
easy_install beautifulsoup4
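Whichever method is used, it can help to confirm that the package is visible to the interpreter before going further. The following is a minimal sketch, assuming Python 3 (the rest of this post uses Python 2 syntax); the helper name is_installed is made up for illustration and is not part of Beautiful Soup.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("bs4"):
    import bs4
    # bs4 exposes its version string as bs4.__version__
    print("bs4 version:", bs4.__version__)
else:
    print("bs4 is not installed; try one of the installation methods above")
```

If the check fails right after a successful install, the package was probably installed for a different Python interpreter than the one running the script.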
2. Import (in the Python interpreter)
from bs4 import BeautifulSoup
3. Usage
Example
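html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie">Elsie</a>,
<a class="sister" href="http://example.com/lacie">Lacie</a> and
<a class="sister" href="http://example.com/tillie">Tillie</a>;
and they lived at the bottom of a well.</p>
<p>...</p>
"""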
Getting started
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)
>>> soup.head()
[<title>The Dormouse's story</title>]
>>> soup.title
<title>The Dormouse's story</title>
>>> soup.title.string
u"The Dormouse's story"
>>> soup.body.b
<b>The Dormouse's story</b>
>>> soup.body.b.string
u"The Dormouse's story"
>>> soup.a
<a class="sister" href="http://example.com/elsie">Elsie</a>
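The session above runs under Python 2. As a rough Python 3 equivalent, the sketch below performs the same tag navigation. Two things here are assumptions rather than part of the original post: the parser is named explicitly as "html.parser" (recent bs4 releases warn when none is given), and the markup is a shortened stand-in for html_doc.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for html_doc (assumed markup, for illustration only)
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>
<p><a class="sister" href="http://example.com/elsie">Elsie</a></p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Attribute-style access returns the first matching tag in the document
title_text = soup.title.string
bold_text = soup.body.b.string
first_link = soup.a

print(title_text)
print(bold_text)
print(first_link["href"])
```

Attribute access such as soup.a only ever returns the first match, which is why find_all is needed to collect every link.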
Find all <a> tags
soup.find_all('a')
Print the class and href of each <a> tag
>>> for key in soup.find_all('a'):
...     print key.get('class'), key.get('href')
...
['sister'] http://example.com/elsie
['sister'] http://example.com/lacie
['sister'] http://example.com/tillie
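The loop above uses the Python 2 print statement. One way it might be written for Python 3 is sketched below, collecting the values into a list first so they can be reused. The markup is again a shortened stand-in for html_doc, and "html.parser" is an assumed parser choice, not something the original specifies.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for html_doc (assumed markup, for illustration only)
html_doc = """
<p>
<a class="sister" href="http://example.com/elsie">Elsie</a>,
<a class="sister" href="http://example.com/lacie">Lacie</a> and
<a class="sister" href="http://example.com/tillie">Tillie</a>
</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class is a multi-valued attribute, so get('class') returns a list;
# href is single-valued, so get('href') returns a string.
links = [(a.get("class"), a.get("href")) for a in soup.find_all("a")]

for css_classes, href in links:
    print(css_classes, href)
```

Using get() rather than a["href"] means a tag without that attribute yields None instead of raising KeyError.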
References