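Installing the BeautifulSoup library
The Beautiful Soup library can be installed directly with the command pip install beautifulsoup4. Once it is installed, you can test it against the demo HTML page at http://python123.io/ws/demo.html. Opening this URL and viewing the page source gives the following: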
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>
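The examples that follow work on a soup object built from this page source. A minimal sketch of how that object might be created is shown here. The variable name demo and the choice of the built-in "html.parser" are assumptions, not something the original text specifies, and the source is pasted in as a string so the sketch runs without network access. In practice the text would more likely be downloaded from the URL above.

```python
from bs4 import BeautifulSoup

# Page source copied from the demo page (assumption: pasted inline
# instead of being downloaded, so the example works offline).
demo = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

# "html.parser" ships with Python, so no extra parser needs to be installed.
soup = BeautifulSoup(demo, "html.parser")

# prettify() returns the parsed tree as an indented string, which is a
# quick way to confirm that parsing worked.
print(soup.prettify())
```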
The BeautifulSoup library in detail
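The basic elements of the BeautifulSoup class include the tag itself (Tag), the tag's name (.name), its attributes (.attrs), and the string it contains (.string).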
We can try these out on the soup object obtained earlier:
print(soup.title)
# prints <title>This is a python demo page</title>
tag = soup.a  # the first <a> tag in the document
print(tag)
# prints <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
print(soup.a.parent.name) # prints p
print(soup.a.parent.parent.name) # prints body
print(tag.attrs)
# prints {'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
print(soup.a.string) # prints Basic Python
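To make the types behind these attributes more concrete, the following sketch checks what kind of object each access returns. It uses a shortened, one-link version of the demo page so that it stands alone; the variable name html and the reduced markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Shortened stand-in for the demo page (assumption: one link only).
html = (
    '<p class="course">'
    '<a href="http://www.icourse163.org/course/BIT-268001" '
    'class="py1" id="link1">Basic Python</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a

# soup.a is a Tag object; its .name is a plain str.
print(isinstance(tag, Tag))   # prints True
print(tag.name)               # prints a

# .attrs is a dict. "class" can hold several values, so it is a list,
# while single-valued attributes such as "id" are plain strings.
print(tag.attrs["class"])     # prints ['py1']
print(tag.attrs["id"])        # prints link1

# Attributes can also be read by indexing the tag directly.
print(tag["href"])            # prints http://www.icourse163.org/course/BIT-268001

# .string is a NavigableString, a str subclass that remembers its
# position in the parse tree.
print(isinstance(tag.string, NavigableString))  # prints True
```

Because NavigableString subclasses str, it can be compared with ordinary strings directly, and str(tag.string) gives a plain copy when the link back to the tree is not wanted.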