Introduction
Basic Usage
A Simple Example
Code in test.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <h1>Heading 1</h1>
    <h2>Heading 2</h2>
    <h3>Heading 3</h3>
    <h4>Heading 4</h4>
    <div id="content" class="default">
        <p>Paragraph</p>
        <a href="http://www.baidu.com">百度</a> <br/>
        <a href="http://www.iqiyi.com">爱奇艺</a> <br/>
        <img src="https://www.python.org/static/img/python-logo.png" />
    </div>
</body>
</html>
Code in test.py
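from bs4 import BeautifulSoup

# Read the local HTML file
with open('./test.html', encoding='utf-8') as f:
    html_doc = f.read()

# Parse the document with Python's built-in html.parser
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the first <div> whose id is "content" and print it
div_node = soup.find('div', id='content')
print(div_node)
print('=' * 20)

# Find every <a> inside the div; print the tag name, href, and link text
links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Find the <img> inside the div and print its src attribute
img = div_node.find('img')
print(img['src'])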
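As a self-contained way to check what each call in test.py returns, here is a minimal sketch that embeds a trimmed copy of the div id="content" markup as a string (an assumption made so it runs without test.html; the paragraph is omitted). It also illustrates a few behaviors the script relies on implicitly: find() returns None when nothing matches, tag.get() returns None for a missing attribute where tag['...'] would raise KeyError, and class comes back as a list because it is a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <div id="content"> markup from test.html,
# embedded as a string so this sketch runs without the file.
HTML = """
<div id="content" class="default">
<a href="http://www.baidu.com">百度</a> <br/>
<a href="http://www.iqiyi.com">爱奇艺</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png" />
</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')
div_node = soup.find('div', id='content')

# find_all() returns every matching tag; collect (name, href, text) tuples
links = [(a.name, a['href'], a.get_text()) for a in div_node.find_all('a')]

# find() returns the first matching tag, or None when nothing matches
img_src = div_node.find('img')['src']
missing = div_node.find('table')

# tag.get() returns None for a missing attribute (tag['alt'] would raise KeyError)
alt = div_node.find('img').get('alt')

# "class" is treated as multi-valued, so it is returned as a list of strings
div_class = div_node['class']

for link in links:
    print(*link)
print(img_src)
```

Using get() instead of square brackets is the safer choice when scraping pages where an attribute may or may not be present.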
Output