Author: IT小样
This post covers how to navigate a document with BeautifulSoup, using the HTML from the earlier tutorial as the example:
html_doc = '''
<html><head><title>hello,tester</title></head><body>
<p class="title"><b><h1>Hello,welcome</h1></b></p>
<p class="documentation">Tester, welcome! This is a new partion of your job's life. With python, you can finnish your work easier and faster.How, <a href="http://example.com/easier" class="easier" id="link1"> easier </a> and <a href="http://example.com/faster" class="faster" id = "link2">faster</a> Now, you have a initial impression about python.</p>
<p class="documention">let's go!!!</p>
</body></html>
'''
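from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html_doc, 'html.parser')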
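As the definition of html_doc above shows, a tag can contain child nodes. So how do we work with them and traverse them?
1. Working with the document
1.1 Getting elements
Get an element by its tag name, accessed as an attribute of the soup object: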
>>> soup.head
<head><title>hello,tester</title></head>
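A minimal sketch of what attribute-style access returns, written as a standalone script rather than a REPL session. The `sample` string is a shortened, hypothetical stand-in for html_doc, and the variable names are chosen for illustration only. It assumes the built-in 'html.parser' parser. Accessing a tag name returns the first matching tag, and a name that does not occur in the document yields None.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for html_doc (same overall structure).
sample = '''
<html><head><title>hello,tester</title></head><body>
<p class="title"><b>Hello,welcome</b></p>
<p class="documentation">Tester, <a href="http://example.com/easier" class="easier" id="link1">easier</a></p>
</body></html>
'''

soup = BeautifulSoup(sample, "html.parser")

title_tag = soup.title    # first <title> tag found anywhere in the document
first_p = soup.body.p     # first <p> tag inside <body>
first_link = soup.a       # first <a> tag in the document
missing = soup.table      # no <table> in the sample, so this is None

print(title_tag.name)       # name of the tag itself
print(title_tag.string)     # the single text node inside <title>
print(first_p["class"])     # class is multi-valued, so it comes back as a list
print(first_link["href"])   # attributes are read like dictionary keys
print(missing)
```

Because a missing tag gives None, chaining such as soup.table.tr would raise an AttributeError, so it may be worth checking for None before going deeper.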
>>> soup.body.p
<p class="title"><b><h1>Hello,welc