import requests
from bs4 import BeautifulSoup
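# Fetch the demo page used throughout these notes
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
# Pass the HTML text (demo) and the parser name to BeautifulSoup
soup = BeautifulSoup(demo, 'html.parser')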
print(soup.prettify())
print(soup.a)
print(soup.a.name)
print(soup.a.attrs)
print(soup.a.string)
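1. HTML document = tag tree (pairs of angle-bracket tags) = BeautifulSoup class; a BeautifulSoup object can be treated as the entire content of one HTML/XML document
2. Parsers (the second argument to BeautifulSoup, 'html.parser' in the code above)
3. The five basic elements of the BeautifulSoup class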
print(soup.a)
>>> <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
print(soup.a.name)
>>> a
print(soup.a.attrs)
>>> {'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
print(soup.a.string)
>>> Basic Python
print(type(soup.a))
>>> <class 'bs4.element.Tag'>
print(type(soup.a.name))
>>> <class 'str'>
print(type(soup.a.attrs))
>>> <class 'dict'>
print(type(soup.a.string))
>>> <class 'bs4.element.NavigableString'>
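soup is the entire tag tree after parsing
① soup.a refers to the <a …>…</a> tag pair in the tag tree
② Any tag that exists in HTML syntax can be accessed as soup.tag (tag can be a, p, b, html); when several tags share the same name, the first one is returned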
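③ soup.a.name is the tag's name, returned as a str
④ soup.a.attrs holds the **tag attributes**; it always returns a dict, whether or not the tag has any attributes
⑤ soup.a.string gets the string between the opening and closing tags; if that content is a comment, a Comment object is returned instead of a plain NavigableString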
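Points ②, ④ and ⑤ can be checked offline with a small sketch. The HTML string below is made up for illustration and is not part of the demo page; the names html, b_string and p_string are likewise hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical inline HTML, used only for this illustration
html = (
    "<b><!--This is a comment--></b>"
    "<p>This is not a comment</p>"
    "<p class='x'>second paragraph</p>"
)
soup = BeautifulSoup(html, "html.parser")

# soup.p returns the first <p> only, even though two exist
print(soup.p)

# attrs is a dict even for a tag without attributes (empty in that case)
print(soup.p.attrs)

# .string returns a Comment when the tag's only content is a comment
b_string = soup.b.string
p_string = soup.p.string
print(type(b_string))
print(type(p_string))

# Comment is a subclass of NavigableString, so test for Comment explicitly
if isinstance(b_string, Comment):
    print("comment text:", b_string)
```

Because a Comment prints like ordinary text, checking the type with isinstance is one way to avoid mistaking a comment for visible page content.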