Notes on a simple web-scraping question from a Python exam:
Given a URL, fetch the HTML content and extract the text of the p tag with class="title".
import requests
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>"""
bsObj = BeautifulSoup(html, "html.parser")
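print(bsObj.title)         # the whole <title> tag
print(bsObj.title.string)  # the text inside <title>
# The question asks for the text of <p class="title">, not <title>:
print(bsObj.find("p", class_="title").get_text())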
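
The snippet above parses a hard-coded string, while the question starts from a URL. A minimal sketch of the full flow might look like the following. The function names `extract_title_text` and `fetch_title_text` and the 10-second timeout are my own assumptions for illustration, not part of the exam question.

```python
import requests
from bs4 import BeautifulSoup


def extract_title_text(html):
    """Return the text of the first <p class="title"> in html, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("p", class_="title")
    # find() returns None when no matching tag exists
    return tag.get_text(strip=True) if tag is not None else None


def fetch_title_text(url, timeout=10):
    """Download url and return the text of its first <p class="title">."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return extract_title_text(response.text)
```

Keeping the parsing separate from the download means `extract_title_text` can be checked against a string like `html` above without network access; `fetch_title_text(url)` returns None when the page has no such paragraph.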