See Chapter 2 of the book for a detailed explanation.
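import requests
from bs4 import BeautifulSoup

# Fetch the home page; the timeout keeps the request from hanging indefinitely.
url = "http://www.cntour.cn"
strhtml = requests.get(url, timeout=10)
print(strhtml.text)

# Parse the HTML with the lxml parser (requires the lxml package).
soup = BeautifulSoup(strhtml.text, "lxml")
# Select the headline links inside the news list.
data = soup.select("#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li > a")
for item in data:
    # Collect each link's text and target URL.
    result = {
        "title": item.get_text(),
        "link": item.get("href"),
    }
    print(result)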
The output is as follows:
{'title': '让城市文脉融入现代生活', 'link': 'http://www.cntour.cn/news/6546/'}
{'title': '新时代中俄旅游合作呼唤新作为', 'link': 'http://www.cntour.cn/news/6540/'}
{'title': '高质量标准引领景区优质旅游新时代', 'link': 'http://www.cntour.cn/news/6535/'}
{'title': '数字文旅时代来了', 'link': 'http://www.cntour.cn/news/6530/'}
{'title': '[文明之光照亮复兴之路]', 'link': 'http://www.cntour.cn/news/6541/'}
{'title': '[游遍世园会:一条龙服务]', 'link': 'http://www.cntour.cn/news/6522/'}
{'title': '[高端旅游如何发力?]', 'link': 'http://www.cntour.cn/news/6512/'}
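
The live page may change or be unreachable, so it can help to try the same selector against a small inline HTML string first. The sketch below is only an illustration: the markup in SAMPLE_HTML is a made-up, simplified guess at the nesting the selector expects (not the real page source), extract_links is a hypothetical helper name, and the built-in "html.parser" is swapped in for "lxml" so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mimics the nesting the selector expects.
# It is NOT the real source of www.cntour.cn.
SAMPLE_HTML = """
<div id="main">
  <div>
    <div class="mtop firstMod clearfix">
      <div class="centerBox">
        <ul class="newsList">
          <li><a href="http://www.cntour.cn/news/6546/">让城市文脉融入现代生活</a></li>
          <li><a href="http://www.cntour.cn/news/6540/">新时代中俄旅游合作呼唤新作为</a></li>
        </ul>
        <ul class="otherList">
          <li><a href="/ignored/">not matched</a></li>
        </ul>
      </div>
    </div>
  </div>
</div>
"""

# Same selector as in the script above. ">" matches direct children only, and
# "div.mtop.firstMod.clearfix" requires all three classes on one element.
SELECTOR = (
    "#main > div > div.mtop.firstMod.clearfix > div.centerBox"
    " > ul.newsList > li > a"
)


def extract_links(html):
    """Return a list of {'title', 'link'} dicts for every matching <a> tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": a.get_text(strip=True), "link": a.get("href")}
        for a in soup.select(SELECTOR)
    ]


results = extract_links(SAMPLE_HTML)
for result in results:
    print(result)
```

Only the two links inside ul.newsList are returned; the link inside ul.otherList is skipped because its parent list does not carry the newsList class. If the selector returns an empty list on the real page, the page structure has probably changed and the selector needs to be copied again from the browser's developer tools.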