Parsing with BeautifulSoup
BeautifulSoup is a Python library for extracting data from HTML documents.
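To install it, run pip install bs4 in cmd.
The examples below use the lxml parser, which is a separate package: pip install lxml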
Getting a date with BeautifulSoup
import requests
from bs4 import BeautifulSoup
link = 'https://blog.csdn.net/even160941'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0'}
r = requests.get(link, headers=headers, timeout=10)
soup = BeautifulSoup(r.text, 'lxml')  # parse the page with the lxml parser
date = soup.find('span', class_='date').text.strip()  # soup.find returns only the first matching date
print('the date is', date)
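One thing the snippet above does not handle: soup.find returns None when nothing matches, so chaining .text.strip() would raise an AttributeError on a page without a date span. Below is a minimal sketch of a guarded version. It runs on a made-up HTML string instead of the live blog, uses the built-in 'html.parser' so lxml is not required, and SAMPLE_HTML and first_date are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<div class="article">
  <span class="date"> 2019-04-01 10:20:30 </span>
  <span class="date"> 2019-03-28 08:00:00 </span>
</div>
"""

def first_date(html):
    """Return the text of the first <span class="date">, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find('span', class_='date')
    if tag is None:  # find() found nothing, so there is no .text to read
        return None
    return tag.text.strip()

print('the date is', first_date(SAMPLE_HTML))
print('missing case:', first_date('<p>no dates here</p>'))
```

The same guard can be dropped into the requests-based code by checking the result of soup.find before reading .text.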
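Use find_all to collect every post date, with pagination added:
The result of soup.find_all cannot be followed directly by .text.strip(), because find_all returns a list of tags rather than a single tag, so each element has to be handled in a loop.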
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0'}
for i in range(1, 3):  # pages 1 and 2
    # page i of the post list (assumes CSDN's /article/list/<n> URL scheme)
    link = 'https://blog.csdn.net/even160941/article/list/' + str(i)
    r = requests.get(link, headers=headers, timeout=10)
    soup = BeautifulSoup(r.text, 'lxml')
    dates = soup.find_all('span', class_='date')  # list of every matching tag on the page
    for x in dates:
        date = x.text.strip()
        print('the date is', date)
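The point about find_all returning a list can be checked offline. The following is a small sketch, not the blog's actual markup: SAMPLE_HTML and all_dates are hypothetical names, and the built-in 'html.parser' is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment with two date spans.
SAMPLE_HTML = """
<div class="article"><span class="date"> 2019-04-01 </span></div>
<div class="article"><span class="date"> 2019-03-28 </span></div>
"""

def all_dates(html):
    """Return the stripped text of every <span class="date"> in html."""
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup.find_all('span', class_='date')  # a list-like ResultSet
    # Each element is a single tag, so its text is read one element at a time.
    return [tag.get_text(strip=True) for tag in tags]

for d in all_dates(SAMPLE_HTML):
    print('the date is', d)
```

When nothing matches, find_all returns an empty list, so the loop simply does not run and no None check is needed.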
Note: this article draws on https://blog.csdn.net/weixin_42183408/article/details/87459848