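1. First, we scrape ancient poems from the web with a crawler.
Here we fetch the page with the urllib library, then parse it with re and BeautifulSoup.
The site we crawl is https://so.gushiwen.org/mingju/
The scraped poems are then saved to C:\编程\python\poem.txt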
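from urllib.request import urlopen
import re
from bs4 import BeautifulSoup

url = 'https://so.gushiwen.org/mingju/'
# Download the page and decode the bytes as UTF-8 text
html = urlopen(url).read().decode('utf-8')
# Parse with BeautifulSoup (needs the lxml package); the regex below works on the raw HTML string
soup = BeautifulSoup(html, features='lxml')
# Capture the text of each <a> tag that is immediately followed by the grey source <span>
poem = re.findall(r'">(.*?)</a><span style=" color:#65645F', html)
# Append one line per match as UTF-8; the raw string keeps the Windows backslashes literal
with open(r'C:\编程\python\poem.txt', 'a', encoding='utf-8') as f:
    for line in poem:
        f.write(line + '\n')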
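To check what the regular expression captures without touching the network, the sketch below runs the same pattern on a small hand-written HTML fragment and appends the results to a file. The fragment is an assumption modeled on the pattern itself (an `<a>` tag followed by `<span style=" color:#65645F">`). The real page markup may differ or may have changed. `extract_lines` and `save_lines` are hypothetical helper names, and a temporary file stands in for C:\编程\python\poem.txt.

```python
import os
import re
import tempfile

# Same pattern as above: capture the text between '">' and '</a>' when the
# closing tag is immediately followed by the grey source <span>.
PATTERN = re.compile(r'">(.*?)</a><span style=" color:#65645F')


def extract_lines(html):
    """Return every famous line matched by PATTERN (hypothetical helper)."""
    return PATTERN.findall(html)


def save_lines(lines, path):
    """Append one entry per line to a UTF-8 text file (hypothetical helper)."""
    with open(path, 'a', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')


# Hand-written fragment assumed to resemble the page; not copied from the site.
sample = (
    '<div class="cont">\n'
    '<a href="/mingju/1.aspx">床前明月光,疑是地上霜。</a><span style=" color:#65645F;">李白《静夜思》</span>\n'
    '</div>\n'
    '<div class="cont">\n'
    '<a href="/mingju/2.aspx">海内存知己,天涯若比邻。</a><span style=" color:#65645F;">王勃《送杜少府之任蜀州》</span>\n'
    '</div>\n'
)

lines = extract_lines(sample)
print(lines)  # ['床前明月光,疑是地上霜。', '海内存知己,天涯若比邻。']

# Write to a temporary file instead of the real path, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'poem.txt')
    save_lines(lines, path)
    with open(path, encoding='utf-8') as f:
        print(f.read().splitlines())
```

Because `.` does not match a newline by default, each match stays within one line of the page source. If a line contained another `">` before the target `<a>` tag, the non-greedy group would start at that earlier position and capture markup too. Narrowing the group to `([^<]*)` is one way to guard against that.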