A quick first attempt: a short script that scrapes 200 of Li Bai's poems and saves them neatly to a txt file. The code is as follows:
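import requests
from lxml import etree

n = 0  # running count of poems saved so far
for i in range(10):  # 10 list pages
    if i == 0:
        url = "http://www.shicimingju.com/chaxun/zuozhe/1.html"
    else:
        # pages 2-10 are named 1_2.html ... 1_10.html (note "=", not "==")
        url = "http://www.shicimingju.com/chaxun/zuozhe/" + "1_" + str(i + 1) + ".html"
    html = requests.get(url)
    html.encoding = html.apparent_encoding  # guess the charset from the body to avoid garbled Chinese
    r = etree.HTML(html.text)
    rows = r.xpath("//div[@class='shici_list_main']")
    for row in rows:
        n += 1
        title = row.xpath("h3/a/text()")[0]
        if row.xpath("div/div/text()"):
            # part of the poem body sits in a nested div: join both parts
            content = "\n".join(row.xpath("div/text()")).replace(' ', '').rstrip() + "\n".join(row.xpath("div/div/text()")).replace(' ', '')
        else:
            content = "\n".join(row.xpath("div/text()")).replace(' ', '')
        # explicit UTF-8 so the file opens correctly on Windows as well
        with open("李白的诗.txt", "a", encoding="utf-8") as f:
            f.write("【{}】{}{}\n\n".format(n, title, content))
        # total = 10 pages x len(rows) poems per page, so this is n / total * 100
        print("\rProgress: {:.2f}%".format(n * 10 / len(rows)), end="")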
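To check the page-URL and XPath logic without hitting the site, the loop body can be factored into two small functions and run against a hand-written HTML fragment. This is a minimal sketch: the function names `page_url` and `parse_poems` and the `SAMPLE` HTML are hypothetical, and the sample only mimics the structure the XPath expressions above assume (a `div` with class `shici_list_main` holding an `h3/a` title and a `div` body, optionally with a nested `div`). The live site's markup may differ.

```python
from lxml import etree

BASE = "http://www.shicimingju.com/chaxun/zuozhe/"

def page_url(i):
    """URL of list page i (0-based): page 1 is 1.html, later pages are 1_2.html, 1_3.html, ..."""
    if i == 0:
        return BASE + "1.html"
    return BASE + "1_" + str(i + 1) + ".html"

def parse_poems(html_text):
    """Return (title, content) pairs using the same XPath rules as the script above."""
    r = etree.HTML(html_text)
    poems = []
    for row in r.xpath("//div[@class='shici_list_main']"):
        title = row.xpath("h3/a/text()")[0]
        content = "\n".join(row.xpath("div/text()")).replace(' ', '')
        nested = row.xpath("div/div/text()")
        if nested:
            # the body continues inside a nested div: append that text too
            content = content.rstrip() + "\n".join(nested).replace(' ', '')
        poems.append((title, content))
    return poems

# Hypothetical fragment shaped like the markup the XPaths expect
SAMPLE = """
<html><body>
<div class="shici_list_main">
  <h3><a href="#">静夜思</a></h3>
  <div>床前明月光，疑是地上霜。</div>
</div>
<div class="shici_list_main">
  <h3><a href="#">望庐山瀑布</a></h3>
  <div>日照香炉生紫烟，<div>遥看瀑布挂前川。</div></div>
</div>
</body></html>
"""

if __name__ == "__main__":
    print(page_url(0))
    print(page_url(9))
    for title, content in parse_poems(SAMPLE):
        print("【{}】{}".format(title, content))
```

Separating URL building and parsing from the network request makes it easy to spot a broken selector or a wrong page name (such as the `url ==` slip) before running the full 10-page crawl.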
A partial screenshot of the results is shown below: