Scraping Li Bai's Poems with lxml and XPath

Latest recommended article published on 2023-03-06 18:06:32

weixin_41534322


Category: Web scraping. Tags: python

Article link: https://blog.csdn.net/weixin_41534322/article/details/104968891


A quick warm-up exercise: a short script that scrapes 200 of Li Bai's poems and saves them neatly to a txt file. The code is as follows:

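import requests
from lxml import etree

n = 0  # running count of poems saved so far
for i in range(10):  # the listing is split across 10 pages
    # The first page has no suffix; later pages are 1_2.html, 1_3.html, ...
    if i == 0:
        url = "http://www.shicimingju.com/chaxun/zuozhe/1.html"
    else:
        # Assignment (=), not comparison (==); otherwise every iteration refetches page 1
        url = "http://www.shicimingju.com/chaxun/zuozhe/" + "1_" + str(i + 1) + ".html"

    html = requests.get(url)
    r = etree.HTML(html.text)
    rows = r.xpath("//div[@class='shici_list_main']")  # one div per poem
    for row in rows:
        n += 1
        title = row.xpath("h3/a/text()")[0]
        # Text directly inside the poem's div
        content = "\n".join(row.xpath("div/text()")).replace('  ', '')
        # Some poems keep the rest of their text in a nested div; append it when present
        if row.xpath("div/div/text()"):
            content = content.rstrip() + "\n".join(row.xpath("div/div/text()")).replace('  ', '')
        # Append mode; utf-8 is set explicitly so the Chinese text is written the same on every platform
        with open("李白的诗.txt", "a", encoding="utf-8") as f:
            f.write("【{}】{}{}\n\n".format(n, title, content))
        # With 10 pages, n * 10 / len(rows) equals n / (10 * len(rows)) * 100
        print("\rCurrent progress: {:.2f}%".format(n * 10 / len(rows)), end="")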
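
The pagination rule used in the script (the first page is `1.html`, later pages are `1_2.html` through `1_10.html`) can be pulled into a small helper, which makes the `i + 1` offset easy to check without touching the network. This is only a minimal sketch: the names `BASE`, `page_url` and `all_page_urls` are hypothetical and not part of the original script, and it assumes the site still uses this URL scheme.

```python
BASE = "http://www.shicimingju.com/chaxun/zuozhe/"  # listing root taken from the script above


def page_url(page: int) -> str:
    """Return the listing URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 has no suffix; page k (k >= 2) is "1_k.html"
    return BASE + ("1.html" if page == 1 else "1_{}.html".format(page))


def all_page_urls(pages: int = 10) -> list:
    """Build the URLs for the first `pages` listing pages."""
    return [page_url(p) for p in range(1, pages + 1)]


if __name__ == "__main__":
    for url in all_page_urls():
        print(url)
```

Looping over `all_page_urls()` instead of branching on `i == 0` inside the main loop keeps the URL construction in one place, so a mistake like writing `==` instead of `=` has nowhere to hide.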