Scraping the text, author, and tag of famous quotes (first tag only)

Latest recommended article published on 2022-02-04 07:00:00

玄学调参侠

Category: Notes. Tags: python

Article link: https://blog.csdn.net/weixin_45774059/article/details/107338844

Result

Code

import requests
from lxml import etree
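# Fetch the page source
try:
    r = requests.get('http://quotes.toscrape.com/page/1/', timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    html = r.text
except requests.RequestException as e:
    # Stop here: without the page source there is nothing to parse
    raise SystemExit('Request failed: {}'.format(e))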
html1 = etree.HTML(html)
# Walk each quote block so that text, author, and tag stay aligned,
# even if a quote has no tags at all
ulist = []
for quote in html1.xpath('//div[@class="quote"]'):
    wenben = quote.xpath('./span[1]/text()')
    zuoze = quote.xpath('./span[2]/small/text()')
    # a[1] keeps only the first tag; use a/text() instead to get every tag
    tags = quote.xpath('.//div[@class="tags"]/a[1]/text()')
    ulist.append([
        wenben[0] if wenben else '',
        zuoze[0] if zuoze else '',
        tags[0] if tags else '',
    ])
print(len(ulist))  # number of quotes found on the page
print(ulist)

# Print an aligned table: quote text, author, first tag
print('{0:<130}\t{1:^20}{2:^8}'.format('Quote', 'Author', 'Tag'))
for i in ulist:
    print('{0:<130}\t{1:^20}{2:^8}'.format(i[0], i[1], i[2]))
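
The fixed column width of 130 in the listing above only lines up when every quote is shorter than that and the terminal is wide enough. One possible alternative, shown here as a rough sketch and not as part of the original script, is to clip long quotes and size each column to its longest cell. The helper name `format_table`, the `max_quote` parameter, and the sample rows are all made up for illustration; in practice the rows would come from `ulist`.

```python
def format_table(rows, headers=('Quote', 'Author', 'Tag'), max_quote=60):
    """Return aligned table lines for rows of [quote, author, tag]."""
    # Shorten long quotes so that one row fits on one terminal line.
    clipped = []
    for quote, author, tag in rows:
        if len(quote) > max_quote:
            quote = quote[:max_quote - 3] + '...'
        clipped.append((quote, author, tag))
    # Each column is as wide as its longest cell, header included.
    table = [tuple(headers)] + clipped
    widths = [max(len(row[i]) for row in table) for i in range(3)]
    return ['  '.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in table]


# Hypothetical sample rows standing in for ulist.
sample = [
    ['"A short quote."', 'Jane Doe', 'life'],
    ['"' + 'x' * 80 + '"', 'John Roe', ''],
]
for line in format_table(sample):
    print(line)
```

Calling `format_table(ulist)` after the scraping loop would print the real data the same way. Note that `len()` counts code points, so columns containing full-width characters may still drift in some terminals.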