Building a look-up-as-you-read app for The Economist with a Python crawler and front-end skills (010)

Latest recommended article published on 2022-09-09 18:26:41

Author: "Account deactivated"

Category columns: python, javascript, Python crawler news, web. Tags: The Economist, Python crawler, front end, economist

Article link: https://blog.csdn.net/Lockey23/article/details/80143348

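This article explains how to use a Python crawler to fetch the latest article list from The Economist's website and archive it locally. It describes the crawling process, the data structures, the archive directory layout, and the structure of the saved JSON file. The next installment will discuss how to deduplicate the words in the articles.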

010. Scraping The Economist's latest article list with Python and archiving it as local files

First, a recap of how to fetch the list of latest articles from the home page, [[a, title], …]:

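import urllib.request
from lxml import etree

# Assumed request headers; `headers` is not defined in this excerpt.
headers = {'User-Agent': 'Mozilla/5.0'}

def getPaperList():
    """Return [[href, title], ...] for the latest articles on the home page."""
    url = 'https://economist.com'
    req = urllib.request.Request(url=url, headers=headers, method='GET')
    with urllib.request.urlopen(req) as response:
        html = response.read()
    selector = etree.HTML(html.decode('utf-8'))
    # Absolute XPath to the <li> items of the latest-articles list
    goodpath = '/html/body/div[1]/div[1]/div[1]/div[2]/div[1]/main[1]/div[1]/div[1]/div[1]/div[3]/ul[1]/li'
    art = selector.xpath(goodpath)
    awithtext = []
    for li in art:
        ap = li.xpath('article[1]/a[1]/div[1]/h3[1]/text()')
        a = li.xpath('article[1]/a[1]/@href')
        # Skip items without a link or title instead of aborting the whole loop
        if a and ap:
            awithtext.append([a[0], ap[0]])
    # Return normally; a `return` inside `finally` would silently swallow errors
    return awithtext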
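
The [[a, title], …] list returned above is what gets archived locally. The archive layout the author actually used is not shown in this excerpt, so the following is only a minimal sketch under assumptions: the function name `archive_paper_list`, the `archive/<date>/list.json` layout, and the `href`/`title` keys are all hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def archive_paper_list(papers, root="archive", day=None):
    """Write [[href, title], ...] to <root>/<YYYY-MM-DD>/list.json and return the path.

    The directory layout and key names are illustrative assumptions,
    not the structure used in the original post.
    """
    day = day or date.today()
    folder = Path(root) / day.isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    # Store each entry as an object so the file is self-describing
    records = [{"href": href, "title": title} for href, title in papers]
    target = folder / "list.json"
    # ensure_ascii=False keeps non-ASCII characters in titles readable
    target.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return target
```

Calling `archive_paper_list(getPaperList())` would then write one JSON file per day, which keeps repeated runs on the same day from piling up duplicate archives.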

1. Next, analyze the HTML structure of the article to be scraped
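
One way to approach this kind of structure analysis is to print the tag path that leads to each piece of text, which makes it easier to write an XPath like the one used for the home page. The sketch below uses only the standard library. The class name `TagPathOutliner`, the helper `outline`, and the sample markup are hypothetical and do not reflect The Economist's real page structure.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathOutliner(HTMLParser):
    """Record the tag path leading to every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.paths = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paths.append(("/" + "/".join(self.stack), text))


def outline(html):
    """Return (tag path, text) pairs for an HTML string."""
    parser = TagPathOutliner()
    parser.feed(html)
    parser.close()
    return parser.paths


# Made-up sample markup, used only to illustrate the output format
sample = "<html><body><article><h1>Title</h1><p>First<br>line</p></article></body></html>"
for path, text in outline(sample):
    print(path, "->", text)
```

The paths carry no positional indexes such as `div[1]`, so they only narrow down where the article body lives. The final XPath still has to be checked against the real page.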
