Scraping Epidemic News from a Foreign Website

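This script scrapes epidemic news from a foreign news site (AP News). The site does not have much data, and you need a VPN (翻墙, getting past the firewall) to reach it while scraping.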

https://apnews.com/hub/epidemics
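Since the site has to be reached through a VPN, one option is to point requests at a proxy explicitly. This is a minimal sketch, assuming your VPN client exposes a local HTTP proxy; the address http://127.0.0.1:7890 and the helper make_session are hypothetical placeholders, not part of the original script.

```python
import requests

# Hypothetical local proxy address: replace it with whatever your VPN/proxy client exposes
PROXY_URL = 'http://127.0.0.1:7890'


def make_session(proxy_url=PROXY_URL):
    """Return a requests.Session that sends both HTTP and HTTPS traffic through proxy_url."""
    session = requests.Session()
    session.proxies.update({'http': proxy_url, 'https': proxy_url})
    return session


# Creating the session does not touch the network; requests are only sent on .get()
session = make_session()
# res = session.get('https://apnews.com/hub/epidemics', timeout=30)
```

With this in place, each requests.get(url) call in the scraper would become session.get(url). Alternatively, requests honors the HTTP_PROXY / HTTPS_PROXY environment variables by default, so exporting those before running the script works without changing the code.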

from urllib.parse import urljoin
import csv

import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://apnews.com/'
HUB_URL = 'https://apnews.com/hub/epidemics'

# Get the article text: concatenate the text of every <p> on the page
def get_content(url):
    res = requests.get(url, timeout=30)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    return ''.join(p.get_text() for p in soup.find_all('p'))

res = requests.get(HUB_URL, timeout=30)
soup = BeautifulSoup(res.text, 'html.parser')

# These class names look auto-generated by the site's styling tooling and may
# have changed since this was written; check them in the page source first.
headlines = soup.find_all('a', class_="Component-headline-0-2-111")
timestamps = soup.find_all('span', class_="Timestamp Component-root-0-2-116 Component-timestamp-0-2-115")

# The publication time is stored in each span's data-source attribute
dates = [span.get('data-source', '') for span in timestamps]

# A headline card looks like this:
# <a class="Component-headline-0-2-111" data-key="card-headline" href="/article/asia-pacific-carrie-lam-coronavirus-pandemic-hong-kong-58660e5ecbd579169dcd9c7f244ed649"><h1 class="Component-h1-0-2-112">Hong Kong leader urges people to stay home as cases rise</h1></a>
data = []
for date, a in zip(dates, headlines):
    # href is site-relative ("/article/..."); urljoin avoids a double slash
    link = urljoin(BASE_URL, a['href'])
    title = a.get_text(strip=True)
    data.append((date, title, link, get_content(link)))
print(data)

# csv.writer quotes fields that contain commas or line breaks (common in article text);
# utf_8_sig adds a BOM so Excel opens the file with the right encoding
with open('国外美国AP数据.csv', 'w', encoding='utf_8_sig', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Time', 'Title', 'Link', 'Content'])
    writer.writerows(data)
print("File saved")
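Two details of the output step are easy to get wrong: building CSV lines by joining fields with commas corrupts the file as soon as a title or article body itself contains a comma or line break, and gluing 'https://apnews.com/' onto an href that already starts with '/' produces a double slash. Both can be checked offline with the standard library. The sketch below reuses the headline and href from the sample card above; the date and article body are made-up placeholders, and naive_line / csv_line are illustrative helper names.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = 'https://apnews.com/'


def naive_line(fields):
    """Join fields with bare commas, with no quoting at all."""
    return ','.join(fields) + '\n'


def csv_line(fields):
    """Let csv.writer quote any field containing a comma, quote or newline."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


href = '/article/asia-pacific-carrie-lam-coronavirus-pandemic-hong-kong-58660e5ecbd579169dcd9c7f244ed649'
row = [
    'DATE',  # placeholder; the real value comes from the data-source attribute
    'Hong Kong leader urges people to stay home as cases rise',
    urljoin(BASE_URL, href),  # single slash between host and path
    'First paragraph, with a comma.\nSecond paragraph.',  # placeholder body text
]

print(repr(naive_line(row)))  # the comma and newline in the body split the record
print(repr(csv_line(row)))    # the body field is wrapped in double quotes
```

Reading each line back with csv.reader shows the difference: the quoted line round-trips to the same four fields, while the naive one does not.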
