Notes on Scraping Data from a Static Web Page

Latest recommended article published on 2023-12-22 02:49:36

VIP article · Andrew_SJ

Tags: python, web scraping

Article link: https://blog.csdn.net/Andrew_SJ/article/details/110389961

# Scrape Hursat-b1 data for 2000-2016

import requests
import os
from urllib import request
from bs4 import BeautifulSoup as bs

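def mkdir(path):
    # Create the folder (and any missing parent folders) if it does not exist yet.
    path = path.strip()
    isExists = os.path.exists(path)
    if not isExists:
        print('Creating folder:', path)
        os.makedirs(path)
        print('Created successfully!')
    else:
        print(path, 'folder already exists.')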

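# Root of the Hursat-b1 v06 archive; the year is appended to form each listing URL.
url = 'https://www.ncei.noaa.gov/data/hurricane-satellite-hursat-b1/archive/v06/'
# Send a browser-like User-Agent with every request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36 Edg/87.0.664.47'}
for i in range(2000, 2017):
    url_y = url + str(i)
    str_html = requests.get(url_y, headers=headers)
    # Collect every <a> tag on the year's listing page.
    all_a = bs(str_html.text, 'lxml').find_all('a')
    # Skip the first five links (presumably the listing's navigation links, not data files).
    use_a = all_a[5:]
    # Local target folder, one per year.
    folder = 'E:/hurricane-satellite-hursat-b1-v06' + '/' + str(i)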
    
    mkdir(folder)
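
The slice `all_a[5:]` in the loop depends on the listing page always starting with the same five non-data links. A minimal sketch of a more defensive alternative is to filter links by file extension. This assumes the data files are the links whose `href` ends in `.tar.gz`, which is an assumption about the archive layout and is not confirmed by the code above. The helper name `extract_file_links` and the sample HTML are hypothetical, and the standard-library `html.parser` stands in for BeautifulSoup so the example has no third-party dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def extract_file_links(html, base_url, suffix='.tar.gz'):
    """Return absolute URLs of all links whose href ends with `suffix`.

    `suffix` is an assumed file extension; adjust it to the real listing.
    """
    parser = _LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(suffix)]


# Hypothetical listing page: two navigation links followed by two data files.
sample = '''
<a href="?C=N;O=D">Name</a>
<a href="/data/">Parent Directory</a>
<a href="storm_a.tar.gz">storm_a.tar.gz</a>
<a href="storm_b.tar.gz">storm_b.tar.gz</a>
'''
links = extract_file_links(sample, 'https://example.com/archive/2000/')
print(links)
```

Note that `urljoin` only resolves relative links into the year's directory when the base URL ends with a slash, so with the loop above the base would need to be `url_y + '/'` rather than `url_y`.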

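The listing breaks off before any download step is shown. As a hedged sketch of what such a step could look like (not the author's code), the helper below saves one file into a folder and skips files that are already present, so an interrupted run can be restarted. The name `download_if_missing` and its parameters are hypothetical. It uses `urllib.request`, which the script already imports as `request`.

```python
import os
import shutil
from urllib import request


def download_if_missing(file_url, folder, headers=None):
    """Download `file_url` into `folder` unless it is already there.

    Returns the local path. The data is first written to a '.part' file
    and renamed afterwards, so a partial download is never mistaken for
    a finished one.
    """
    os.makedirs(folder, exist_ok=True)
    filename = file_url.rstrip('/').split('/')[-1]
    target = os.path.join(folder, filename)
    if os.path.exists(target):
        return target  # already downloaded on an earlier run

    tmp = target + '.part'
    req = request.Request(file_url, headers=headers or {})
    with request.urlopen(req) as response, open(tmp, 'wb') as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    os.replace(tmp, target)
    return target
```

Inside the yearly loop this could be called once per file link, for example `download_if_missing(file_url, folder, headers=headers)`, where `file_url` is an absolute URL built from one of the collected links.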