I needed the ERA5 rain water path (Total column rain water) data for work and had to download it in batch. Using Total column rain water as the example, this post records a method for batch-downloading ERA5 data (parallel downloads with Python multiprocessing), for future reference.
Platform and tools used in this example:
Operating system: Ubuntu 20
Download tool: Python 3
Special requirement: the CDS API and key specified by the ERA5 website (see below)
Note: this download scheme works on both Windows and Linux.
Contents:
1 Selecting the ERA5 data you need
1.1 Find the ERA5 dataset on the official site
1.2 Choose the specific data within the dataset (product type / variable / year / month / day / time / region / format)
2 Downloading the ERA5 data
2.1 Installing the CDS API tool
2.2 Obtaining a Python download script
2.2.1 Method one: the script generated by the official site
2.2.2 Method two: a multiprocessing parallel-download script
Main text:
1 Selecting the ERA5 data you need
Data downloaded in this example:
Product type: ERA5 reanalysis
Variable: Total column rain water
Time: every hour from 2018-07-04 to 2018-07-11
Region: global
Format: GRIB
1.1 Find the ERA5 dataset on the official site: https://cds.climate.copernicus.eu/#!/search?text=ERA5&type=dataset
1.2 Choose the specific data within the dataset (product type / variable / year / month / day / time / region / format)
Based on the needs above, I made the selections accordingly.
2 Downloading the ERA5 data
2.1 Installing the CDS API tool
The full CDS API installation guide is at https://cds.climate.copernicus.eu/api-how-to, which lists the installation steps for each operating system.
In short:
(1) Every registered user is given a url and a key. Copy the url and key shown on that page into a file named ".cdsapirc" in your home directory (Linux: /home/<username>/; on Windows it goes in your user folder under C:\, see the page for details).
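The ".cdsapirc" file is just two lines. Its shape at the time of writing is as follows; the key value here is a placeholder, and your personal UID and API key are shown on the api-how-to page after you log in:

```text
url: https://cds.climate.copernicus.eu/api/v2
key: <UID>:<API-key>
```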
(2) In your Python environment, run: pip install cdsapi
and the installation is complete.
2.2 Obtaining a Python download script
Go back to the data-selection page; once your choices are made, click "show API request" to get the example Python download script generated by the site.
2.2.1 Method one: the script generated by the official site
Use the generated script directly: it downloads all of the selected data into a single file named "download.grib", placed in the directory the script is run from.
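The generated script is essentially a single cdsapi.Client().retrieve(...) call. For the selection above, its request body would look roughly like the sketch below (this is my reconstruction, not the verbatim output of "show API request"):

```python
# request body matching the selection in section 1 (a sketch)
request = {
    'product_type': 'reanalysis',
    'variable': 'total_column_rain_water',
    'year': '2018',
    'month': '07',
    'day': [f'{d:02d}' for d in range(4, 12)],    # '04' .. '11'
    'time': [f'{h:02d}:00' for h in range(24)],   # '00:00' .. '23:00'
    'format': 'grib',
}
print(len(request['day']), len(request['time']))  # -> 8 24
```

Passing this dictionary to c.retrieve('reanalysis-era5-single-levels', request, 'download.grib') reproduces method one: one request, one output file.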
2.2.2 Method two: a multiprocessing parallel-download script
I wrote the script below. It writes the data for each time step to its own grib file in a directory specified in the script.
"""
Created on Fri Jan 1 14:58:08 2021
@author: mae
This program download ERA5 rain water content during 2018.07.04-2017.07.11
using python multi-threaded execution.
notice:
1. This script requires url and key which can be found at web page:
https://cds.climate.copernicus.eu/api-how-to
2. after run the script, era5 data will be format individualy like:
'era5_' + '201807' + d + h + '_rain_water_sfc.grib'
"""
import cdsapi
import os
from multiprocessing import Process

c = cdsapi.Client()

def download(day, h):
    # retrieve one variable for one day/time and write it to its own grib file
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'grib',
            'variable': 'total_column_rain_water',
            'year': '2018',
            'month': '07',
            'day': day,
            'time': h,
        },
        'era5_' + '201807' + day + h + '_rain_water_sfc.grib')
    return 0

if __name__ == '__main__':
    # define the directory where the data should be stored
    os.chdir("/where/data/put/in/")
    days = ['04', '05', '06', '07', '08', '09', '10', '11']
    times = [
        '00:00', '01:00', '02:00',
        '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00',
        '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00',
        '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00',
        '21:00', '22:00', '23:00']
    for day in days:
        # start one process per time step, then wait for the whole day to finish
        pro_list = []
        for h in times:
            p = Process(target=download, args=(day, h))
            p.start()
            pro_list.append(p)
        for p in pro_list:
            p.join()
    print("Main process finished!")
Note: the generated filenames look like era5_2018070400:00_rain_water_sfc.grib.
One small caveat: these filenames contain a colon, which Linux accepts but Windows does not allow in filenames. Solution (1): on Linux, open a terminal in the data directory and run
rename 's|:||g' *
to strip the colons. Solution (2): Windows has similar batch-rename facilities; explore them if you are interested. Solution (3): tweak the file-naming scheme in the Python script above. :)
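As a cross-platform alternative to the shell rename, colon-stripping can also be done from Python; a minimal sketch (the directory and filename below are placeholders for your own data directory):

```python
import os
import tempfile

def strip_colons(directory):
    """Rename every file in `directory`, deleting ':' from its name."""
    for name in os.listdir(directory):
        if ':' in name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, name.replace(':', '')))

# demo in a throwaway directory (colons are legal in Linux filenames)
d = tempfile.mkdtemp()
open(os.path.join(d, 'era5_2018070400:00_rain_water_sfc.grib'), 'w').close()
strip_colons(d)
print(os.listdir(d))  # -> ['era5_201807040000_rain_water_sfc.grib']
```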
That's all!
Update 2021.02.03
This update adds "solution (3)" above, applied to downloading reanalysis data covering Super_Typhoon_Lekima. It consists of two parts: upper-air variables and near-surface variables.
Upper-air (pressure-level) variables:
"""
Created on Tue Feb 2 11:23:31 2021
@author: mae
This program downloads reanalysis-era5-pressure-levels variables during Super_Typhoon_Lekima.
"""
import re
import cdsapi
import os
from multiprocessing import Process

c = cdsapi.Client()

def time2str(time):
    # turn an 'HH:MM' time string into a colon-free 'HHMM' tag for filenames
    str_t = str(time)
    numbers = [float(s) for s in re.findall(r'-?\d+\.?\d*', str_t)]
    time_h = int(numbers[0])
    time_m = int(numbers[1])
    return str(time_h).zfill(2) + str(time_m).zfill(2)

def download(day, h):
    # retrieve all pressure-level variables for one day/time into one grib file
    c.retrieve(
        'reanalysis-era5-pressure-levels',
        {
            'product_type': 'reanalysis',
            'format': 'grib',
            'year': '2019',
            'month': '08',
            'day': day,
            'pressure_level': [
                '1', '2', '3',
                '5', '7', '10',
                '20', '30', '50',
                '70', '100', '125',
                '150', '175', '200',
                '225', '250', '300',
                '350', '400', '450',
                '500', '550', '600',
                '650', '700', '750',
                '775', '800', '825',
                '850', '875', '900',
                '925', '950', '975',
                '1000',
            ],
            'time': h,
            'variable': [
                'fraction_of_cloud_cover', 'relative_humidity', 'specific_cloud_ice_water_content',
                'specific_cloud_liquid_water_content', 'specific_humidity', 'specific_rain_water_content',
                'specific_snow_water_content', 'temperature',
            ],
        },
        'era5_' + '201908' + day + '_' + time2str(h) + '_plv.grib')
    return 0

if __name__ == '__main__':
    # define the directory in which the data shall be stored
    os.chdir("/media/mae/Backup Plus/Typhoon/Super_Typhoon_Lekima/plv/")
    days = ['04', '05', '06', '07', '08', '09', '10', '11', '12', '13']
    times = [
        '00:00', '01:00', '02:00',
        '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00',
        '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00',
        '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00',
        '21:00', '22:00', '23:00']
    for day in days:
        pro_list = []
        for h in times:
            p = Process(target=download, args=(day, h))
            p.start()
            pro_list.append(p)
        for p in pro_list:
            p.join()
    print("Main process finished!")
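The time2str helper above is what implements "solution (3)": it converts a CDS 'HH:MM' time string into a colon-free 'HHMM' filename tag. A quick standalone demonstration:

```python
import re

def time2str(time):
    # parse hour and minute out of an 'HH:MM' string, return 'HHMM'
    str_t = str(time)
    numbers = [float(s) for s in re.findall(r'-?\d+\.?\d*', str_t)]
    return str(int(numbers[0])).zfill(2) + str(int(numbers[1])).zfill(2)

print(time2str('03:00'))  # -> 0300
print(time2str('23:00'))  # -> 2300
```

For the fixed 'HH:MM' inputs used in these scripts, the simpler time.replace(':', '') would give the same result; the regex version is just more tolerant of other input shapes.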
Near-surface (single-level) variables:
"""
Created on Wed Feb 3 11:15:22 2021
@author: mae
This program downloads reanalysis-era5-single-levels variables during Super_Typhoon_Lekima.
"""
import re
import cdsapi
import os
from multiprocessing import Process

c = cdsapi.Client()

def time2str(time):
    # turn an 'HH:MM' time string into a colon-free 'HHMM' tag for filenames
    str_t = str(time)
    numbers = [float(s) for s in re.findall(r'-?\d+\.?\d*', str_t)]
    time_h = int(numbers[0])
    time_m = int(numbers[1])
    return str(time_h).zfill(2) + str(time_m).zfill(2)

def download(day, h):
    # retrieve all single-level variables for one day/time into one grib file
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'grib',
            'variable': [
                '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_temperature',
                'skin_temperature', 'surface_pressure', 'total_column_cloud_ice_water',
                'total_column_cloud_liquid_water', 'total_column_rain_water', 'total_column_snow_water',
                'total_column_water_vapour', 'total_precipitation',
            ],
            'year': '2019',
            'month': '08',
            'day': day,
            'time': h,
        },
        'era5_' + '201908' + day + '_' + time2str(h) + '_sfc.grib')
    return 0

if __name__ == '__main__':
    # define the directory in which the data shall be stored
    os.chdir("/media/mae/Backup Plus/Typhoon/Super_Typhoon_Lekima/sfc/")
    days = ['04', '05', '06', '07', '08', '09', '10', '11', '12', '13']
    times = [
        '00:00', '01:00', '02:00',
        '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00',
        '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00',
        '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00',
        '21:00', '22:00', '23:00']
    for day in days:
        pro_list = []
        for h in times:
            p = Process(target=download, args=(day, h))
            p.start()
            pro_list.append(p)
        for p in pro_list:
            p.join()
    print("Main process finished!")
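Since a request can occasionally fail mid-run, it is worth checking afterwards that every expected file is present. A sketch of such a check for the near-surface run above (run it from inside the output directory; the name pattern mirrors the script's):

```python
import os

days = ['04', '05', '06', '07', '08', '09', '10', '11', '12', '13']
times = [f'{h:02d}00' for h in range(24)]  # colon-free 'HHMM' tags

# build the list of filenames the script should have produced
expected = ['era5_201908' + d + '_' + t + '_sfc.grib'
            for d in days for t in times]
present = set(os.listdir('.'))
missing = [f for f in expected if f not in present]
print(len(expected), 'expected,', len(missing), 'missing')
```

Re-running the download script only for the missing day/time pairs then fills any gaps.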
References:
[1] https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form
[2] https://cds.climate.copernicus.eu/api-how-to
[3] https://blog.csdn.net/luqialiu3392/article/details/109895064
[4] https://blog.csdn.net/weixin_44975806/article/details/100083897