I needed the ERA5 rain water path (Total column rain water) data for work and had to download it in batch. Using Total column rain water as the example, this post records a method for batch-downloading ERA5 data (parallel downloads with Python multiprocessing), for future reference.
Platform and tools used in this example:
Operating system: Ubuntu 20
Download tool: Python 3
Special requirement: the CDS API and key specified by the ERA5 website (see below)
Note: this download scheme works on both Windows and Linux.
Contents:
1 Selecting the ERA5 data you need
1.1 Find the ERA5 dataset on the official site
1.2 Choose the specific data within the dataset (product type / variable / year / month / day / time / region / format)
2 Downloading the ERA5 data
2.1 Installing the CDS API tool
2.2 Obtaining a Python download script
2.2.1 Method one: the script generated by the official site
2.2.2 Method two: a multiprocessing parallel-download script
Main text:
1 Selecting the ERA5 data you need
Data downloaded in this example:
Product type: ERA5 reanalysis
Variable: Total column rain water
Time: every hour from 2018-07-04 to 2018-07-11
Region: global
Format: GRIB
1.1 Find the ERA5 dataset on the official site: https://cds.climate.copernicus.eu/#!/search?text=ERA5&type=dataset
1.2 Choose the specific data within the dataset (product type / variable / year / month / day / time / region / format)
Based on the needs above, I made the selections accordingly.
2 Downloading the ERA5 data
2.1 Installing the CDS API tool
The full CDS API installation guide is at https://cds.climate.copernicus.eu/api-how-to, which lists the installation steps for each operating system.
In short:
(1) Every registered user is given a url and a key. Copy the url and key shown on that page into a file named ".cdsapirc" in your home directory (Linux: /home/<username>/; on Windows it goes in your user folder under C:\, see the page for details).
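The ".cdsapirc" file is just two lines. Its shape at the time of writing is as follows; the key value here is a placeholder, and your personal UID and API key are shown on the api-how-to page after you log in:

```text
url: https://cds.climate.copernicus.eu/api/v2
key: <UID>:<API-key>
```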
(2) In your Python environment, run: pip install cdsapi
and the installation is complete.
2.2 Obtaining a Python download script
Go back to the data-selection page; once your choices are made, click "show API request" to get the example Python download script generated by the site.
2.2.1 Method one: the script generated by the official site
Use the generated script directly: it downloads all of the selected data into a single file named "download.grib", placed in the directory the script is run from.
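The generated script is essentially a single cdsapi.Client().retrieve(...) call. For the selection above, its request body would look roughly like the sketch below (this is my reconstruction, not the verbatim output of "show API request"):

```python
# request body matching the selection in section 1 (a sketch)
request = {
    'product_type': 'reanalysis',
    'variable': 'total_column_rain_water',
    'year': '2018',
    'month': '07',
    'day': [f'{d:02d}' for d in range(4, 12)],    # '04' .. '11'
    'time': [f'{h:02d}:00' for h in range(24)],   # '00:00' .. '23:00'
    'format': 'grib',
}
print(len(request['day']), len(request['time']))  # -> 8 24
```

Passing this dictionary to c.retrieve('reanalysis-era5-single-levels', request, 'download.grib') reproduces method one: one request, one output file.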
2.2.2 Method two: a multiprocessing parallel-download script
I wrote the script below. It writes the data for each time step to its own grib file in a directory specified in the script.
"""
Created on Fri Jan 1 14:58:08 2021
@author: mae
This program download ERA5 rain water content during 2018.07.04-2017.07.11
using python multi-threaded execution.
notice:
1. This script requires url and key which can be found at web page:
https://cds.climate.copernicus.eu/api-how-to
2. after run the script, era5 data will be format individualy like:
'era5_' + '201807' + d + h + '_rain_water_sfc.grib'
"""
import cdsapi
import os
from multiprocessing import Process

c = cdsapi.Client()

def download(day, h):
    # retrieve one variable for one day/time and write it to its own grib file
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'grib',
            'variable': 'total_column_rain_water',
            'year': '2018',
            'month': '07',
            'day': day,
            'time': h,
        },
        'era5_' + '201807' + day + h + '_rain_water_sfc.grib')
    return 0

if __name__ == '__main__':
    # define the directory where the data should be stored
    os.chdir("/where/data/put/in/")
    days = ['04', '05', '06', '07', '08', '09', '10', '11']
    times = [
        '00:00', '01:00', '02:00',
        '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00',
        '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00',
        '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00',
        '21:00', '22:00', '23:00']
    for day in days:
        # start one process per time step, then wait for the whole day to finish
        pro_list = []
        for h in times:
            p = Process(target=download, args=(day, h))
            p.start()
            pro_list.append(p)
        for p in pro_list:
            p.join()
    print("Main process finished!")
Note: the generated filenames look like era5_2018070400:00_rain_water_sfc.grib.
One small caveat: these filenames contain a colon, which Linux accepts but Windows does not allow in filenames. Solution (1): on Linux, open a terminal in the data directory and run
rename 's|:||g' *
to strip the colons. Solution (2): Windows has similar batch-rename facilities; explore them if you are interested. Solution (3): tweak the file-naming scheme in the Python script above. :)
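As a cross-platform alternative to the shell rename, colon-stripping can also be done from Python; a minimal sketch (the directory and filename below are placeholders for your own data directory):

```python
import os
import tempfile

def strip_colons(directory):
    """Rename every file in `directory`, deleting ':' from its name."""
    for name in os.listdir(directory):
        if ':' in name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, name.replace(':', '')))

# demo in a throwaway directory (colons are legal in Linux filenames)
d = tempfile.mkdtemp()
open(os.path.join(d, 'era5_2018070400:00_rain_water_sfc.grib'), 'w').close()
strip_colons(d)
print(os.listdir(d))  # -> ['era5_201807040000_rain_water_sfc.grib']
```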
That's all!
Update 2021.02.03
This update adds "solution (3)" above, applied to downloading reanalysis data covering Super_Typhoon_Lekima. It consists of two parts: upper-air variables and near-surface variables.
Upper-air (pressure-level) variables:
"""
Created on Tue Feb 2 11:23:31 2021
@author: mae
This program downloads reanalysis-era5-pressure-levels variables during Super_Typhoon_Lekima.
"""
import re
import cdsapi
import os
from multiprocessing import Process

c = cdsapi.Client()

def time2str(time):
    # turn an 'HH:MM' time string into a colon-free 'HHMM' tag for filenames
    str_t = str(time)
    numbers = [float(s) for s in re.findall(r'-?\d+\.?\d*', str_t)]
    time_h = int(numbers[0])
    time_m = int(numbers[1])
    return str(time_h).zfill(2) + str(time_m).zfill(2)

def download(day, h):
    # retrieve all pressure-level variables for one day/time into one grib file
    c.retrieve(
        'reanalysis-era5-pressure-levels',
        {
            'product_type': 'reanalysis',
            'format': 'grib',
            'year': '2019',
            'month': '08',
            'day': day,
            'pressure_level': [
                '1', '2', '3',
                '5', '7', '10',
                '20', '30', '50',
                '70', '100', '125',
                '150', '175', '200',
                '225', '250', '300',
                '350', '400', '450',
                '500', '550', '600',
                '650', '700', '750',
                '775', '800', '825',
                '850', '875', '900',
                '925', '950', '975',
                '1000',
            ],
            'time': h,
            'variable': [
                'fraction_of_cloud_cover', 'relative_humidity', 'specific_cloud_ice_water_content',
                'specific_cloud_liquid_water_content', 'specific_humidity', 'specific_rain_water_content',
                'specific_snow_water_content', 'temperature',
            ],
        },
        'era5_' + '201908' + day + '_' + time2str(h) + '_plv.grib')
    return 0

if __name__ == '__main__':
    # define the directory in which the data shall be stored
    os.chdir("/media/mae/Backup Plus/Typhoon/Super_Typhoon_Lekima/plv/")
    days = ['04', '05', '06', '07', '08', '09', '10', '11', '12', '13']
    times = [
        '00:00', '01:00', '02:00',
        '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00',
        '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00',
        '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00',
        '21:00', '22:00', '23:00']
    for day in days:
        pro_list = []
        for h in times:
            p = Process(target=download, args=(day, h))
            p.start()
            pro_list.append(p)
        for p in pro_list:
            p.join()
    print("Main process finished!")
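The time2str helper above is what implements "solution (3)": it converts a CDS 'HH:MM' time string into a colon-free 'HHMM' filename tag. A quick standalone demonstration:

```python
import re

def time2str(time):
    # parse hour and minute out of an 'HH:MM' string, return 'HHMM'
    str_t = str(time)
    numbers = [float(s) for s in re.findall(r'-?\d+\.?\d*', str_t)]
    return str(int(numbers[0])).zfill(2) + str(int(numbers[1])).zfill(2)

print(time2str('03:00'))  # -> 0300
print(time2str('23:00'))  # -> 2300
```

For the fixed 'HH:MM' inputs used in these scripts, the simpler time.replace(':', '') would give the same result; the regex version is just more tolerant of other input shapes.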
Near-surface (single-level) variables:
"""
Created on Wed Feb 3 11:15:22 2021
@author: mae
This program downloads reanalysis-era5-single-levels variables during Super_Typhoon_Lekima.
"""
import re
import cdsapi
import os
from multiprocessing import Process

c = cdsapi.Client()

def time2str(time):
    # turn an 'HH:MM' time string into a colon-free 'HHMM' tag for filenames
    str_t = str(time)
    numbers = [float(s) for s in re.findall(r'-?\d+\.?\d*', str_t)]
    time_h = int(numbers[0])
    time_m = int(numbers[1])
    return str(time_h).zfill(2) + str(time_m).zfill(2)

def download(day, h):
    # retrieve all single-level variables for one day/time into one grib file
    c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'format': 'grib',
            'variable': [
                '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_temperature',
                'skin_temperature', 'surface_pressure', 'total_column_cloud_ice_water',
                'total_column_cloud_liquid_water', 'total_column_rain_water', 'total_column_snow_water',
                'total_column_water_vapour', 'total_precipitation',
            ],
            'year': '2019',
            'month': '08',
            'day': day,
            'time': h,
        },
        'era5_' + '201908' + day + '_' + time2str(h) + '_sfc.grib')
    return 0

if __name__ == '__main__':
    # define the directory in which the data shall be stored
    os.chdir("/media/mae/Backup Plus/Typhoon/Super_Typhoon_Lekima/sfc/")
    days = ['04', '05', '06', '07', '08', '09', '10', '11', '12', '13']
    times = [
        '00:00', '01:00', '02:00',
        '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00',
        '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00',
        '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00',
        '21:00', '22:00', '23:00']
    for day in days:
        pro_list = []
        for h in times:
            p = Process(target=download, args=(day, h))
            p.start()
            pro_list.append(p)
        for p in pro_list:
            p.join()
    print("Main process finished!")
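Since a request can occasionally fail mid-run, it is worth checking afterwards that every expected file is present. A sketch of such a check for the near-surface run above (run it from inside the output directory; the name pattern mirrors the script's):

```python
import os

days = ['04', '05', '06', '07', '08', '09', '10', '11', '12', '13']
times = [f'{h:02d}00' for h in range(24)]  # colon-free 'HHMM' tags

# build the list of filenames the script should have produced
expected = ['era5_201908' + d + '_' + t + '_sfc.grib'
            for d in days for t in times]
present = set(os.listdir('.'))
missing = [f for f in expected if f not in present]
print(len(expected), 'expected,', len(missing), 'missing')
```

Re-running the download script only for the missing day/time pairs then fills any gaps.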
References:
[1] https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form
[2] https://cds.climate.copernicus.eu/api-how-to
[3] https://blog.csdn.net/luqialiu3392/article/details/109895064
[4] https://blog.csdn.net/weixin_44975806/article/details/100083897