1. Introduction
Freshly downloaded CMIP6 data usually looks like this:
(base) Inspiron-7590:/mnt/f/20210325_CMIP6/ssp585/ta# ll -rth
total 49G
-rwxrwxrwx 1 cuiyz cuiyz 793M Dec 25 2018 ta_Amon_IPSL-CM6A-LR_ssp585_r1i1p1f1_gr_201501-210012.nc
-rwxrwxrwx 1 cuiyz cuiyz 1.8G Mar 14 2019 ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_201501-205412.nc
-rwxrwxrwx 1 cuiyz cuiyz 1.8G Mar 14 2019 ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_205501-209412.nc
-rwxrwxrwx 1 cuiyz cuiyz 268M Mar 14 2019 ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_209501-210012.nc
-rwxrwxrwx 1 cuiyz cuiyz 481M May 1 2019 ta_Amon_CanESM5_ssp585_r1i1p1f1_gn_201501-210012.nc
-rwxrwxrwx 1 cuiyz cuiyz 447M May 1 2019 ta_Amon_CanESM5_ssp585_r1i1p1f1_gn_210101-218012.nc
-rwxrwxrwx 1 cuiyz cuiyz 669M May 1 2019 ta_Amon_CanESM5_ssp585_r1i1p1f1_gn_218101-230012.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_201501-202412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_202501-203412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_203501-204412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_204501-205412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_205501-206412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_206501-207412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_207501-208412.nc
-rwxrwxrwx 1 cuiyz cuiyz 143M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_208501-209412.nc
-rwxrwxrwx 1 cuiyz cuiyz 86M Jun 19 2019 ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_209501-210012.nc
..
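Two things stand out:
- Not every model stores its full time span in a single file.
- Different models split the time axis in different ways.
We therefore want to process these files automatically so that each model ends up with a single file covering its whole time span, which makes later work more efficient.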
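The two observations above can be checked programmatically. The following is a minimal sketch, assuming every file name ends in `_<start>-<end>.nc` as in the listing; the function name `group_by_model` is made up for this illustration.

```python
from collections import defaultdict


def group_by_model(filenames):
    """Map each file-name prefix to the sorted time ranges found for it."""
    groups = defaultdict(list)
    for name in filenames:
        if not name.endswith(".nc"):
            continue
        stem = name[:-3]  # drop the ".nc" extension
        # Split at the last "_": the left part is the prefix, the right part the time range
        prefix, _, time_range = stem.rpartition("_")
        groups[prefix + "_"].append(time_range)
    return {prefix: sorted(ranges) for prefix, ranges in groups.items()}


# File names taken from the listing above
files = [
    "ta_Amon_IPSL-CM6A-LR_ssp585_r1i1p1f1_gr_201501-210012.nc",
    "ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_201501-205412.nc",
    "ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_205501-209412.nc",
    "ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_209501-210012.nc",
]

for prefix, ranges in group_by_model(files).items():
    print(prefix, len(ranges), ranges)
```

For these four names, IPSL-CM6A-LR comes out with one time range and BCC-CSM2-MR with three, which is the kind of difference the batch procedure has to cope with.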
2. A Case
Processing the data of a single model is simple. Take these files, for example:
-rwxrwxrwx 1 cuiyz cuiyz 1.8G Mar 14 2019 ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_201501-205412.nc
-rwxrwxrwx 1 cuiyz cuiyz 1.8G Mar 14 2019 ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_205501-209412.nc
-rwxrwxrwx 1 cuiyz cuiyz 268M Mar 14 2019 ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_209501-210012.nc
With CDO: cdo cat ta_Amon_BCC-CSM2-MR* ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_full-time.nc
Or with NCO: ncrcat -O ta_Amon_BCC-CSM2-MR* ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_full-time.nc
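Both commands append the inputs in the order given, and the shell expands the wildcard in sorted order, which is chronological for these names. Before concatenating, it can be worth checking that no piece is missing. The sketch below assumes monthly data with `YYYYMM-YYYYMM` stamps, as in the Amon files above; `parse_range` and `is_contiguous` are names made up for this illustration.

```python
def to_months(stamp):
    """Convert a 'YYYYMM' stamp to a running month count."""
    return int(stamp[:4]) * 12 + int(stamp[4:6]) - 1


def parse_range(time_range):
    """Turn 'YYYYMM-YYYYMM' into a (first_month, last_month) pair."""
    start, end = time_range.split("-")
    return to_months(start), to_months(end)


def is_contiguous(time_ranges):
    """True if each piece starts exactly one month after the previous one ends."""
    spans = sorted(parse_range(r) for r in time_ranges)
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(spans, spans[1:]))


# Time ranges of the three BCC-CSM2-MR files listed above
bcc = ["201501-205412", "205501-209412", "209501-210012"]
print(is_contiguous(bcc))

# The same set with the middle piece missing
print(is_contiguous(["201501-205412", "209501-210012"]))
```

The check only looks at file names, so it catches a missing download but not a file whose contents are truncated.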
3. Batch
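Repeating the step from section 2 for every model by hand is clearly tedious. With Python plus shell, the work can be done in batch.
First, we need to extract the file-name prefix of each model without duplicates. I was not sure how to do this in shell, so I used Python. Write get_model_prefix.py: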
import os

files = os.listdir('./')  # all file names in the current directory
prefix = []  # list that stores the model prefixes
for file in files:
    if file.startswith('ta_') and file.endswith('.nc'):
        # everything up to and including the last '_' is the prefix
        prefix_now = file.rsplit('_', 1)[0] + '_'
        if prefix_now not in prefix:
            prefix.append(prefix_now)  # add only if not seen before
for i in prefix:
    print(i)  # one prefix per line
Running it directly prints:
ta_Amon_ACCESS-CM2_ssp585_r1i1p1f1_gn_
ta_Amon_ACCESS-ESM1-5_ssp585_r1i1p1f1_gn_
ta_Amon_BCC-CSM2-MR_ssp585_r1i1p1f1_gn_
ta_Amon_CAMS-CSM1-0_ssp585_r1i1p1f1_gn_
ta_Amon_CanESM5_ssp585_r1i1p1f1_gn_
ta_Amon_CAS-ESM2-0_ssp585_r1i1p1f1_gn_
ta_Amon_CIESM_ssp585_r1i1p1f1_gr_
ta_Amon_CMCC-CM2-SR5_ssp585_r1i1p1f1_gn_
ta_Amon_CMCC-ESM2_ssp585_r1i1p1f1_gn_
ta_Amon_E3SM-1-1_ssp585_r1i1p1f1_gr_
ta_Amon_FGOALS-f3-L_ssp585_r1i1p1f1_gr_
ta_Amon_FGOALS-g3_ssp585_r1i1p1f1_gn_
ta_Amon_FIO-ESM-2-0_ssp585_r1i1p1f1_gn_
ta_Amon_GFDL-ESM4_ssp585_r1i1p1f1_gr1_
ta_Amon_IITM-ESM_ssp585_r1i1p1f1_gn_
ta_Amon_INM-CM4-8_ssp585_r1i1p1f1_gr1_
ta_Amon_INM-CM5-0_ssp585_r1i1p1f1_gr1_
ta_Amon_IPSL-CM6A-LR_ssp585_r1i1p1f1_gr_
ta_Amon_KACE-1-0-G_ssp585_r1i1p1f1_gr_
ta_Amon_KIOST-ESM_ssp585_r1i1p1f1_gr1_
ta_Amon_MIROC6_ssp585_r1i1p1f1_gn_
ta_Amon_MPI-ESM1-2-HR_ssp585_r1i1p1f1_gn_
ta_Amon_MPI-ESM1-2-LR_ssp585_r1i1p1f1_gn_
ta_Amon_NESM3_ssp585_r1i1p1f1_gn_
ta_Amon_TaiESM1_ssp585_r1i1p1f1_gn_
Then a shell script can loop over the prefixes:
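#!/bin/bash
mkdir -p ./separate                    # destination for the original pieces
prefix=$(python get_model_prefix.py)   # store the output of get_model_prefix.py in prefix
echo "$prefix"
for i in $prefix                       # word splitting yields one prefix per iteration
do
{
    parts=("$i"[0-9]*.nc)              # original pieces only, not an existing full-time.nc
    # merge (NCO works too), then move the pieces away only if the merge succeeded
    cdo cat "${parts[@]}" "${i}full-time.nc" && mv "${parts[@]}" ./separate
} &
done
wait                                   # block until all background jobs have finished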
Run this script in the directory that holds the model files and the job is done.
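Launching one background job per model starts all merges at once, which can be heavy on disk and memory for tens of gigabytes of data. As an alternative, the whole procedure can be driven from Python with a cap on the number of simultaneous jobs. This is only a sketch under stated assumptions: `cdo` is on the PATH, the script runs in the data directory, and the names `plan_merges`, `merge_one`, `merge_all` and the default of 4 workers are made up for this illustration.

```python
import os
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor


def plan_merges(filenames, var="ta", suffix="full-time.nc"):
    """Return {output_name: [input pieces in sorted order]} for every model prefix."""
    plans = {}
    for name in sorted(filenames):
        if not (name.startswith(var + "_") and name.endswith(".nc")):
            continue
        if name.endswith(suffix):
            continue  # skip merged files left over from an earlier run
        prefix = name.rsplit("_", 1)[0] + "_"
        plans.setdefault(prefix + suffix, []).append(name)
    return plans


def merge_one(output, parts, done_dir="separate"):
    """Run `cdo cat` for one model, then move the pieces away if it succeeded."""
    subprocess.run(["cdo", "cat", *parts, output], check=True)
    os.makedirs(done_dir, exist_ok=True)
    for part in parts:
        shutil.move(part, os.path.join(done_dir, part))


def merge_all(max_workers=4):
    """Merge every model found in the current directory, a few at a time."""
    plans = plan_merges(os.listdir("."))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(merge_one, out, parts) for out, parts in plans.items()]
        for future in futures:
            future.result()  # re-raise the error of any failed merge
```

Calling `merge_all()` from the data directory would then replace both get_model_prefix.py and the shell loop. Threads are sufficient here because the heavy work happens in the external `cdo` processes.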
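P.S. It would be even better if get_model_prefix.py could be implemented directly in shell. Python makes one lazy, though, and I did not bother to look for a shell implementation.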