A Python Script for Batch Downloading Ice-Tethered Profiler (ITP) Data

This article shows how to batch-download ocean observation data from the Ice-Tethered Profilers (ITPs) maintained by the Woods Hole Oceanographic Institution using a Python script; these data are valuable for studying Arctic climate change. The script collects download links via HTTP requests and hands the downloads to Internet Download Manager (IDM). The article also summarizes the three processing levels of ITP data and their uses.

Ice-Tethered Profiler (ITP) Data Batch Download Tool

Ice-Tethered Profiler (ITP) data are maintained by the Woods Hole Oceanographic Institution (WHOI). The following introduction to WHOI is quoted from its official website:

Woods Hole Oceanographic Institution is the world’s leading, independent non-profit organization dedicated to ocean research, exploration, and education. Our scientists and engineers push the boundaries of knowledge about the ocean to reveal its impacts on our planet and our lives. (https://www.whoi.edu/who-we-are/) (2023-7-3)

ITP Overview

Recent studies indicate that the Arctic may be both a sensitive indicator and an active agent of climate variability and change. While progress has been made in understanding the Arctic’s coupled atmosphere-ice-ocean system, documentation of its evolution has been hindered by a sparse data archive. This observational gap represents a critical shortcoming of the ‘global’ ocean observing system. Addressing this gap, a new instrument, the ‘Ice-Tethered Profiler’ (ITP) was conceived to repeatedly sample the properties of the ice-covered Arctic Ocean at high vertical resolution over time periods of up to three years. Analogous to the international Argo float program that is employing autonomous profiling floats to return real-time seawater property data from the temperate oceans, we are working together with fellow North American, European and Asian investigators to maintain a loose array of ITPs and other similar instruments throughout the ice-covered Arctic. We hope that the analysis of data from these instruments will lead to better understanding of the Arctic Ocean’s response and role in global climate change. (https://www2.whoi.edu/site/itp/) (2023-7-3)

How it Works

The ITP system consists of a small surface capsule that sits atop an ice floe and supports a plastic-jacketed wire rope tether that extends through the ice and down into the ocean, ending with a weight (intended to keep the wire vertical). A cylindrical underwater instrument (in shape and size much like an Argo float) mounts on this tether and cycles vertically along it, carrying oceanographic sensors through the water column. Water property data are telemetered from the ITP to shore in near-real time. (https://www2.whoi.edu/site/itp/) (2023-7-3)
Schematic of the Ice-Tethered Profiler (ITP) structure and how it works

ITP Data Products

ITP data products are provided at three processing levels, and Level 3 data come in two forms. For details on the data and the processing procedures, see the official website. The following descriptions were excerpted on 2023-07-03 from https://www2.whoi.edu/site/itp/data/data-products/

LEVEL 1 RAW DATA are received from each ITP after each one-way profile is completed. The data are made available via the FTP site shortly after reception (typically within a few hours) by an automated routine. GPS position information for each system are received once per day and also made available via the ITP FTP site. The location data are unedited and unfiltered; the scientific and engineering data are extracted from the binary files transmitted by the ITPs and saved in MATLAB format (one file per profile), but no other processing, cleaning or smoothing is performed.

Level 1 profile data files from each ITP system have been compressed in two forms as files named itpNrawmat.tar.Z and itpNrawmat.zip, where N is the ITP system number. For detailed documentation of the data format, download the linked pdf files at right. The GPS position data are contained in ascii-format files named itpNrawlocs.dat. (2023-7-3)

LEVEL 2 REAL TIME DATA are created from the Level 1 raw data by automated routines. File updates occur several times per day. At this level of processing, the location data are filtered and interpolated to the times of each profile, while the scientific and engineering data are averaged in 2-db bins. No sensor response corrections, secondary calibration or editing are applied to these products. This form of the data are displayed graphically in the Status pages above. Level 2 data are compressed in two forms as files named itpNgriddata.tar.Z and itpNgriddata.zip, where N is the ITP system number. In addition, the most recently-acquired data from each ITP system are available on the FTP web site under the file name itpNlast.dat. For detailed documentation of the data format, download the linked pdf files at right. (2023-7-3)
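To make the phrase "averaged in 2-db bins" concrete, here is a minimal sketch of pressure-bin averaging with synthetic numbers. It is only an illustration, not the WHOI processing code:

import numpy as np

def bin_average(pressure, variable, bin_size=2.0):
    """Average `variable` in pressure bins of width `bin_size` (dbar)."""
    edges = np.arange(0.0, pressure.max() + bin_size, bin_size)
    centers = edges[:-1] + bin_size / 2.0
    idx = np.digitize(pressure, edges) - 1  # bin index of each sample
    means = np.full(centers.shape, np.nan)
    for i in range(len(centers)):
        in_bin = idx == i
        if in_bin.any():
            means[i] = variable[in_bin].mean()
    return centers, means

# Synthetic example: temperature sampled every 0.25 dbar down to 10 dbar
pres = np.arange(0.0, 10.0, 0.25)
temp = -1.5 + 0.05 * pres
centers, temp_2db = bin_average(pres, temp)
print(centers)   # bin centers: 1.0, 3.0, 5.0, ...
print(temp_2db)  # bin-averaged temperature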

LEVEL 3 ARCHIVE DATA are our best estimates of the ocean properties derived from the ITP sensor observations. These data have had sensor response corrections applied, regional conductivity adjustments made based on historical hydrographic data, and edits performed. Level 3 data products are derived for each ITP system after its mission has ended. A full description of the ITP data processing procedure is provided here. Level 3 data are available in two forms. The first are Matlab-format files (one file per profile) holding corrected data at the basic sensor sample rate. These files are compressed into files named itpNcormat.tar.Z and itpNcormat.zip. For detailed documentation of the data format, download the linked pdf files at right. The second form of Level 3 data have been pressure-bin-averaged at 1-db vertical resolution. Again, these data have been compressed and are available in files named itpNfinal.tar.Z and itpNfinal.zip. For detailed documentation of the data format, download the linked pdf files at right. This form of ITP data is being submitted to the national data archives. (2023-7-3)
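Once the Level 3 cormat archives are downloaded and unzipped (see the script below), each per-profile MATLAB file can be read with scipy. The file and variable names in this sketch (cor0001.mat, pr_filt, te_adj, sa_adj) are assumptions made for illustration; consult the format documentation linked on the WHOI page for the authoritative layout:

from scipy.io import loadmat

profile = loadmat("itp-data/itp1cormat/cor0001.mat")  # hypothetical path and file name
print(profile.keys())                # list the variables actually stored in the file
pressure = profile.get("pr_filt")    # assumed name: filtered pressure (dbar)
temperature = profile.get("te_adj")  # assumed name: adjusted temperature (deg C)
salinity = profile.get("sa_adj")     # assumed name: adjusted salinity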

How to acknowledge ITP data

We ask that the following acknowledgment be given when ITP data are used:

“The Ice-Tethered Profiler data were collected and made available by the Ice-Tethered Profiler Program (Toole et al., 2011; Krishfield et al., 2008) based at the Woods Hole Oceanographic Institution (https://www.whoi.edu/itp).”

If you are using ITP data, please provide us with a citation to include in our compilation of publications that utilize ITP data (contact us).

Data from ITPs (See the Technology section under Background for detailed description of the raw observations) at three levels of processing are available via FTP. (https://www2.whoi.edu/site/itp/data/) (2023-7-3)

Batch Downloading ITP Data with Python by Calling Internet Download Manager (IDM)

The Python script below first issues HTTP requests to the ITP data site to collect the download URLs of data in a chosen format, taking Level 3 Form 1 data as the example. Download URLs for this format all end in cormat.zip, so a regular expression built around that suffix picks out exactly those links (a small illustration follows); the script can be adapted to other formats in the same way. The script then calls Internet Download Manager (IDM) from the command line to add the collected URLs to IDM's download queue in batches. It was tested with Python 3.12.1.
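A quick illustration of the link-filtering idea; the HTML line below is a made-up directory-listing entry, not copied from the site:

import re

html_line = '<a href="itp1cormat.zip">itp1cormat.zip</a>'  # hypothetical listing entry
print(re.findall(r'<a href="(.*cormat\.zip)">.*</a>', html_line))  # ['itp1cormat.zip']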

After the script finishes, open the IDM window, check that the queued task information is correct (especially the Save To location), and then start the download queue manually.

""" get ITP data
"""
import os
import re
import subprocess
import zipfile
import requests

# References
# [How to use Internet Download Manager-Command line](https://www.internetdownloadmanager.com/support/command_line.html)

def download_itp_data(data_name_list=None):
    r"""Collect ITP data download links and add them to the IDM download queue.

    Args:
        data_name_list: list of regular expressions selecting the data products,
            e.g. r"(.*cormat\.zip)" | r"(.*final\.mat)" | r"(.*final\.zip)"
    """
    if data_name_list is None:
        data_name_list = [r"(.*cormat\.zip)", r"(.*final\.mat)", r"(.*final\.zip)"]

    script_dir = os.path.split(os.path.realpath(__file__))[0]
    local_path = os.path.normpath(os.path.join(script_dir, '..', 'itp-data'))
    base_url = "https://scienceweb.whoi.edu/itp/data/"

    # The data site is a plain directory listing; each ITP system has a
    # sub-directory named itpsysN/.
    text = requests.get(base_url, timeout=(60, 60)).text
    itp_name_list = re.findall(r'<a href="(itpsys\d+/)">.*</a>', text)
    for itp_name in itp_name_list:
        itp_url = base_url + itp_name
        print(itp_url)
        text = requests.get(itp_url, timeout=(60, 60)).text
        for data_name in data_name_list:
            reg = f'<a href="{data_name}">.*</a>'
            filenames = re.findall(reg, text)
            for file_name in filenames:
                src_url = itp_url + file_name
                # Normalize an irregular file name used by ITP 39.
                file_name = file_name.replace('itp39_1', 'itp39')
                if os.path.isfile(os.path.join(local_path, file_name)):
                    continue  # already downloaded
                # IDM command line: /d URL, /p save-to directory, /f file name,
                # /a add to queue without starting, /n silent mode.
                subprocess.run(['idman', '/d', src_url, '/p', local_path,
                                '/f', file_name, '/a', '/n'],
                               check=False)

def unzip_itp_data(data_name_list=None):
    """Extract each downloaded archive into a directory of the same name (minus .zip)."""
    if data_name_list is None:
        data_name_list = ["cormat.zip", "final.zip"]

    script_dir = os.path.split(os.path.realpath(__file__))[0]
    data_dir = os.path.normpath(os.path.join(script_dir, '..', 'itp-data'))
    for data_name in data_name_list:
        zip_file_names = [f for f in os.listdir(data_dir) if f.endswith(data_name)]
        for file_name in zip_file_names:
            zip_file_path = os.path.join(data_dir, file_name)
            unzip_dest_dir = zip_file_path[:-4]  # strip the ".zip" suffix
            if not os.path.isdir(unzip_dest_dir):
                os.makedirs(unzip_dest_dir)
                print(f"os.makedirs({unzip_dest_dir})")
                with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
                    zip_ref.extractall(unzip_dest_dir)
                print(f"zip_ref.extractall({unzip_dest_dir})")

download_itp_data()
# unzip_itp_data(["cormat.zip"])  # run this step only after the ITP downloads have finished

About the Author

Research interests: atmospheric and oceanic sciences
GitHub: https://grwei.github.io/
E-mail: 313017602@qq.com
