AERONET AOD 数据自动化下载 + PYTHON + SELENIUM

目录

5.05更新 增加FMF、SSA数据下载(见GitHub)

4.10更新 通过CURL、WGET等方式下载目标数据

获取下载地址

Using Tools to Save Web Output as a File

Wget

Curl

AERONET AOD 数据下载

 利用 PYTHON + SELENIUM 自动化下载中国站点数据

获得站点URL列表

 获取站点数据时间

下载数据


写在前面的小结:预检索动态资源,正则化工具匹配标签。

完整代码见:SakuraSong001/spider4remotedata (github.com)

项目另含哨兵五号批量自动化下载工具。

7.21增加多线程并行下载和selenium后台运行。

5.05更新 增加FMF、SSA数据下载(见GitHub)

针对SSA、FMF等产品的网站网页解析进行修改,实现自动化批量下载。

4.10更新 通过CURL、WGET等方式下载目标数据

AErosol RObotic NETwork (AERONET)下载工具提供现成的下载地址编辑规则如下所示。检索结果为HTML格式,可根据需要生成下载链接,结果格式需要再一次转换。

AOD AND SDA : Data - Aerosol Robotic Network (AERONET) Homepage (nasa.gov)

获取下载地址

AOD、SDA、TOT等数据下载地址的可选参数如下表所示,可利用网站地图工具和BeautifulSoup工具,获得目标区域网站列表,根据地址规则生成下载地址列表后,通过curl或wget工具实现批量下载。

Table 1: Explanation and Values for Mandatory and Optional Web Service Parameters

Mandatory ParametersExplanationValues
year,month,dayStarting time moment (year= 1992 to present), (month=1 to 12), (day = 1 to max num, depends on month)

Year: 1993 to present (must be 4-digits)

Month: 1 to 12

Day: 1 to max_day_of_month

AVGData Format

All points: AVG=10

Daily average: AVG=20

[data_type]Data Types (See Table 2)[data_type]=1
Optional Parameters  
year2,month2,day2Ending time moment**

Year: 1993 to present (must be 4-digits)

Month: 1 to 12

Day: 1 to max_day_of_month

**if year2,month2, and day2 are omitted, then the current day is assumed

hour, hour2Specified beginning (hour) and ending hour (hour2)

Hour: 0 to 23

if not specified, then the hour is set to zero; time2 is incremented to next day and hour2=0

site

AERONET site name

Exact match of AERONET database name

If none specified, then all sites are searched for data during the time interval specified

AERONET Site Name List

lat1,lon1,lat2,lon2Bounding Box **

lat1,lon1 - Lower Left
lat2,lon2 - Upper Right

**values must be in decimal degrees (including the decimal)

lunar_mergeEnable Lunar AOD (Provisional) Only Download

0 - No Lunar
1- Lunar Data (Provisional)

if_no_htmlDetermine whether html formatting is printed0 - HTML formatting printed (default)
1 - No HTML formatting printed

Table 2: Explanation of Data Types for the Web Service

Data TypesExplanation
AOD10Aerosol Optical Depth Level 1.0
AOD15Aerosol Optical Depth Level 1.5
AOD20Aerosol Optical Depth Level 2.0
SDA10SDA Retrieval Level 1.0
SDA15SDA Retrieval Level 1.5
SDA20SDA Retrieval Level 2.0
TOT10Total Optical Depth based on AOD Level 1.0 (all points only)
TOT15Total Optical Depth based on AOD Level 1.5 (all points only)
TOT20Total Optical Depth based on AOD Level 2.0 (all points only)

Using Tools to Save Web Output as a File

Wget

wget --no-check-certificate  -q  -O test.out "https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3?site=Cart_Site&year=2000&month=6&day=1&year2=2000&month2=6&day2=14&AOD15=1&AVG=10"

Curl

curl -s -k -o test.out "https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3?site=Cart_Site&year=2000&month=6&day=1&year2=2000&month2=6&day2=14&AOD15=1&AVG=10"

AERONET AOD 数据下载

AErosol RObotic NETwork (AERONET)是由NASA 和 LOA-PHOTONS (CNRS) 联合建立的地基气溶胶遥感观测网,提供对不同气溶胶状态下的光谱气溶胶光学深度(AOD),反演产物和可沉淀水的全球分布式观测。现行版本 3 AOD 数据提供:级别 1.0(未筛选)、级别 1.5(云筛选和质量控制)和级别 2.0(质量保证)。

官网地址:Aerosol Robotic Network (AERONET) Homepage (nasa.gov) 

Aeronet 网站支持灵活筛选条件,可根据需求下载特定时间、级别、站点的数据。同时可通过网页提供的筛选功能筛选符合特定条件的数据。

例如下载 2012年 Alboran 站点的 AOD1.5 数据,点击 2012 - level 1.5 - Alboran 进入站点数据详情页面,点击 AOD Level 1.5 进入数据请求下载页面,点击 Accept 即可下载数据。

 利用 PYTHON + SELENIUM 自动化下载中国站点数据

获得站点URL列表

首先通过网站提供的地图筛选工具,大致选择中国范围,并将页面另存为本地HTML文件,利用BEAUTIFULSOUP解析页面获得站点列表。利用正则化工具筛选、获得所有站点URL,并通过父节点获得站点名称、经纬度等信息。

def get_stations(area_file):
    result = []
    pattern = r'https\:\/\/aeronet\.gsfc\.nasa\.gov\/cgi\-bin\/data\_display\_aod\_v3\?site\=.+'

    # 本地页面
    chinaAreaPage = r'AERONET Data Display Interface - WWW DEMONSTRAT.html'
    soup = BeautifulSoup(open(chinaAreaPage, 'r', encoding='utf-8').read(), 'html.parser')
    aList = soup.find_all('a')
    for item in aList:
        sHref = item.get('href')
        if re.match(pattern, str(sHref)):
            station = re.sub(r'\n', '', item.get_text())
            geoInfo = re.sub(r'\n+.+\(\s', r'(', item.parent.get_text())

            response = session.get(sHref, headers=header)
            beautifulSoup = BeautifulSoup(response.text, 'html.parser')

            pageUrl = beautifulSoup.find('a', text=re.compile(r'More AERONET Downloadable Products\.{3}')).get('href')

            date = beautifulSoup.find(text=re.compile(r'Start Date.+')).split('-')
            start_year = re.sub(r'\;.+', '', date[2])
            latest_year = date[4]

            result.append([station, geoInfo, pageUrl, start_year, latest_year])
    # print(result, file=open(area_file, 'w', encoding='utf-8'))
    result = np.array(result)
    dataframe = pd.DataFrame(
        {'station': result[:, 0], 'geoInfo': result[:, 1], 'pageUrl': result[:, 2], 'start_year': statList[:, 3],
         'latest_year': result[:, 4]}
    )
    dataframe.to_csv(area_file, index=False, sep=',', encoding='utf-8')
    return result

 获取站点数据时间

根据给定时间范围筛选站点、并获取站点数据的有效时间节点。

if '2005' <= first <= '2012' or '2005' <= latest <= '2012' or (first <= '2005' and latest >= '2012'):# 不在此时间范围内的直接跳过
    print('\n')
    begin = end = '0'

    statUrl = 'https://aeronet.gsfc.nasa.gov/cgi-bin/' + statHref.replace('®', '&re')
    driver.get(statUrl)
    time.sleep(3)
    ele = driver.find_element(By.XPATH, '//*[@id="Year1"]')
    options = ele.find_elements(By.TAG_NAME, 'option')
    for option in options:
        # print(option.get_attribute('value'))
        # print(option.text)
        if option.text >= '2013':
            break
        if begin == '0' and '2005' <= option.text:
            begin = option.text
        if end < option.text:
            end = option.text
    # -- end for options --
    if begin == '0' or end == '0':
        print('wrong time', stat, first, latest, begin, end, statUrl)
        continue
    print(stat, first, latest, begin, end, statUrl)

下载数据

由于aeronet网站的特殊设置,部分数据需要先检索才可以下载,否则按照正确的下载地址也会提示 HTTPError: 404 not found。

# 模拟检索动作
statUrl = 'https://aeronet.gsfc.nasa.gov/cgi-bin/' + statHref.replace('®', '&re')
driver.get(statUrl)
time.sleep(3)
select1 = Select(driver.find_element(By.XPATH, '//*[@id="Year1"]'))
select1.select_by_visible_text(str(year))
select2 = Select(driver.find_element(By.XPATH, '//*[@id="Year2"]'))
select2.select_by_visible_text(str(year))
try:
    aod15_checkbox = driver.find_element(By.NAME, 'AOD15')
    aod15_checkbox.click()
except NoSuchElementException:
    print('No such aod 1.5 ', year, url)
    break
submit = driver.find_element(By.NAME, 'Submit')
submit.click()
time.sleep(30)
# download file
for year in range(int(begin), int(end) + 1):
    # print(year)
    filename = '{0}0101_{0}1231_{1}.zip'.format(year, stat)
    filepath = r'F:\WORKSPACE\DBN-PARASOL\aeronet data 1.5\{}'.format(filename)
    url = 'https://aeronet.gsfc.nasa.gov/zip_files_v3/{}'.format(filename)
    if os.path.exists(filepath):
        print('exist ', year, url)
        continue

    try:
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.46'), ('Cookie', '_ga=GA1.2.479127296.1609393316; _ga=GA1.4.479127296.1609393316'),('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9')]
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(url, filepath)
        print('Done ', year, url)
    except urllib.error.HTTPError:
        print('Not Found', year, url)
        print(stat, year, url, file=open(r'F:\WORKSPACE\DBN-PARASOL\aeronet data 1.5\notfound.txt', 'a', encoding='utf-8'))

批量自动加载结果: 

T he concept and description of a remote sensing aerosol monitoring network initiated by NASA, developed to sup- port NASA, CNES, and NASDA’s Earth satellite systems under the name AERONET and expanded by national and international collaboration, is described. Recent de- velopment of weather-resistant automatic sun and sky scanning spectral radiometers enable frequent measure- ments of atmospheric aerosol optical properties and pre- cipitable water at remote sites. Transmission of automatic measurements via the geostationary satellites GOES and METEOSATS’ Data Collection Systems allows reception and processing in near real-time from approximately 75% of the Earth’s surface and with the expected addition of GMS, the coverage will increase to 90% in 1998. NASA developed a UNIX-based near real-time processing, display and analysis system providing internet access to the emerg- ing global database. Information on the system is avail- able on the project homepage, http://spamer.gsfc.nasa.gov. The philosophy of an open access database, centralized processing and a user-friendly graphical interface has contributed to the growth of international cooperation for ground-based aerosol monitoring and imposes a stan- dardization for these measurements. The system’s auto- matic data acquisition, transmission, and processing fa- cilitates aerosol characterization on local, regional, and global scales with applications to transport and radiation budget studies, radiative transfer-modeling and valida- tion of satellite aerosol retrievals. This article discusses the operation and philosophy of the monitoring system, the precision and accuracy of the measuring radiometers, a brief description of the processing system, and access to the database. Elsevier Science Inc., 1998
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值