The Storm EVent ImagRy (SEVIR) dataset 是由时空匹配过的卫星(GOES-16)和雷达数据( NEXRAD weather radars)(包含天气现象时的),注册AWS可以下载。
下载地址: https://registry.opendata.aws/sevir/
教程 Tutorial: https://nbviewer.jupyter.org/github/MIT-AI-Accelerator/eie-sevir/blob/master/examples/SEVIR_Tutorial.ipynb
SEVIR Challenges (BETA): https://sevir.mit.edu
数据格式为: 4 hours 数据块, 5分钟间隔,空间范围: 384 km x 384 ,美国上空是5种探测器
五种探测器如下:
主要是GOES16的三个通道图,以及其上的GLM flashes(闪电)监测。以及地基radar数据,除了lght以外都是 raster (image) types,数据为L x L x 49,L为数据尺寸,49为时间维度(240/5+1=49)
lght 格式为NX5矩阵,包含时间,位置 等
NOAA风暴事件数据库
SEVIR中的许多事件都可以使用国家气象局(NS)收集的数据进行增强。大约20%的SEVIR例是根据美国国家气象局(NWS)标记的“风暴事件”进行选择的,
这些事件在这里存档:https://www.ncdc.noaa.gov/stormevents。此数据库包括额外的信息,包括:恶劣天气的类型(如大风、番茄、冰雹等)。
例如,下面是与图1所示事件相关的NWS风暴事件数据库的一个条目。点击
文本里有时间位置等信息描述
每一个事件在SEVIR中都有一个id,id以S开头与NWS风暴事件相关。R开头则是随机选取的与NWS无关
Downloading SEVIR
如果有亚马逊云,是直接使用:
aws s3 sync --no-sign-request s3://sevir .
下载指定 modalitiy,例如 vil
,执行
aws s3 cp --no-sign-request s3://sevir/CATALOG.csv CATALOG.csv
aws s3 sync --no-sign-request s3://sevir/data/vil .
数据描述:
SEVIR 包含以下两部分:
- Catalog: A CSV file with rows describing the metadata of an event
- Data Files: A set of HDF5 files containing events for a certain sensor type
注意:在SEVIR可能会有额外的风暴事件id与4小时事件重叠。目录将只提供一个这样的id。通过将风暴事件的时间和地点与SEVIR事件的元数据进行比较,可以找到其他的数据。
SEVIR所有的投影(目前)都是Lambert Azmuthial等面积。如果将来的数据使用不同的投影。
H5文件结构
data
+-- img_type
| +-- year
文件名
SEVIR_<IMG_TYPE>_<EVENT_TYPE>_<YEAR>_<START>_<END>.h5
其中
<TYPE>
是图像or 他探测器 (vil
, vis
, ir069
, etc..),
<EVENT_TYPE>
is either STORMEVENTS
, if the events are linked with NWS storm events, RANDOMEVENTS
, if the events are randomly sampled, or ALLEVENTS
if the two types are mixed into the same file.
<YEAR>_<START>_<END>
Denotes the year, start date and end date of the events in the file.
Each file contains the id
's of the events, as well as the event data. For example a summary of one such file is:
$ h5ls SEVIR_VIL_STORMEVENTS_2018_0101_0630.h5 id Dataset {793} vil Dataset {793, 384, 384, 49}
All raster types in SEVIR (vis
, ir069
, ir107
& vil
) store the data 4D tensors with shapes N x L x W x T
, where N
is the number of events in the file, LxL
is the image size (see patch size column in Table 1), and T
is the number of time steps in the image sequence.
Lightning Data
数据
本节包括处理SEVIR数据的代码示例。SEVIR数据可以从本地存储或通过Amazon S3云存储下载和访问。本教程将假定SEVIR数据已下载到本地或网络文件系统。下面的例子需要设置以下变量:
# Enter path to the SEVIR data location DATA_PATH = '../data' CATALOG_PATH = '../CATALOG.csv' # On some Linux systems setting file locking to false is also necessary: import os os.environ["HDF5_USE_FILE_LOCKING"]='FALSE'
下面的代码展示了如何直接打开H5数据文件,而不用使用目录来可视化使用h5py模块捕获的单个类型事件的帧:
import os
import h5py # needs conda/pip install h5py
import matplotlib.pyplot as plt
file_index = 0
with h5py.File('%s/vil/2018/SEVIR_VIL_STORMEVENTS_2018_0101_0630.h5' % DATA_PATH,'r') as hf:
event_id = hf['id'][file_index]
vil = hf['vil'][file_index]
print('Event ID:',event_id)
print('Image shape:',vil.shape)
fig,axs=plt.subplots(1,4,figsize=(10,5))
axs[0].imshow(vil[:,:,10])
axs[1].imshow(vil[:,:,20])
axs[2].imshow(vil[:,:,30])
axs[3].imshow(vil[:,:,40])
plt.show()
Event ID: b'S728503' Image shape: (384, 384, 49)
下面的代码展示了如何提取由四种图像类型(vis、vil、ir069和ir107)观察到的事件
import h5py # needs conda/pip install h5py
import matplotlib.pyplot as plt
import pandas as pd
# Read catalog
catalog = pd.read_csv(CATALOG_PATH,parse_dates=['time_utc'],low_memory=False)
# Desired image types
img_types = set(['vis','ir069','ir107','vil'])
# Group by event id, and filter to only events that have all desired img_types
events = catalog.groupby('id').filter(lambda x: img_types.issubset(set(x['img_type']))).groupby('id')
event_ids = list(events.groups.keys())
print('Found %d events matching' % len(event_ids),img_types)
# Grab a sample event and view catalog entries
sample_event = events.get_group( event_ids[-1])
print('Sample Event:',event_ids[-1])
sample_event
Found 12739 events matching {'ir069', 'ir107', 'vis', 'vil'}
Sample Event: S858968
这个数据帧包含SEVIR中对应于列出的id的每种图像类型的行。下面的代码使用sample_event的行来提取每种类型的图像数据:
def read_data( sample_event, img_type, data_path=DATA_PATH ):
"""
Reads single SEVIR event for a given image type.
Parameters
----------
sample_event pd.DataFrame
SEVIR catalog rows matching a single ID
img_type str
SEVIR image type
data_path str
Location of SEVIR data
Returns
-------
np.array
LxLx49 tensor containing event data
"""
fn = sample_event[sample_event.img_type==img_type].squeeze().file_name
fi = sample_event[sample_event.img_type==img_type].squeeze().file_index
with h5py.File(data_path + '/' + fn,'r') as hf:
data=hf[img_type][fi]
return data
vis = read_data(sample_event, 'vis')
ir069 = read_data(sample_event, 'ir069')
ir107 = read_data(sample_event, 'ir107')
vil = read_data(sample_event, 'vil')
# plot a frame from each img_type
fig,axs = plt.subplots(1,4,figsize=(10,5))
frame_idx = 30
axs[0].imshow(vis[:,:,frame_idx]), axs[0].set_title('VIS')
axs[1].imshow(ir069[:,:,frame_idx]), axs[1].set_title('IR 6.9')
axs[2].imshow(ir107[:,:,frame_idx]), axs[2].set_title('IR 10.7')
axs[3].imshow(vil[:,:,frame_idx]), axs[3].set_title('VIL')
(<matplotlib.image.AxesImage at 0x7f6a11ab79e8>, Text(0.5, 1.0, 'VIL'))
、
Lightning
为了将闪电包含在的数据中,我们首先需要将其转换为栅格类型,以便将其与其他图像类型一起显示。下面的代码通过将flash数据转换为每5分钟帧中每像素的flash计数来实现这一点:
import numpy as np
def lght_to_grid(data):
"""
Converts SEVIR lightning data stored in Nx5 matrix to an LxLx49 tensor representing
flash counts per pixel per frame
Parameters
----------
data np.array
SEVIR lightning event (Nx5 matrix)
Returns
-------
np.array
LxLx49 tensor containing pixel counts
"""
FRAME_TIMES = np.arange(-120.0,125.0,5) * 60 # in seconds
out_size = (48,48,len(FRAME_TIMES))
if data.shape[0]==0:
return np.zeros(out_size,dtype=np.float32)
# filter out points outside the grid
x,y=data[:,3],data[:,4]
m=np.logical_and.reduce( [x>=0,x<out_size[0],y>=0,y<out_size[1]] )
data=data[m,:]
if data.shape[0]==0:
return np.zeros(out_size,dtype=np.float32)
# Filter/separate times
# compute z coodinate based on bin locaiton times
t=data[:,0]
z=np.digitize(t,FRAME_TIMES)-1
z[z==-1]=0 # special case: frame 0 uses lght from frame 1
x=data[:,3].astype(np.int64)
y=data[:,4].astype(np.int64)
k=np.ravel_multi_index(np.array([y,x,z]),out_size)
n = np.bincount(k,minlength=np.prod(out_size))
return np.reshape(n,out_size).astype(np.float32)
def read_lght_data( sample_event, data_path=DATA_PATH ):
"""
Reads lght data from SEVIR and maps flash counts onto a grid
Parameters
----------
sample_event pd.DataFrame
SEVIR catalog rows matching a single ID
data_path str
Location of SEVIR data
Returns
-------
np.array
LxLx49 tensor containing pixel counts for selected event
"""
fn = sample_event[sample_event.img_type=='lght'].squeeze().file_name
id = sample_event[sample_event.img_type=='lght'].squeeze().id
with h5py.File(data_path + '/' + fn,'r') as hf:
data = hf[id][:]
return lght_to_grid(data)
lght = read_lght_data(sample_event)
# include lightning counts in plot
fig,axs = plt.subplots(1,5,figsize=(14,5))
frame_idx = 30
axs[0].imshow(vis[:,:,frame_idx]), axs[0].set_title('VIS')
axs[1].imshow(ir069[:,:,frame_idx]), axs[1].set_title('IR 6.9')
axs[2].imshow(ir107[:,:,frame_idx]), axs[2].set_title('IR 10.7')
axs[3].imshow(vil[:,:,frame_idx]), axs[3].set_title('VIL')
axs[4].imshow(lght[:,:,frame_idx]), axs[4].set_title('Lightning')
<matplotlib.image.AxesImage at 0x7f69e00c8470>, Text(0.5, 1.0, 'Lightning'))
示例使用默认的colormap来显示数据。要显示类似于的彩色图像,颜色映射可在位于https://github.com/MIT-AI-Accelerator/eie-sevir的repo中获得。
import sys
sys.path.append('../') # add sevir module to path
from sevir.display import get_cmap
# Get colormaps for encoded types
vis_cmap,vis_norm,vis_vmin,vis_vmax = get_cmap('vis',encoded=True)
ir069_cmap,ir069_norm,ir069_vmin,ir069_vmax = get_cmap('ir069',encoded=True)
ir107_cmap,ir107_norm,ir107_vmin,ir107_vmax = get_cmap('ir107',encoded=True)
vil_cmap,vil_norm,vil_vmin,vil_vmax = get_cmap('vil',encoded=True)
lght_cmap,lght_norm,lght_vmin,lght_vmax = get_cmap('lght',encoded=True)
fig,axs = plt.subplots(1,5,figsize=(14,5))
frame_idx = 30
axs[0].imshow(vis[:,:,frame_idx],cmap=vis_cmap,norm=vis_norm,vmin=vis_vmin,vmax=vis_vmax), axs[0].set_title('VIS')
axs[1].imshow(ir069[:,:,frame_idx],cmap=ir069_cmap,norm=ir069_norm,vmin=ir069_vmin,vmax=ir069_vmax), axs[1].set_title('IR 6.9')
axs[2].imshow(ir107[:,:,frame_idx],cmap=ir107_cmap,norm=ir107_norm,vmin=ir107_vmin,vmax=ir107_vmax), axs[2].set_title('IR 10.7')
axs[3].imshow(vil[:,:,frame_idx],cmap=vil_cmap,norm=vil_norm,vmin=vil_vmin,vmax=vil_vmax), axs[3].set_title('VIL')
axs[4].imshow(lght[:,:,frame_idx],cmap=lght_cmap,norm=lght_norm,vmin=lght_vmin,vmax=lght_vmax), axs[4].set_title('Lightning')
matplotlib.image.AxesImage at 0x7f6a12427208>, Text(0.5, 1.0, 'Lightning'))
运用SEVIR事件
在许多应用程序中,有必要知道SEVIR补丁中像素的纬度/经度位置。目录提供了执行数据地理引用所需的信息。
目前,SEVIR中的所有补丁都使用相同的地图投影,一个以(纬度,lon)=(38,-98)为中心的兰伯特等面积投影。这个投影细节存储在proj列中:
sample_event.proj.values[0]
import re
import numpy as np
class LaeaProjection():
"""
Translates between pixel coordinates (X,Y) and Lat lon coordinates for a given SEVIR event
projected in laea.
Parameters
----------
event pd.Series
One row from SEVIR catalog
"""
def __init__(self,event):
self.proj = event.proj
self.lat0 = float(re.compile('\+lat_0=([+-]?\d+)').search(self.proj).groups()[0])
self.lon0 = float(re.compile('\+lon_0=([+-]?\d+)').search(self.proj).groups()[0])
self.R = float(re.compile('\+a=(\d+)').search(self.proj).groups()[0])
self.llcrnlat = event.llcrnrlat
self.llcrnlon = event.llcrnrlon
self.refX, self.refY = self.forward(self.llcrnlon,self.llcrnlat,pixel=False)
self.binX = event.width_m / event.size_x
self.binY = event.height_m / event.size_y
def forward(self,lon,lat,pixel=True):
"""
Maps lat/lon to pixel x,y. For projection coordinates instead of pixel, set pixel=False.
"""
sind = lambda t: np.sin(t*np.pi/180)
cosd = lambda t: np.cos(t*np.pi/180)
k = self.R * np.sqrt(2/(1+sind(self.lat0)*sind(lat)+cosd(self.lat0)*cosd(lat)*cosd(lon-self.lon0)))
x = k*cosd(lat)*sind(lon-self.lon0)
y = k*(cosd(self.lat0)*sind(lat) - sind(self.lat0)*cosd(lat)*cosd(lon-self.lon0))
if pixel:
x = (x-self.refX) / self.binX
y = (y-self.refY) / self.binY
return x,y
def inverse(self,x,y,pixel=True):
"""
Maps pixel coordinates to (lon,lat) position. If passing projection corrdinates, set pixel=False.
"""
if pixel:
x = x*self.binX + self.refX
y = y*self.binY + self.refY
x/=self.R
y/=self.R
sind = lambda t: np.sin(t*np.pi/180)
cosd = lambda t: np.cos(t*np.pi/180)
rho = np.sqrt(x*x+y*y)
c = 2*np.arcsin(0.5*rho)
sinc = np.sin(c)
cosc = np.cos(c)
lat = 180/np.pi*np.arcsin(cosc*sind(self.lat0)+y*sinc*cosd(self.lat0)/rho)
lon = self.lon0+180/np.pi*np.arctan(x*sinc/(rho*cosd(self.lat0)*cosc - y*sind(self.lat0)*sinc))
return lon,lat
运行一个事件
proj = LaeaProjection( sample_event[sample_event.img_type=='vil'].squeeze() )
X,Y = np.meshgrid(np.arange(vil.shape[0]),np.arange(vil.shape[0]))
lons,lats = proj.inverse(X,Y)
# Plot with origin='lower' so up corresponds to north.
fig,ax=plt.subplots(1,3,figsize=(15,5))
ax[0].imshow(vil[:,:,20],origin='lower')
la = ax[1].imshow(lats,origin='lower')
fig.colorbar(la,ax=ax[1])
ax[1].set_title('Pixel Latitudes')
lo = ax[2].imshow(lons,origin='lower')
fig.colorbar(lo,ax=ax[2])
ax[2].set_title('Pixel Longitudes')
Text(0.5, 1.0, 'Pixel Longitudes')
This particular domain overlaps with north eastern Minnesota. We can add markers for particular locations by converting from lat/lon coordinates into the image pixel coordinates:
In [18]:
lat,lon = 46.7867, -92.1005 # Duluth, MN x,y=proj.forward(lon,lat) print('x=%f,y=%f' % (x,y)) # Plot with origin='lower' so up corresponds to north. fig,ax=plt.subplots(1,1,figsize=(5,5)) ax.imshow(vil[:,:,20],origin='lower') ax.plot(x,y,linestyle='none', marker="o", markersize=16, alpha=0.6, c="red") ax.text(x-30,y-30,'Duluth, MN',color='r')
x=200.021527,y=79.057562
Out[18]:
Text(170.02152736199426, 49.057561772525304, 'Duluth, MN')
Use Basemap instead
In [22]:
# Note: Requires basemap module to run import warnings warnings.filterwarnings('ignore') from mpl_toolkits.basemap import Basemap # conda/pip install basemap basemap-data-hires # Create Basemap Object s=sample_event[sample_event.img_type=='vil'].squeeze() fig,ax=plt.subplots(1,1,figsize=(7,7)) m = Basemap(llcrnrlat=s.llcrnrlat, llcrnrlon=s.llcrnrlon, urcrnrlat=s.urcrnrlat,urcrnrlon=s.urcrnrlon, width=s.width_m, height=s.height_m, lat_0=38, lon_0=-98, projection='laea', resolution='i', ax=ax) m.drawstates(color='w') m.drawcoastlines(color='w') m.drawcountries(color='w') m.imshow(vil[:,:,24]) lat,lon = 46.7867, -92.1005 # Duluth, MN x,y=m(lon,lat) # will be in projection coordinates m.plot(x,y,linestyle='none', marker="o", markersize=16, alpha=0.6, c="red") ax.text(x-30e3,y-30e3,'Duluth, MN',color='r')
Out[22]:
Text(170021.5273619945, 49057.56177252438, 'Duluth, MN')
- NOAA's Storm Event Database: https://www.ncdc.noaa.gov/stormevents/ftp.jsp