PyTorch: loading and simple processing of time-series data
Dataset introduction
The dataset is hourly bike-sharing rental data from Washington, D.C., in the United States; it can be downloaded from the link.
Load data
# -*- coding: utf-8 -*-
import os
import numpy as np
import torch as th
'''
masterqkk, 20210420
'''
data_path = '../Dataset'
data_name = 'hour.csv'
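# Column 1 holds the date string; keep only its last '/'-separated field as a float.
# Depending on the NumPy version, converters receive bytes or str, so handle both.
def last_date_field(x):
    s = x.decode() if isinstance(x, bytes) else x
    return float(s.split('/')[-1])

data_np = np.loadtxt(os.path.join(data_path, data_name), delimiter=',', skiprows=1,
                     converters={1: last_date_field}, dtype=np.float32)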
data_t = th.from_numpy(data_np)
print('data_t: {}, {}, {}'.format(data_t, data_t.shape, data_t.stride()))
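To try the loading step without the real file, the following minimal sketch parses a tiny in-memory CSV in the same way. The sample rows, the three-column layout, and the `last_date_field` helper are made up for illustration; they are not taken from `hour.csv`.

```python
import io

import numpy as np
import torch as th

# Hypothetical sample: id, date, value (not the real hour.csv layout).
csv_text = "id,date,value\n1,2011/1/1,0.5\n2,2011/1/2,0.7\n"


def last_date_field(x):
    # Depending on the NumPy version, converters receive bytes or str.
    s = x.decode() if isinstance(x, bytes) else x
    return float(s.split('/')[-1])


sample_np = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1,
                       converters={1: last_date_field}, dtype=np.float32)
sample_t = th.from_numpy(sample_np)  # shares memory with sample_np
print(sample_t, sample_t.shape, sample_t.stride())
```

Because `th.from_numpy` shares memory with the NumPy array, the tensor keeps the array's row-major layout, which is what the printed stride reflects.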
One-hot encode column 9 (weather)
first_day = data_t[:24, :]
weather_onehot = th.zeros(size=(first_day.shape[0], 4))
# Weather levels are 1..4, so subtract 1 to get 0-based column indices.
weather_onehot.scatter_(dim=1,
                        index=first_day[:, 9].long().unsqueeze(1) - 1,
                        value=1.0)
print('weather_onehot: {}, {}'.format(weather_onehot, weather_onehot.shape))
# Concatenate with the float data so the non-integer columns are not truncated to integers.
first_day_onehot = th.cat((first_day, weather_onehot), 1)[:1, :]
print('first_day_onehot: {}, {}'.format(first_day_onehot, first_day_onehot.shape))
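The `scatter_` call can be checked in isolation. This is a minimal sketch on made-up weather codes (the values in `weather` are assumptions, not rows from the dataset), cross-checked against `torch.nn.functional.one_hot`, which is an alternative way to get the same result.

```python
import torch as th
import torch.nn.functional as F

# Made-up weather codes in 1..4 for five hours.
weather = th.tensor([1, 3, 2, 4, 1])

onehot = th.zeros(weather.shape[0], 4)
# For each row i, write 1.0 into column index[i, 0]; codes are shifted to 0..3.
onehot.scatter_(dim=1, index=weather.unsqueeze(1) - 1, value=1.0)

# Cross-check against the built-in helper, which returns an integer tensor.
reference = F.one_hot(weather - 1, num_classes=4).float()
print(onehot)
print(th.equal(onehot, reference))
```

The `unsqueeze(1)` is needed because the index tensor passed to `scatter_` must have the same number of dimensions as the tensor being written to.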
# Group the hourly rows into days; view() requires the row count to be a multiple of hour_day.
hour_day = 24
daily_data = data_t.view(-1, hour_day, data_t.shape[1])  # (N, L, C)
print('daily_data: {}, {}'.format(daily_data.shape, daily_data.stride()))
daily_data = daily_data.transpose(1, 2) # (N, C, L)
print('daily_data: {}, {}'.format(daily_data.shape, daily_data.stride()))
# One-hot encode the weather along the channel dimension: result is (N, 4, L).
weather_onehot2 = th.zeros(size=(daily_data.shape[0], 4, daily_data.shape[2]))
weather_onehot2.scatter_(dim=1,
                         index=daily_data[:, 9, :].long().unsqueeze(1) - 1,
                         value=1.0)
daily_data_onehot = th.cat((daily_data, weather_onehot2), dim=1)
print('daily_data_onehot: {}, {}'.format(daily_data_onehot, daily_data_onehot.shape))
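The shape and stride printouts for `view` and `transpose` are easier to follow on a small tensor. This is a minimal sketch with made-up sizes (2 days, 3 hours per day, 4 columns) instead of the real 24-hour days; all names in it are illustrative.

```python
import torch as th

# Made-up data: 2 days x 3 hours per day, 4 columns per hourly row.
flat = th.arange(24, dtype=th.float32).view(6, 4)

daily = flat.view(-1, 3, 4)            # (N, L, C) = (2, 3, 4)
print(daily.shape, daily.stride())

daily_ncl = daily.transpose(1, 2)      # (N, C, L) = (2, 4, 3)
print(daily_ncl.shape, daily_ncl.stride(), daily_ncl.is_contiguous())

# Both tensors are views that start at the same memory address as `flat`.
same_storage = daily_ncl.data_ptr() == flat.data_ptr()

# contiguous() copies the data into (N, C, L) memory order.
daily_copy = daily_ncl.contiguous()
print(daily_copy.stride())
```

Neither `view` nor `transpose` copies data; `transpose` only swaps the strides, which is why the transposed tensor is no longer contiguous until `contiguous()` makes a copy.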
Scaling (min-max, z-score)
# Min-max scaling of column 10, using the minimum and maximum over all days and hours.
tmp = daily_data[:, 10, :].clone()
tmp = (tmp - th.min(tmp)) / (th.max(tmp) - th.min(tmp))
print('minmax scaling: {}'.format(tmp))
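# Z-score scaling of the same column, using the mean and standard deviation over all days and hours.
tmp2 = daily_data[:, 10, :].clone()
tmp2 = (tmp2 - th.mean(tmp2)) / th.std(tmp2)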
print('zscore scaling: {}'.format(tmp2))
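Both scalings can be sanity-checked on a small tensor. This is a minimal sketch on made-up values (the tensor `x` is an assumption standing in for one feature column); as in the code above, the statistics are computed over the whole tensor rather than per day.

```python
import torch as th

# Made-up values standing in for one feature, shaped (N, L) = (2, 3).
x = th.tensor([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])

# Min-max scaling with a single global minimum and maximum.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score scaling; th.std uses the unbiased (n - 1) estimator by default.
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)
print(x_zscore.mean(), x_zscore.std())
```

After min-max scaling the values lie in [0, 1]; after z-score scaling the mean is approximately 0 and the standard deviation approximately 1.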