Code source
https://github.com/nok-halfspace/Transformer-Time-Series-Forecasting
Article: https://medium.com/mlearning-ai/transformer-implementation-for-time-series-forecasting-a9db2db5c820
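Data structure
The project's data is structured as shown in the figure below: each row belongs to one sensor, identified by sensor_id, and records that sensor's humidity at a particular time.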
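As a rough illustration of this layout, the sketch below builds a tiny table of the same shape with pandas. The column names sensor_id, reindexed_id, humidity, sin_hour and cos_hour appear in the project code; everything else is an assumption made for illustration, including the timestamp column name, the sensor names, the readings, and the way reindexed_id and the sin/cos features are derived here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy readings: two sensors, three measurements each.
# Sensor names, timestamps, and humidity values are made up for illustration.
df = pd.DataFrame({
    "sensor_id": ["a", "a", "a", "b", "b", "b"],
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 06:00", "2021-01-01 12:00",
        "2021-01-01 00:00", "2021-01-01 06:00", "2021-01-01 12:00",
    ]),
    "humidity": [61.0, 58.5, 47.2, 70.1, 66.3, 52.9],
})

# Map each sensor to a consecutive integer id starting at 1
# (assumed meaning of "reindexed_id").
df["reindexed_id"] = pd.factorize(df["sensor_id"])[0] + 1

# Cyclical encoding of the hour of day, so that 23:00 and 00:00 end up close
# (assumed meaning of "sin_hour" and "cos_hour").
hour = df["timestamp"].dt.hour
df["sin_hour"] = np.sin(2 * np.pi * hour / 24)
df["cos_hour"] = np.cos(2 * np.pi * hour / 24)

print(df)
```

With a table like this, selecting one sensor's series is a simple filter such as df[df["reindexed_id"] == 1], which is the access pattern the dataset class relies on.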
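Data loading and preliminary processing
The first step is preliminary processing of the data. The code below is the SensorDataset class, which supplies samples to the DataLoader: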
import os

import numpy as np
import pandas as pd
import torch
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import Dataset


class SensorDataset(Dataset):
    """Sensor humidity dataset."""

    def __init__(self, csv_name, root_dir, training_length, forecast_window):
        """
        Args:
            csv_name (string): Name of the csv file.
            root_dir (string): Directory containing the csv file.
            training_length (int): Number of input time steps (T).
            forecast_window (int): Number of time steps to predict (S).
        """
        # load raw data file
        csv_file = os.path.join(root_dir, csv_name)
        self.df = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = MinMaxScaler()  # scaler used to normalize the data
        self.T = training_length
        self.S = forecast_window

    def __len__(self):
        # return number of sensors
        return len(self.df.groupby(by=["reindexed_id"]))

    # Will pull an index between 0 and __len__.
    def __getitem__(self, idx):
        # Sensors are indexed from 1
        idx = idx + 1

        # np.random.seed(0)
        # pick a random window start that leaves room for T inputs and S targets
        start = np.random.randint(0, len(self.df[self.df["reindexed_id"]==idx]) - self.T - self.S)
        sensor_number = str(self.df[self.df["reindexed_id"]==idx][["sensor_id"]][start:start+1].values.item())
        # positions of the T input steps and of the S target steps that follow them
        index_in = torch.tensor([i for i in range(start, start+self.T)])
        index_tar = torch.tensor([i for i in range(start + self.T, start + self.T + self.S)])
        _input = torch.tensor(self.df[self.df["reindexed_id"]==idx][["humidity", "sin_hour", "cos_hour", "si