190402 - Understanding the LSTM data format through an example

LSTM Keras API predicting multiple outputs

For regression problems with a Keras LSTM, the samples used for supervised learning must be in the following 3D format:

reshape input to be 3D [samples, timesteps, features]
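
As a minimal sketch of what this 3D layout means, the NumPy snippet below reshapes a flat table (one row per sample) into [samples, timesteps, features]. The array contents and the names `flat`, `n_timesteps`, and `x3d` are illustrative assumptions.

```python
import numpy as np

# Hypothetical flat table: 4 samples, each row holds 3 time steps x 2 features
# laid out as [f1(t-3), f2(t-3), f1(t-2), f2(t-2), f1(t-1), f2(t-1)].
n_timesteps, n_features = 3, 2
flat = np.arange(24, dtype=float).reshape(4, n_timesteps * n_features)

# A row-major reshape keeps the features of one time step together.
x3d = flat.reshape(flat.shape[0], n_timesteps, n_features)

print(x3d.shape)         # (4, 3, 2)
print(x3d[0].tolist())   # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
```

The reshape only works as intended when each flat row is ordered time step by time step, which is the column order used in the rest of this post.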


  1. Prepare the data

In the data below, the number of samples is m=10, each sample has dimension n=2, and the variables are named var1 and var2.

data = np.linspace(1, 20, num=20).reshape((10, 2))
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]
 [ 9. 10.]
 [11. 12.]
 [13. 14.]
 [15. 16.]
 [17. 18.]
 [19. 20.]]
  2. Normalize the data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0,1))
scaled = scaler.fit_transform(data)
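
To make the scaling step concrete, here is a small NumPy-only sketch of the per-column formula that min-max scaling to the range (0, 1) applies, (x - min) / (max - min), together with its inverse. The names `col_min`, `col_max`, `scaled_manual`, and `restored` are illustrative assumptions; in practice `scaler.inverse_transform` plays the role of the last step.

```python
import numpy as np

data = np.linspace(1, 20, num=20).reshape((10, 2))

# Per-column min-max scaling to [0, 1]: (x - min) / (max - min).
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled_manual = (data - col_min) / (col_max - col_min)

# Undo the scaling, e.g. to read predictions in the original units.
restored = scaled_manual * (col_max - col_min) + col_min

print(scaled_manual[0].tolist(), scaled_manual[-1].tolist())  # [0.0, 0.0] [1.0, 1.0]
print(np.allclose(restored, data))                            # True
```

Each column is scaled independently, so var1 and var2 both end up spanning exactly 0 to 1.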


  3. Convert the data into a supervised-learning format: use the previous 3 time steps to predict the next 2 time steps.

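The LSTM input spans n_in=3 time steps (t-3, t-2, t-1), so the feature vector trainX of each training sample has length n_in x n = 6;
The LSTM output spans n_out=2 time steps (t, t+1), so the label vector trainY of each sample has length n_out x n = 4.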
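
The windowing described above can be sketched directly with NumPy slices, without going through a DataFrame. This is only an illustration of the idea under the same settings (n_in=3, n_out=2); the names `n_windows`, `X`, and `Y` are assumptions.

```python
import numpy as np

data = np.linspace(1, 20, num=20).reshape((10, 2))  # L = 10 rows, n = 2 features
n_in, n_out = 3, 2

# Each window needs n_in + n_out consecutive rows, so there are
# L - (n_in + n_out) + 1 complete windows.
n_windows = data.shape[0] - (n_in + n_out) + 1

X = np.stack([data[i:i + n_in] for i in range(n_windows)])
Y = np.stack([data[i + n_in:i + n_in + n_out] for i in range(n_windows)])

print(X.shape, Y.shape)       # (6, 3, 2) (6, 2, 2)
print(X[0].ravel().tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  (n_in * n = 6 values)
print(Y[0].ravel().tolist())  # [7.0, 8.0, 9.0, 10.0]           (n_out * n = 4 values)
```

The first window uses rows 1 to 3 of the data as input and rows 4 to 5 as the label, which matches the first non-NaN row of the supervised table.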

  • Data preprocessing without normalization
    [figure]
  • Data preprocessing with normalization
    [figure]
n_steps = 3
n_features = 2

# Pick one of the two lines; as written, the second overwrites the first.
reformed = series_to_supervised(data, n_steps, 2)    # without MinMaxScaler
reformed = series_to_supervised(scaled, n_steps, 2)  # with MinMaxScaler
print(reformed.shape)  # (m, n): m = L - (n_in + n_out) + 1, n = (n_in + n_out) * n_features
reformed = reformed.values

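train = reformed[::2]  # rows 0, 2, 4 (the 1st, 3rd, 5th lines) for training
test = reformed[1::2]  # rows 1, 3, 5 (the 2nd, 4th, 6th lines) for testing


# The first n_steps * n_features = 6 columns are inputs;
# the remaining n_out * n_features = 4 columns are labels.
train_X = train[:, :6]
train_Y = train[:, 6:]
test_X = test[:, :6]
test_Y = test[:, 6:]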


train_X = train_X.reshape((train_X.shape[0], n_steps, n_features))
test_X = test_X.reshape((test_X.shape[0], n_steps, n_features))
print('The training data shape is ', train_X.shape)
print('The testing data shape is ', test_X.shape)
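
As a quick sanity check of the split and reshape above, the sketch below rebuilds an equivalent table from the unscaled data with plain NumPy and prints one training sample. The stand-in construction of `reformed`, and the names `width`, `n_input_cols`, and `train_Y_3d`, are assumptions for illustration; the 3D view of the labels is optional and is not used elsewhere in this post.

```python
import numpy as np

n_steps, n_out, n_features = 3, 2, 2
data = np.linspace(1, 20, num=20).reshape((10, 2))

# Stand-in for reformed.values on the unscaled data: one flattened
# window of n_steps + n_out rows per line, shape (6, 10).
width = n_steps + n_out
reformed = np.stack([data[i:i + width].ravel()
                     for i in range(data.shape[0] - width + 1)])

n_input_cols = n_steps * n_features      # 6, the column split point
train = reformed[::2]                    # rows 0, 2, 4
train_X = train[:, :n_input_cols].reshape((-1, n_steps, n_features))
train_Y = train[:, n_input_cols:]        # stays 2D: (3, n_out * n_features)

# Optional 3D view of the labels, one row per predicted time step.
train_Y_3d = train_Y.reshape((-1, n_out, n_features))

print(train_X.shape, train_Y.shape, train_Y_3d.shape)  # (3, 3, 2) (3, 4) (3, 2, 2)
print(train_X[1].tolist())     # [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(train_Y_3d[1].tolist())  # [[11.0, 12.0], [13.0, 14.0]]
```

Writing the split point as n_steps * n_features instead of the literal 6 keeps the slicing correct if the window length or the number of variables changes.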
  4. Complete program

import pandas as pd
import numpy as np
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler


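def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	"""Frame a time series as a supervised-learning table.

	data: sequence of observations (list or 2D NumPy array)
	n_in: number of lagged time steps used as input (t-n_in, ..., t-1)
	n_out: number of time steps used as output (t, ..., t+n_out-1)
	dropnan: whether to drop rows that contain NaN values after shifting
	"""
	n_vars = 1 if type(data) is list else data.shape[1]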
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg


data = np.linspace(1,20,num=20).reshape((10,2))
scaler = MinMaxScaler(feature_range=(0,1))

scaled = scaler.fit_transform(data)

n_steps = 3
n_features = 2

reformed = series_to_supervised(data, n_steps, 2)
print(data)
print(reformed.shape)  # (m, n): m = L - (n_in + n_out) + 1, n = (n_in + n_out) * n_features

# Pick one of the two lines; as written, the second overwrites the first.
reformed = series_to_supervised(data, n_steps, 2)    # without MinMaxScaler
reformed = series_to_supervised(scaled, n_steps, 2)  # with MinMaxScaler
reformed = reformed.values

train = reformed[::2]  # rows 0, 2, 4 (the 1st, 3rd, 5th lines) for training
test = reformed[1::2]  # rows 1, 3, 5 (the 2nd, 4th, 6th lines) for testing


train_X = train[:, :6]
train_Y = train[:,6:]
test_X = test[:,:6]
test_Y = test[:,6:]


train_X = train_X.reshape((train_X.shape[0], n_steps, n_features))
test_X = test_X.reshape((test_X.shape[0], n_steps, n_features))
print('The training data shape is ', train_X.shape)
print('The testing data shape is ', test_X.shape)


  5. Program output
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]
 [ 9. 10.]
 [11. 12.]
 [13. 14.]
 [15. 16.]
 [17. 18.]
 [19. 20.]]
(6, 10)
The training data shape is  (3, 3, 2)
The testing data shape is  (3, 3, 2)
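
The shapes in this output follow from simple arithmetic, sketched below. The helper `supervised_shape` is a hypothetical function written for this check; it assumes the input has no NaN values of its own and at least n_in + n_out rows.

```python
def supervised_shape(n_rows, n_in, n_out, n_features):
    """Expected shape of the supervised table after NaN rows are dropped."""
    m = n_rows - (n_in + n_out) + 1      # number of complete windows
    n = (n_in + n_out) * n_features      # columns per window
    return m, n


m, n = supervised_shape(n_rows=10, n_in=3, n_out=2, n_features=2)
n_train = len(range(0, m, 2))   # rows taken by reformed[::2]
n_test = len(range(1, m, 2))    # rows taken by reformed[1::2]

print((m, n))            # (6, 10)
print(n_train, n_test)   # 3 3
```

With 3 rows in each split and 6 input columns per row, reshaping to (rows, n_steps, n_features) gives the (3, 3, 2) shapes printed above.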