StemGNN

1. Project Background and Concepts

Motivation:

For each stock: use the closing price at time t to predict the closing price at time t+1.
Is there some relationship among all of these stock price series?

Conventionally, a prior matrix (dependency graph) has to be computed first; the authors argue that this precomputed structure can be replaced, because a neural network can learn these relationships automatically.
Can the temporal and spatial dimensions be mapped to the spectral domain, where the convolutions are then carried out?

The GFT (Graph Fourier Transform) maps the spatial input into the spectral domain.
The DFT maps a univariate time series into the spectral domain. (Why univariate? Because after the GFT each graph signal collapses to a single value per spectral component, leaving one univariate series per component.)
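
As a minimal sketch of these two transforms on toy data (hypothetical sizes and a random adjacency matrix; this is not the repo's code):

import numpy as np

N, T = 5, 12                               # nodes, timestamps (toy sizes)
X = np.random.randn(N, T)                  # multivariate series, one row per node

# build a toy normalized Laplacian from a random symmetric adjacency matrix
A = np.abs(np.random.randn(N, N)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
D = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(N) - D @ A @ D

# GFT: project the node signals onto the Laplacian eigenbasis U (spatial -> spectral)
eigvals, U = np.linalg.eigh(L)
X_hat = U.T @ X                            # graph-spectral representation, still N x T

# DFT: each graph-spectral component is now a univariate series over time
X_freq = np.fft.fft(X_hat, axis=1)

# the inverse DFT and the IGFT recover the original signal
X_rec = U @ np.fft.ifft(X_freq, axis=1).real
assert np.allclose(X_rec, X)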

2. Application Scenario

Spectral-domain GCN: time domain + spatial domain -> Fourier transforms -> spectral domain
The multivariate dependency problem

2.1 Problem Definition

We are given a multivariate time series X to forecast:
X = {x_it} ∈ R^(N×T),
where T is the number of timestamps and N is the number of nodes or features (treating them as nodes is more convenient).

The model automatically generates the pairwise relationships among the N nodes -> W, an N×N adjacency matrix.

The task: use the sequence from t-K to t-1 to predict the sequence from t to t+H-1.
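
A minimal illustration of this input/output split on a toy array (hypothetical values of K and H; not the repo's code):

import numpy as np

N, total_T = 3, 20
series = np.arange(total_T * N).reshape(total_T, N)   # rows are timestamps, columns are nodes

K, H = 12, 3                          # look-back window and forecast horizon
t = 15                                # the first timestamp to be predicted
model_input = series[t - K: t]        # shape (K, N): timestamps t-K .. t-1
model_target = series[t: t + H]       # shape (H, N): timestamps t .. t+H-1
print(model_input.shape, model_target.shape)          # (12, 3) (3, 3)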

3. Prerequisites

Self-attention

It comes from the Transformer architecture in NLP.

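As a reminder, the standard scaled dot-product self-attention of the Transformer can be sketched as below; StemGNN's latent correlation layer uses a simplified additive (GAT-style) variant, shown later in self_graph_attention. The projection sizes here are toy values:

import torch
import torch.nn.functional as F

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    # x: [batch, seq_len, d_model]; w_*: [d_model, d_k] projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)   # [batch, seq_len, seq_len]
    return F.softmax(scores, dim=-1) @ v                     # attention-weighted sum of the values

x = torch.randn(2, 10, 16)
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
print(scaled_dot_product_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 10, 8])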

4. Project Concepts

4.1 Data

Training/test data: ECG_data, with 140 columns (attributes); the n rows correspond to n timestamps.
Model input
Dataset preprocessing

4.2 Model Architecture

(Figure: StemGNN model architecture)
The input can be a time series with multiple attributes.
The Latent Correlation Layer learns a weight matrix W on its own; W effectively defines a graph (each entry of the matrix is the weight of the edge between two nodes).
StemGNN Block 1 is expanded in the upper part of the figure: the data first goes through the GFT and DFT (turning the time and spatial domains into the spectral domain), the spectral representation then passes through GConv and the IGFT, and finally the forecast is produced.
GFT is the Graph Fourier Transform.
IGFT is the inverse Graph Fourier Transform.
Between the two StemGNN blocks, Block 2 actually receives X - X' as its input (the original input minus Block 1's backcast), which acts like a residual connection.
The final outputs are X1 + X2 and Y1 + Y2 (the two blocks' backcasts and forecasts summed).
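
A minimal pseudo-code sketch of this residual stacking (block1 and block2 are hypothetical stand-ins for the two StemGNN blocks; in the actual code, shown in base_model.py below, the residual is computed inside the block):

def stem_gnn_forward(X, block1, block2):
    Y1, X_backcast = block1(X)       # block 1 returns a forecast and a backcast
    Y2, _ = block2(X - X_backcast)   # block 2 works on the residual of block 1
    return Y1 + Y2                   # the final forecast is the sum of both forecasts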

4.3 User-defined Parameters

window_size: how many timestamps before time t are used as input (int).
horizon: how many future steps to predict (int).

5. Code Walkthrough

Overview

(Figure: project directory structure)
This is the overall structure of the project; the folder names indicate what each part does.

1. main.py (controls the call order and the overall framework)

The main function sets the arguments, preprocesses the dataset, and creates and trains the model.
The first part, shown below, initializes the arguments:

parser = argparse.ArgumentParser()
parser.add_argument('--train', type=bool, default=True)
parser.add_argument('--evaluate', type=bool, default=True)
parser.add_argument('--dataset', type=str, default='ECG_data')
parser.add_argument('--window_size', type=int, default=12)  # how many timestamps before time t are used as input
parser.add_argument('--horizon', type=int, default=3)  # how many future steps to predict
parser.add_argument('--train_length', type=float, default=7)
parser.add_argument('--valid_length', type=float, default=2)
parser.add_argument('--test_length', type=float, default=1)
parser.add_argument('--epoch', type=int, default=50)
parser.add_argument('--lr', type=float, default=1e-4)
parser.add_argument('--multi_layer', type=int, default=5)
parser.add_argument('--device', type=str, default='cpu')
parser.add_argument('--validate_freq', type=int, default=1)
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--norm_method', type=str, default='z_score')
parser.add_argument('--optimizer', type=str, default='RMSProp')
parser.add_argument('--early_stop', type=bool, default=False)
parser.add_argument('--exponential_decay_step', type=int, default=5)
parser.add_argument('--decay_rate', type=float, default=0.5)
parser.add_argument('--dropout_rate', type=float, default=0.5)
parser.add_argument('--leakyrelu_rate', type=int, default=0.2)
args = parser.parse_args()
print(f'Training configs: {args}')

Next, read in the data:

data_file = os.path.join('dataset', args.dataset + '.csv')
result_train_file = os.path.join('output', args.dataset, 'train')
result_test_file = os.path.join('output', args.dataset, 'test')
if not os.path.exists(result_train_file):
    os.makedirs(result_train_file)
if not os.path.exists(result_test_file):
    os.makedirs(result_test_file)
data = pd.read_csv(data_file).values

Split the data into training, validation and test sets:

# split data
train_ratio = args.train_length / (args.train_length + args.valid_length + args.test_length)
valid_ratio = args.valid_length / (args.train_length + args.valid_length + args.test_length)
test_ratio = 1 - train_ratio - valid_ratio
train_data = data[:int(train_ratio * len(data))]
valid_data = data[int(train_ratio * len(data)):int((train_ratio + valid_ratio) * len(data))]
test_data = data[int((train_ratio + valid_ratio) * len(data)):]

torch.manual_seed(0)

The last part runs training and evaluation:

if __name__ == '__main__':
    if args.train:
        try:
            before_train = datetime.now().timestamp()
            _, normalize_statistic = train(train_data, valid_data, args, result_train_file) # 开始训练
            after_train = datetime.now().timestamp()
            print(f'Training took {(after_train - before_train) / 60} minutes')
        except KeyboardInterrupt:
            print('-' * 99)
            print('Exiting from training early')
    if args.evaluate:
        before_evaluation = datetime.now().timestamp()
        test(test_data, args, result_train_file, result_test_file)
        after_evaluation = datetime.now().timestamp()
        print(f'Evaluation took {(after_evaluation - before_evaluation) / 60} minutes')
    print('done')
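
With the defaults above, training followed by evaluation can therefore be started with a plain "python main.py"; individual settings such as --window_size, --horizon or --dataset can be overridden on the command line.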

2. handler.py (the key routines: training, validation, testing)

It contains the following six functions:

(1) Training function

This is the first part called from main.py:

def train(train_data, valid_data, args, result_file):

First, build the model:

node_cnt = train_data.shape[1] # number of nodes / attributes (140)
model = Model(node_cnt, 2, args.window_size, args.multi_layer, horizon=args.horizon)
model.to(args.device)
if len(train_data) == 0:
    raise Exception('Cannot organize enough training data')
if len(valid_data) == 0:
    raise Exception('Cannot organize enough validation data')

Normalize the data and save the statistics:

# Normalization: compute the mean and std of each column (attribute) over the training series and save them to norm_stat.json
if args.norm_method == 'z_score':
    train_mean = np.mean(train_data, axis=0)
    train_std = np.std(train_data, axis=0)
    normalize_statistic = {"mean": train_mean.tolist(), "std": train_std.tolist()}
elif args.norm_method == 'min_max':
    train_min = np.min(train_data, axis=0)
    train_max = np.max(train_data, axis=0)
    normalize_statistic = {"min": train_min.tolist(), "max": train_max.tolist()}
else:
    normalize_statistic = None
if normalize_statistic is not None:
    with open(os.path.join(result_file, 'norm_stat.json'), 'w') as f:
        json.dump(normalize_statistic, f)

Define the optimizer:

if args.optimizer == 'RMSProp':
    my_optim = torch.optim.RMSprop(params=model.parameters(), lr=args.lr, eps=1e-08)
else:
    my_optim = torch.optim.Adam(params=model.parameters(), lr=args.lr, betas=(0.9, 0.999))
my_lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=my_optim, gamma=args.decay_rate)

ForecastDataset performs the actual splitting of the data into samples.
The ForecastDataset class itself is described under forecast_dataloader.py below.
The loss function is then defined as MSELoss.

# split the data into samples and build the loaders
train_set = ForecastDataset(train_data, window_size=args.window_size, horizon=args.horizon,
                            normalize_method=args.norm_method, norm_statistic=normalize_statistic)
valid_set = ForecastDataset(valid_data, window_size=args.window_size, horizon=args.horizon,
                            normalize_method=args.norm_method, norm_statistic=normalize_statistic)
train_loader = torch_data.DataLoader(train_set, batch_size=args.batch_size, drop_last=False, shuffle=True,
                                     num_workers=0)
valid_loader = torch_data.DataLoader(valid_set, batch_size=args.batch_size, shuffle=False, num_workers=0)

forecast_loss = nn.MSELoss(reduction='mean').to(args.device)
total_params = 0
for name, parameter in model.named_parameters():
    if not parameter.requires_grad: continue
    param = parameter.numel()
    total_params += param
print(f"Total Trainable Params: {total_params}")

best_validate_mae = np.inf # keep track of the lowest (best) validation MAE
validate_score_non_decrease_count = 0 # consecutive validations without improvement; training stops early after too many, to prevent overfitting
performance_metrics = {}

Next comes the training loop: it runs for the configured number of epochs, training the model batch by batch.
If the validation error of the current epoch is lower than the best seen so far, the best model is updated (saved).

for epoch in range(args.epoch):
    epoch_start_time = time.time()
    model.train()
    loss_total = 0
    cnt = 0
    for i, (inputs, target) in enumerate(train_loader):
        # inputs.shape = [32,12,140]
        # target.shape = [32,3,140]
        # where 32 is the batch size
        inputs = inputs.to(args.device)
        target = target.to(args.device)
        model.zero_grad() # reset gradients so they do not accumulate across batches
        forecast, _ = model(inputs)
        loss = forecast_loss(forecast, target)
        cnt += 1
        loss.backward()
        my_optim.step()
        loss_total += float(loss)
    print('| end of epoch {:3d} | time: {:5.2f}s | train_total_loss {:5.4f}'.format(epoch, (
            time.time() - epoch_start_time), loss_total / cnt))
    save_model(model, result_file, epoch)
    if (epoch+1) % args.exponential_decay_step == 0:
        my_lr_scheduler.step()
    if (epoch + 1) % args.validate_freq == 0:
        is_best_for_now = False
        print('------ validate on data: VALIDATE ------')
        performance_metrics = \
            validate(model, valid_loader, args.device, args.norm_method, normalize_statistic,
                     node_cnt, args.window_size, args.horizon,
                     result_file=result_file)
        if best_validate_mae > performance_metrics['mae']:
            best_validate_mae = performance_metrics['mae']
            is_best_for_now = True
            validate_score_non_decrease_count = 0
        else:
            validate_score_non_decrease_count += 1
        # save model
        if is_best_for_now:
            save_model(model, result_file)
    # early stop
    if args.early_stop and validate_score_non_decrease_count >= args.early_stop_step:
        break
return performance_metrics, normalize_statistic

(2) Validation function

def validate(model, dataloader, device, normalize_method, statistic,
             node_cnt, window_size, horizon,
             result_file=None):

(3) Test function

def test(test_data, args, result_train_file, result_test_file):

(4) Inference function

def inference(model, dataloader, device, node_cnt, window_size, horizon):

(5) Save the model

def save_model(model, model_dir, epoch=None):

(6) Load the model

def load_model(model_dir, epoch=None):
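
These last two are thin wrappers around PyTorch's serialization. A minimal sketch of what such helpers typically look like (an assumed implementation with hypothetical file names, not necessarily the repo's exact code):

import os
import torch

def save_model_sketch(model, model_dir, epoch=None):
    # serialize the whole module; the file naming scheme here is hypothetical
    os.makedirs(model_dir, exist_ok=True)
    name = 'model_epoch_{}.pt'.format(epoch) if epoch is not None else 'best_model.pt'
    torch.save(model, os.path.join(model_dir, name))

def load_model_sketch(model_dir, epoch=None):
    name = 'model_epoch_{}.pt'.format(epoch) if epoch is not None else 'best_model.pt'
    return torch.load(os.path.join(model_dir, name))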

3. forecast_dataloader.py (dataset handling)

(1) The ForecastDataset class

No detailed walkthrough is needed: __getitem__ returns one (input window, target window) pair for the given index,
and get_x_end_idx returns the indices of all valid window end positions.

class ForecastDataset(torch_data.Dataset):
    def __init__(self, df, window_size, horizon, normalize_method=None, norm_statistic=None, interval=1):
        self.window_size = window_size
        self.interval = interval
        self.horizon = horizon
        self.normalize_method = normalize_method
        self.norm_statistic = norm_statistic
        df = pd.DataFrame(df)
        df = df.fillna(method='ffill', limit=len(df)).fillna(method='bfill', limit=len(df)).values
        self.data = df
        self.df_length = len(df)
        self.x_end_idx = self.get_x_end_idx()
        if normalize_method:
            self.data, _ = normalized(self.data, normalize_method, norm_statistic)

    def __getitem__(self, index):
        hi = self.x_end_idx[index]
        lo = hi - self.window_size
        train_data = self.data[lo: hi]
        target_data = self.data[hi:hi + self.horizon]
        x = torch.from_numpy(train_data).type(torch.float)
        y = torch.from_numpy(target_data).type(torch.float)
        return x, y
    def __len__(self):
        return len(self.x_end_idx)

    def get_x_end_idx(self):
        # each element `hi` in `x_index_set` is an upper bound for get training data
        # training data range: [lo, hi), lo = hi - window_size
        # start at window_size (the first prediction comes right after the first full window) and stop at
        # df_length - horizon + 1, because the first element of the last prediction sits at index
        # df_length - horizon, so the full forecast width still fits
        x_index_set = range(self.window_size, self.df_length - self.horizon + 1)
        # x_end_idx stores all of these end indices, sampled every `interval` steps
        x_end_idx = [x_index_set[j * self.interval] for j in range((len(x_index_set)) // self.interval)]
        return x_end_idx
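
For example, with df_length = 100, window_size = 12, horizon = 3 and interval = 1, the valid end indices run from 12 to 97, giving 86 samples; the last sample uses rows [85, 97) as input and rows [97, 100) as target. A quick check with these toy numbers (not the repo's code):

df_length, window_size, horizon, interval = 100, 12, 3, 1
x_index_set = range(window_size, df_length - horizon + 1)
x_end_idx = [x_index_set[j * interval] for j in range(len(x_index_set) // interval)]
print(len(x_end_idx), x_end_idx[0], x_end_idx[-1])   # 86 12 97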

4. base_model.py (model design)

(Figure: model architecture)
For ease of understanding, the architecture diagram is shown here again.

(1) class Model(nn.Module):

Initialization:

class Model(nn.Module):
    def __init__(self, units, stack_cnt, time_step, multi_layer, horizon=1, dropout_rate=0.5, leaky_rate=0.2,
                 device='cpu'):
        super(Model, self).__init__()
        self.unit = units  # output dimension, usually also 140, since a 140x140 correlation matrix is computed
        self.stack_cnt = stack_cnt  # number of StemGNN blocks (2)
        self.alpha = leaky_rate
        self.time_step = time_step  # look-back window size
        self.horizon = horizon  # forecast window size
        self.weight_key = nn.Parameter(torch.zeros(size=(self.unit, 1)))  # W_k of the self-attention mechanism
        nn.init.xavier_uniform_(self.weight_key.data, gain=1.414)
        self.weight_query = nn.Parameter(torch.zeros(size=(self.unit, 1)))  # W_q of the self-attention mechanism
        nn.init.xavier_uniform_(self.weight_query.data, gain=1.414)
        # use PyTorch's built-in GRU
        self.GRU = nn.GRU(self.time_step, self.unit)
        self.multi_layer = multi_layer
        self.stock_block = nn.ModuleList()
        # initialize the two StemGNN blocks
        self.stock_block.extend(
            [StockBlockLayer(self.time_step, self.unit, self.multi_layer, stack_cnt=i) for i in range(self.stack_cnt)])
        self.fc = nn.Sequential(
            nn.Linear(int(self.time_step), int(self.time_step)),
            nn.LeakyReLU(),
            nn.Linear(int(self.time_step), self.horizon),
        )
        self.leakyrelu = nn.LeakyReLU(self.alpha)
        self.dropout = nn.Dropout(p=dropout_rate)
        self.to(device)
    def get_laplacian(self, graph, normalize):
        """
        return the laplacian of the graph.
        :param graph: the graph structure without self loop, [N, N].
        :param normalize: whether to used the normalized laplacian.
        :return: graph laplacian.
        """
        if normalize:
            D = torch.diag(torch.sum(graph, dim=-1) ** (-1 / 2))
            L = torch.eye(graph.size(0), device=graph.device, dtype=graph.dtype) - torch.mm(torch.mm(D, graph), D)
        else:
            D = torch.diag(torch.sum(graph, dim=-1))
            L = D - graph
        return L

Return the Chebyshev polynomials of the graph Laplacian:

    def cheb_polynomial(self, laplacian):
        """
        Compute the Chebyshev Polynomial, according to the graph laplacian.
        :param laplacian: the graph laplacian, [N, N].
        :return: the multi order Chebyshev laplacian, [K, N, N].
        """
        N = laplacian.size(0)  # [N, N]
        laplacian = laplacian.unsqueeze(0)
        first_laplacian = torch.zeros([1, N, N], device=laplacian.device, dtype=torch.float)
        second_laplacian = laplacian
        third_laplacian = (2 * torch.matmul(laplacian, second_laplacian)) - first_laplacian
        forth_laplacian = 2 * torch.matmul(laplacian, third_laplacian) - second_laplacian
        multi_order_laplacian = torch.cat([first_laplacian, second_laplacian, third_laplacian, forth_laplacian], dim=0)
        return multi_order_laplacian
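
These four terms follow the Chebyshev recurrence T_k(L) = 2L * T_{k-1}(L) - T_{k-2}(L). Note that first_laplacian is initialized to zeros here rather than to the identity that the textbook recurrence (T_0 = I) would use, so the stack actually returned is [0, L, 2L^2, 4L^3 - L].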

The following part reshapes the data, uses a GRU to learn the temporal dynamics, and then obtains the attention matrix through the self-attention mechanism (self_graph_attention).

    def latent_correlation_layer(self, x):
        # the original [32, 12, 140] input is permuted into a [140, 32, 12] tensor,
        # because the GRU learns over the time series, so the last dimension must be the time steps
        input, _ = self.GRU(x.permute(2, 0, 1).contiguous())
        input = input.permute(1, 0, 2).contiguous()
        # next, call the model's own self_graph_attention to obtain the attention matrix
        attention = self.self_graph_attention(input)
        # attention.shape:[32,140,140]
        attention = torch.mean(attention, dim=0)
        # attention.shape:[140,140]
        degree = torch.sum(attention, dim=1)
        # degree.shape:[140]
        # laplacian is sym or not
        # make the attention matrix symmetric
        attention = 0.5 * (attention + attention.T)
        degree_l = torch.diag(degree)
        # D^(-1/2): a diagonal matrix holding the inverse square roots of the node degrees
        diagonal_degree_hat = torch.diag(1 / (torch.sqrt(degree) + 1e-7))
        # (the 1e-7 avoids division by zero)
        laplacian = torch.matmul(diagonal_degree_hat,
                                 torch.matmul(degree_l - attention, diagonal_degree_hat))
        # this yields the normalized graph Laplacian
        mul_L = self.cheb_polynomial(laplacian)
        # expand the Laplacian into its Chebyshev polynomial terms up to 4th order
        # mul_L has shape [4, 140, 140]
        # returned to the forward function
        return mul_L, attention
    def self_graph_attention(self, input):
        input = input.permute(0, 2, 1).contiguous()
        bat, N, fea = input.size()
        key = torch.matmul(input, self.weight_key)
        # self.weight_key.shape:[140,1]
        # key.shape:[32,140,1]
        query = torch.matmul(input, self.weight_query)
        # repeat duplicates the tensor along each dimension by the given factor; here the last dimension is repeated N times
        # so key becomes a [32,140,140] tensor
        data = key.repeat(1, 1, N).view(bat, N * N, 1) + query.repeat(1, N, 1)
        # data.shape:[32,140*140,1]
        data = data.squeeze(2)
        data = data.view(bat, N, -1)
        data = self.leakyrelu(data)
        # data.shape:[32,140,140]
        attention = F.softmax(data, dim=2)
        attention = self.dropout(attention)
        # attention.shape:[32,140,140]
        return attention

    def graph_fft(self, input, eigenvectors):
        return torch.matmul(eigenvectors, input)

    def forward(self, x):
        mul_L, attention = self.latent_correlation_layer(x)
        X = x.unsqueeze(1).permute(0, 1, 3, 2).contiguous()
        result = []
        # feed X and the 4th-order Chebyshev Laplacian stack into the two StockBlockLayers
        for stack_i in range(self.stack_cnt):
            forecast, X = self.stock_block[stack_i](X, mul_L)
            result.append(forecast)
        forecast = result[0] + result[1]
        forecast = self.fc(forecast)
        if forecast.size()[-1] == 1:
            return forecast.unsqueeze(1).squeeze(-1), attention
        else:
            return forecast.permute(0, 2, 1).contiguous(), attention

(2) class StockBlockLayer(nn.Module):

   def __init__(self, time_step, unit, multi_layer, stack_cnt=0):
        super(StockBlockLayer, self).__init__()
        self.time_step = time_step
        self.unit = unit
        self.stack_cnt = stack_cnt
        self.multi = multi_layer
        self.weight = nn.Parameter(
            torch.Tensor(1, 3 + 1, 1, self.time_step * self.multi,
                         self.multi * self.time_step))  # [K+1, 1, in_c, out_c]
        nn.init.xavier_normal_(self.weight)
        self.forecast = nn.Linear(self.time_step * self.multi, self.time_step * self.multi)
        self.forecast_result = nn.Linear(self.time_step * self.multi, self.time_step)
        if self.stack_cnt == 0:
            self.backcast = nn.Linear(self.time_step * self.multi, self.time_step)
        self.backcast_short_cut = nn.Linear(self.time_step, self.time_step)
        self.relu = nn.ReLU()
        self.GLUs = nn.ModuleList()
        self.output_channel = 4 * self.multi
        for i in range(3):
            if i == 0:
                self.GLUs.append(GLU(self.time_step * 4, self.time_step * self.output_channel))
                self.GLUs.append(GLU(self.time_step * 4, self.time_step * self.output_channel))
            elif i == 1:
                self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
                self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
            else:
                self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
                self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))

    def spe_seq_cell(self, input):
        batch_size, k, input_channel, node_cnt, time_step = input.size()
        input = input.view(batch_size, -1, node_cnt, time_step)
        ffted = torch.rfft(input, 1, onesided=False)
        real = ffted[..., 0].permute(0, 2, 1, 3).contiguous().reshape(batch_size, node_cnt, -1)
        img = ffted[..., 1].permute(0, 2, 1, 3).contiguous().reshape(batch_size, node_cnt, -1)
        for i in range(3):
            real = self.GLUs[i * 2](real)
            img = self.GLUs[2 * i + 1](img)
        real = real.reshape(batch_size, node_cnt, 4, -1).permute(0, 2, 1, 3).contiguous()
        img = img.reshape(batch_size, node_cnt, 4, -1).permute(0, 2, 1, 3).contiguous()
        time_step_as_inner = torch.cat([real.unsqueeze(-1), img.unsqueeze(-1)], dim=-1)
        iffted = torch.irfft(time_step_as_inner, 1, onesided=False)
        return iffted
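
Note that torch.rfft and torch.irfft with the onesided argument belong to the pre-1.8 PyTorch FFT API. On current PyTorch versions, roughly equivalent real/imaginary splitting can be sketched as follows (a sketch, not the repo's code):

import torch

def fft_split(x):
    # counterpart of torch.rfft(x, 1, onesided=False): full complex FFT over the last
    # dimension, returned as separate real and imaginary tensors
    spec = torch.fft.fft(x, dim=-1)
    return spec.real, spec.imag

def ifft_merge(real, imag):
    # counterpart of torch.irfft(..., 1, onesided=False): inverse FFT, keeping the real part
    return torch.fft.ifft(torch.complex(real, imag), dim=-1).real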

In forward, the input x is first multiplied with mul_L:
x.shape: [32,1,140,12]
mul_L.shape: [4,140,140]
gfted.shape: [32,4,1,140,12]

    def forward(self, x, mul_L):
        mul_L = mul_L.unsqueeze(1)
        # mul_L.shape:[4,1,140,140]
        x = x.unsqueeze(1)
        # x.shape:[32,1,1,140,12]
        gfted = torch.matmul(mul_L, x)
        gconv_input = self.spe_seq_cell(gfted).unsqueeze(2)
        igfted = torch.matmul(gconv_input, self.weight)
        igfted = torch.sum(igfted, dim=1)
        forecast_source = torch.sigmoid(self.forecast(igfted).squeeze(1))
        forecast = self.forecast_result(forecast_source)
        if self.stack_cnt == 0:
            backcast_short = self.backcast_short_cut(x).squeeze(1)
            backcast_source = torch.sigmoid(self.backcast(igfted) - backcast_short)
        else:
            backcast_source = None
        return forecast, backcast_source

(3) class GLU(nn.Module):

The gated linear unit.
Purpose: 1. deeper modeling of the sequence; 2. mitigating vanishing gradients and speeding up convergence.

class GLU(nn.Module):
    def __init__(self, input_channel, output_channel):
        super(GLU, self).__init__()
        self.linear_left = nn.Linear(input_channel, output_channel)
        self.linear_right = nn.Linear(input_channel, output_channel)

    def forward(self, x):
        return torch.mul(self.linear_left(x), torch.sigmoid(self.linear_right(x)))
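
A quick shape check with toy sizes (assuming the GLU class above, with time_step = 12 and multi_layer = 5, so the first GLU maps 12*4 = 48 features to 12*20 = 240):

import torch

glu = GLU(input_channel=48, output_channel=240)
x = torch.randn(32, 140, 48)         # [batch, nodes, features]
print(glu(x).shape)                  # torch.Size([32, 140, 240])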