A Deep Learning Model for Flux Prediction

Latest recommended article published 2023-10-11 21:11:14

mayubins


Categories: Deep Learning, Machine Learning. Tags: python, programming languages, artificial intelligence, deep learning

Article link: https://blog.csdn.net/mayubins/article/details/125203395


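0. Preparation

Oh, by the way:

The joint observation data turned out to have many missing values, so I decided to use the COARE data instead.

First, load some packages.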


# Data handling and file utilities
import csv
import os

import numpy as np
import pandas as pd
from scipy.io import loadmat

# PyTorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Plotting
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

Then read the data:

data = np.loadtxt('coare.txt')

The data look roughly like this:

'''
-------------------------------------------------------------------------------
 0 Date: YYMMDDHHmmss, YY=year, MM=month, DD=day, HH=hour, mm=minute, ss=sec
 1 Us:   ship speed (as described above)
 2 U:    true wind speed at 15-m height
 3 Tru:  true wind direction rel. to N (meteorological convention)
 4 Rel:  relative wind direction
 5 Hed:  the direction the ship's bow is pointing
 6 Ts:   sea surface temp (no cool skin correction)
 7 T:    Vaisala air temperature (about 15 m)
 8 qs:   sea surface specific humidity (g/kg) (no cool skin correction)
 9 q:    Vaisala air specific humidity (about 15 m)
-------------------------------------------------------------------------------
10 Hsc:  covariance sensible heat flux
11 Hsi:  inertial sensible heat flux
12 Hsb:  bulk sensible heat flux
13 Hlc:  covariance latent heat flux
14 Hli:  inertial latent heat flux
15 Hlb:  bulk latent heat flux
16 Tuc:  covariance surface stress (-wu part only)
17 Tui:  inertial-dissipation surface stress
18 Tub:  bulk surface stress
-------------------------------------------------------------------------------
19 Rs:   solar irradiance
20 Rl:   longwave irradiance
21 Rain: precipitation (mm/hr)
22 J:    ship plume/contamination index (0 implies good conditions)
23 Oph:  standard deviation of OPHIR hygrometer clear channel counts (<15 implies
         reasonably clean optics).
24 Tlt:  mean wind vector tilt, degrees (<10 ok covariances)
25 Jm:   ship maneuver/contamination index, m/s (<2 implies good conditions)
-------------------------------------------------------------------------------
26 Ct:   sonic temperature structure function parameter (K^2/m^.667)
27 Cq:   water vapor structure function parameter ((g/m^3)/m^.667)
28 Cu:   streamwise velocity structure function parameter ((m/s)^2/m^.667)
29 Cw:   vertical velocity structure function parameter ((m/s)^2/m^.667)
30 Hr:   sensible heat flux due to precipitation at droplet wet-bulb T
31 To:   OPHIR air temperature
32 Qo:   OPHIR specific humidity
-------------------------------------------------------------------------------
33 Lat:  Latitude
34 Lon:  Longitude
-------------------------------------------------------------------------------
These are the variables in each of the 35 columns (the leading number is the column index).
'''

Since I will need to refer to this list later, I keep it in the code as a string block.
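The variable list quotes thresholds for some quality indices (J = 0, Tlt < 10, Jm < 2). The post does not say whether the rows were filtered on them, so the following is only a minimal sketch of how such a mask could be built. The function name `quality_mask` and the synthetic `demo` array are assumptions for illustration, not part of the original workflow or the real data.

```python
import numpy as np

# Column indices taken from the variable list above
COL_J, COL_TLT, COL_JM = 22, 24, 25

def quality_mask(data, max_tilt=10.0, max_maneuver=2.0):
    """Boolean mask of rows that pass the thresholds quoted in the variable list."""
    return (
        (data[:, COL_J] == 0)
        & (data[:, COL_TLT] < max_tilt)
        & (data[:, COL_JM] < max_maneuver)
    )

# Synthetic stand-in for the 35-column array (hypothetical values, not COARE data)
rng = np.random.default_rng(0)
demo = rng.normal(size=(6, 35))
demo[:, COL_J] = [0, 0, 1, 0, 0, 0]
demo[:, COL_TLT] = [3, 12, 3, 3, 3, 3]
demo[:, COL_JM] = [0.5, 0.5, 0.5, 2.5, 0.5, 0.5]

mask = quality_mask(demo)
clean = demo[mask]
print(mask.sum(), "of", len(demo), "rows kept")
```

With the real file, the same mask could be applied to `data` before splitting, if filtering turns out to be appropriate for the covariance fluxes.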

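Next comes choosing which variables to use for the regression.

The data also need to be organized so they can be used for training later.

We have to prepare a training set and a test set.

The validation set will be split off from the training set later.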

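# Column indices from the variable list above: inputs only, then inputs plus target
feature = [2,6,8,9,19,21,33]
feature_for_train = [2,6,8,9,19,21,33,13]   # last entry, 13, is Hlc (the regression target)
feature_count = np.size(feature)
# Alternative, larger feature set (disabled)
#feature = list(range(1,10))
#feature.extend([19,20,21,22,23,24,25,33,34])
#feature_for_train = list(range(1,10))
#feature_for_train.extend([19,20,21,22,23,24,25,33,34,13])

# Rows 0-3999 are used for training, rows 4000-4805 for testing
data_train1 = data[0:4000,:]
data_test1 = data[4000:4806,:]
hlc = data[:,13]   # covariance latent heat flux
hlb = data[:,15]   # bulk latent heat flux

data_train = data_train1[:,feature_for_train]
data_test = data_test1[:,feature]
data_train = pd.DataFrame(data_train)
data_test = pd.DataFrame(data_test)
data_train.to_csv('train.csv',index=False)
data_test.to_csv('test.csv',index=False)
# Save as CSV after all. Start by regressing the latent heat flux.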

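I start with latent heat because it is the one I most enjoy studying; the other two have already been handled very well by others, so there is not much left to do there.

I will still model those later, though.

Latent heat is simply the one I like more.

Next is getting the deep learning code working. I have half-finished code from a course I took earlier, so I don't need to type everything from scratch, which is more convenient.

Below I show a few key parts.

I plan to cover:

1. Feature selection

2. Network architecture

3. Optimization method

4. Presentation of results

These are also the most crucial points in deep learning, the core of it.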

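1. Feature selection

First, choose the input variables.

We have to focus on what matters: the biggest mistake is to compute unimportant variables in fine detail while throwing away the important parameters. That is roughly what Landau said.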
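One quick, rough way to check which candidate columns carry signal is to rank them by their linear correlation with the target. This is only a sketch under assumptions: the helper `rank_by_correlation` and the synthetic `demo` array are made up for illustration, and Pearson correlation misses nonlinear relationships, so it can only complement physical reasoning about the fluxes.

```python
import numpy as np

def rank_by_correlation(data, candidate_cols, target_col):
    """Return (column, Pearson r) pairs sorted by |r| with the target, largest first."""
    target = data[:, target_col]
    scores = []
    for col in candidate_cols:
        r = np.corrcoef(data[:, col], target)[0, 1]
        scores.append((col, float(r)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Synthetic stand-in for the 35-column array (hypothetical values, not COARE data):
# the target in column 13 is built mostly from column 2 and partly from column 9.
rng = np.random.default_rng(0)
demo = rng.normal(size=(500, 35))
demo[:, 13] = 3.0 * demo[:, 2] - 1.0 * demo[:, 9] + 0.1 * rng.normal(size=500)

ranking = rank_by_correlation(demo, [2, 6, 8, 9, 19, 21, 33], 13)
for col, r in ranking:
    print(f"column {col:2d}: r = {r:+.3f}")
```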

feature = [2,6,8,9,19,21,33]
feature_for_train = [2,6,8,9,19,21,33,13]

feature_count = np.size(feature)

class myDataset(Dataset):
    ''' Dataset for loading and preprocessing the COARE CSV files '''
    def __init__(self,
                 path,
                 mode='train',
                 target_only=False):
        self.mode = mode

        # Read the CSV (skipping the header row) into a float numpy array
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:]).astype(float)

        # The first feature_count columns are the inputs. target_only is kept
        # in the signature, but both settings currently select the same columns.
        feats = list(range(feature_count))

        if mode == 'test':
            # Testing data (no target column)
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # Training data (train/dev sets); the target is the last column
            target = data[:, -1]
            data = data[:, feats]

            # Splitting training data into train & dev sets:
            # every 10th row goes to dev, the rest to train
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]

            # Convert data into PyTorch tensors
            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Normalize features (you may remove this part to see what will happen).
        # Note: each split is normalized with its own mean and std.
        self.data = (self.data - self.data.mean(dim=0, keepdim=True)) \
            / self.data.std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COARE Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # Returns one sample at a time
        if self.mode in ['train', 'dev']:
            # For training
            return self.data[index], self.target[index]
        else:
            # For testing (no target)
            return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)

A class handles all of this.

For feature selection, what really matters is the two lines at the top; the rest hardly needs to be touched.
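One detail worth noting: the class normalizes each split (train, dev, test) with that split's own mean and standard deviation, so the same physical value can map to different inputs in different splits. A common alternative, sketched here as an assumption rather than what the post does, is to compute the statistics on the training features only and reuse them everywhere. The helper names `fit_normalizer` and `apply_normalizer` and the random tensors are hypothetical.

```python
import torch

def fit_normalizer(train_x, eps=1e-8):
    """Compute per-feature mean and std from the training features only."""
    mean = train_x.mean(dim=0, keepdim=True)
    std = train_x.std(dim=0, keepdim=True)
    # Guard against division by zero for constant columns
    return mean, torch.clamp(std, min=eps)

def apply_normalizer(x, mean, std):
    """Standardize any split with statistics fitted on the training split."""
    return (x - mean) / std

# Synthetic stand-ins for the 7 input features (hypothetical values)
torch.manual_seed(0)
train_x = torch.randn(900, 7) * 5.0 + 2.0
dev_x = torch.randn(100, 7) * 5.0 + 2.0

mean, std = fit_normalizer(train_x)
train_n = apply_normalizer(train_x, mean, std)
dev_n = apply_normalizer(dev_x, mean, std)
print(train_n.mean(dim=0), dev_n.mean(dim=0))
```

In the dataset class this would mean passing `mean` and `std` into the dev and test instances instead of recomputing them there.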

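2. Network architecture

The network architecture is the core of the model. All later hyperparameter tuning only modifies details within this framework, so the architecture matters a great deal.

Which architecture a particular problem needs has to be decided case by case, though, and I cannot explain that clearly.

For this problem,

this is what I have so far, but I am still tuning it.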

class NeuralNet(nn.Module):
    ''' A simple fully-connected deep neural network '''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()

        # Define your neural network here
        # TODO: How to modify this model to achieve better performance?
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # Mean squared error loss
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        ''' Given input of size (batch_size x input_dim), compute output of the network '''
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        ''' Calculate loss '''
        # TODO: you may implement L2 regularization here
        return self.criterion(pred, target)

For the details I still need to learn more and read the literature.
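Since the post names model bias as the main remaining problem, one direction to try is a network with more capacity. The following is a sketch of a hypothetical variant, not the author's final model: the class name `DeeperNet` and the layer widths `(64, 64, 32)` are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

class DeeperNet(nn.Module):
    """Hypothetical variant of NeuralNet with a configurable stack of hidden layers."""
    def __init__(self, input_dim, hidden_dims=(64, 64, 32)):
        super().__init__()
        layers = []
        prev = input_dim
        for width in hidden_dims:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, 1))   # single regression output
        self.net = nn.Sequential(*layers)
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        # (batch_size x input_dim) -> (batch_size,)
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        return self.criterion(pred, target)

# Shape check with random inputs for the 7 selected features
model = DeeperNet(input_dim=7)
x = torch.randn(16, 7)
pred = model(x)
n_params = sum(p.numel() for p in model.parameters())
print(pred.shape, n_params)
```

Because it keeps the same `forward` and `cal_loss` interface, such a class could be swapped in for `NeuralNet` without changing the rest of the code.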

3. Optimization method

There are many optimization methods; the ones that come to mind right away are SGD and Adam.

I tried both.

config = {
    'n_epochs': 3000,                # maximum number of epochs
    'batch_size': 500,               # mini-batch size for dataloader
    'optimizer': 'Adam',             # optimization algorithm (optimizer in torch.optim)
    'optim_hparas': {                # hyper-parameters for the optimizer (depends on which optimizer you are using)
#        'lr': 0.001,                 # learning rate of SGD
#        'momentum': 0.09              # momentum for SGD
    },                               # left empty, so Adam runs with its default settings
    'early_stop': 10000,             # early stopping epochs (the number epochs since your model's last improvement);
                                     # larger than n_epochs, so early stopping is effectively disabled
}

My bigger problem right now is model bias,

so I think this optimizer setup is good enough for now.
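The post does not show the training loop, so here is a minimal sketch of how a config of this shape is commonly consumed: the optimizer class is looked up by name in `torch.optim`, and training stops once the dev loss has not improved for more than `early_stop` epochs. The data are synthetic, the small values in `demo_config` are chosen only so the example finishes quickly, and all variable names are assumptions.

```python
import torch
import torch.nn as nn

# Small hypothetical config with the same keys as above
demo_config = {
    'n_epochs': 200,
    'batch_size': 64,
    'optimizer': 'Adam',
    'optim_hparas': {},      # empty dict -> optimizer defaults
    'early_stop': 20,
}

# Synthetic regression problem with 7 inputs (not COARE data)
torch.manual_seed(0)
x = torch.randn(400, 7)
y = x @ torch.randn(7) + 0.1 * torch.randn(400)
train_x, train_y = x[:320], y[:320]
dev_x, dev_y = x[320:], y[320:]

model = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 1))
criterion = nn.MSELoss()
# Look up the optimizer class by name, e.g. torch.optim.Adam or torch.optim.SGD
optimizer = getattr(torch.optim, demo_config['optimizer'])(
    model.parameters(), **demo_config['optim_hparas'])

with torch.no_grad():
    initial_dev = criterion(model(dev_x).squeeze(1), dev_y).item()

best_dev = float('inf')
epochs_since_best = 0
for epoch in range(demo_config['n_epochs']):
    model.train()
    perm = torch.randperm(len(train_x))
    for start in range(0, len(train_x), demo_config['batch_size']):
        idx = perm[start:start + demo_config['batch_size']]
        optimizer.zero_grad()
        loss = criterion(model(train_x[idx]).squeeze(1), train_y[idx])
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        dev_loss = criterion(model(dev_x).squeeze(1), dev_y).item()

    if dev_loss < best_dev:
        best_dev = dev_loss
        epochs_since_best = 0
    else:
        epochs_since_best += 1
    if epochs_since_best > demo_config['early_stop']:
        break   # no improvement for too long

print(f"initial dev MSE {initial_dev:.3f}, best dev MSE {best_dev:.3f}")
```

Switching to SGD in this sketch would only require setting `'optimizer': 'SGD'` and supplying `'lr'` (and optionally `'momentum'`) in `'optim_hparas'`.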