笔记：ML-LHY-HW1_regression

最新推荐文章于 2022-10-18 18:47:04 发布

snoopy_21

最新推荐文章于 2022-10-18 18:47:04 发布

阅读量821

点赞数

分类专栏：机器学习笔记李宏毅ML课程笔记

本文链接：https://blog.csdn.net/qq_29598161/article/details/107351058

版权

笔记同时被 3 个专栏收录

68 篇文章 7 订阅

订阅专栏

机器学习

32 篇文章 0 订阅

订阅专栏

李宏毅ML课程笔记

31 篇文章 4 订阅

订阅专栏

因为测试集没有第10小时，又不打算在Kaggle上验证，所以修改成根据8小时预测第9个小时的PM2.5m，部分注释还是通过9预测第10 小时

本次目標：由前 9 個小時的 18 個 features (包含 PM2.5)預測的 10 個小時的 PM2.5。

Load 'train.csv’

train.csv 的資料為 12 個月中，每個月取 20 天，每天 24 小時的資料(每小時資料有 18 個 features)。

cd /data/jupyter/root/MachineLearning/Lee/wk1/

/data/jupyter/root/MachineLearning/Lee/wk1

import sys
import pandas as pd
import numpy as np
# from google.colab import drive 
# !gdown --id '1wNKAxQ29G15kgpBy_asjTcZRRgmsCZRm' --output data.zip
# !unzip data.zip
data = pd.read_csv('./data/train.csv', header = 0, encoding = 'big5')
# data = pd.read_csv('./train.csv', encoding = 'big5')

print(len(data))
data[:22]

	日期	測站	測項	0	1	2	3	4	5	6	...	14	15	16	17	18	19	20	21	22	23
0	2014/1/1	豐原	AMB_TEMP	14	14	14	13	12	12	12	...	22	22	21	19	17	16	15	15	15	15
1	2014/1/1	豐原	CH4	1.8	1.8	1.8	1.8	1.8	1.8	1.8	...	1.8	1.8	1.8	1.8	1.8	1.8	1.8	1.8	1.8	1.8
2	2014/1/1	豐原	CO	0.51	0.41	0.39	0.37	0.35	0.3	0.37	...	0.37	0.37	0.47	0.69	0.56	0.45	0.38	0.35	0.36	0.32
3	2014/1/1	豐原	NMHC	0.2	0.15	0.13	0.12	0.11	0.06	0.1	...	0.1	0.13	0.14	0.23	0.18	0.12	0.1	0.09	0.1	0.08
4	2014/1/1	豐原	NO	0.9	0.6	0.5	1.7	1.8	1.5	1.9	...	2.5	2.2	2.5	2.3	2.1	1.9	1.5	1.6	1.8	1.5
5	2014/1/1	豐原	NO2	16	9.2	8.2	6.9	6.8	3.8	6.9	...	11	11	22	28	19	12	8.1	7	6.9	6
6	2014/1/1	豐原	NOx	17	9.8	8.7	8.6	8.5	5.3	8.8	...	14	13	25	30	21	13	9.7	8.6	8.7	7.5
7	2014/1/1	豐原	O3	16	30	27	23	24	28	24	...	65	64	51	34	33	34	37	38	38	36
8	2014/1/1	豐原	PM10	56	50	48	35	25	12	4	...	52	51	66	85	85	63	46	36	42	42
9	2014/1/1	豐原	PM2.5	26	39	36	35	31	28	25	...	36	45	42	49	45	44	41	30	24	13
10	2014/1/1	豐原	RAINFALL	NR	NR	NR	NR	NR	NR	NR	...	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR
11	2014/1/1	豐原	RH	77	68	67	74	72	73	74	...	47	49	56	67	72	69	70	70	70	69
12	2014/1/1	豐原	SO2	1.8	2	1.7	1.6	1.9	1.4	1.5	...	3.9	4.4	9.9	5.1	3.4	2.3	2	1.9	1.9	1.9
13	2014/1/1	豐原	THC	2	2	2	1.9	1.9	1.8	1.9	...	1.9	1.9	1.9	2.1	2	1.9	1.9	1.9	1.9	1.9
14	2014/1/1	豐原	WD_HR	37	80	57	76	110	106	101	...	307	304	307	124	118	121	113	112	106	110
15	2014/1/1	豐原	WIND_DIREC	35	79	2.4	55	94	116	106	...	313	305	291	124	119	118	114	108	102	111
16	2014/1/1	豐原	WIND_SPEED	1.4	1.8	1	0.6	1.7	2.5	2.5	...	2.5	2.2	1.4	2.2	2.8	3	2.6	2.7	2.1	2.1
17	2014/1/1	豐原	WS_HR	0.5	0.9	0.6	0.3	0.6	1.9	2	...	2.1	2.1	1.9	1	2.5	2.5	2.8	2.6	2.4	2.3
18	2014/1/2	豐原	AMB_TEMP	16	15	15	14	14	15	16	...	24	24	23	21	20	19	18	18	18	18
19	2014/1/2	豐原	CH4	1.8	1.8	1.8	1.8	1.8	1.8	1.8	...	1.8	1.8	1.8	1.8	1.8	1.8	1.8	1.8	1.8	1.8
20	2014/1/2	豐原	CO	0.26	0.25	0.28	0.27	0.24	0.26	0.34	...	0.34	0.35	0.38	0.61	0.44	0.4	0.4	0.55	0.41	0.33
21	2014/1/2	豐原	NMHC	0.06	0.05	0.06	0.05	0.05	0.07	0.09	...	0.12	0.16	0.23	0.32	0.18	0.15	0.23	0.29	0.17	0.12

22 rows × 27 columns

Preprocessing

取需要的數值部分，將 ‘RAINFALL’ 欄位全部補 0。
另外，如果要在 colab 重覆這段程式碼的執行，請從頭開始執行(把上面的都重新跑一次)，以避免跑出不是自己要的結果（若自己寫程式不會遇到，但 colab 重複跑這段會一直往下取資料。意即第一次取原本資料的第三欄之後的資料，第二次取第一次取的資料掉三欄之後的資料，…）。

data = data.iloc[:, 3:]
data[data == 'NR'] = 0
raw_data = data.to_numpy()
raw_data[0 : 18, :]# 第一天的18个features

array([['14', '14', '14', '13', '12', '12', '12', '12', '15', '17', '20',
        '22', '22', '22', '22', '22', '21', '19', '17', '16', '15', '15',
        '15', '15'],
       ['1.8', '1.8', '1.8', '1.8', '1.8', '1.8', '1.8', '1.8', '1.8',
        '1.8', '1.8', '1.8', '1.8', '1.8', '1.8', '1.8', '1.8', '1.8',
        '1.8', '1.8', '1.8', '1.8', '1.8', '1.8'],
       ['0.51', '0.41', '0.39', '0.37', '0.35', '0.3', '0.37', '0.47',
        '0.78', '0.74', '0.59', '0.52', '0.41', '0.4', '0.37', '0.37',
        '0.47', '0.69', '0.56', '0.45', '0.38', '0.35', '0.36', '0.32'],
       ['0.2', '0.15', '0.13', '0.12', '0.11', '0.06', '0.1', '0.13',
        '0.26', '0.23', '0.2', '0.18', '0.12', '0.11', '0.1', '0.13',
        '0.14', '0.23', '0.18', '0.12', '0.1', '0.09', '0.1', '0.08'],
       ['0.9', '0.6', '0.5', '1.7', '1.8', '1.5', '1.9', '2.2', '6.6',
        '7.9', '4.2', '2.9', '3.4', '3', '2.5', '2.2', '2.5', '2.3',
        '2.1', '1.9', '1.5', '1.6', '1.8', '1.5'],
       ['16', '9.2', '8.2', '6.9', '6.8', '3.8', '6.9', '7.8', '15',
        '21', '14', '11', '14', '12', '11', '11', '22', '28', '19', '12',
        '8.1', '7', '6.9', '6'],
       ['17', '9.8', '8.7', '8.6', '8.5', '5.3', '8.8', '9.9', '22',
        '29', '18', '14', '17', '15', '14', '13', '25', '30', '21', '13',
        '9.7', '8.6', '8.7', '7.5'],
       ['16', '30', '27', '23', '24', '28', '24', '22', '21', '29', '44',
        '58', '50', '57', '65', '64', '51', '34', '33', '34', '37', '38',
        '38', '36'],
       ['56', '50', '48', '35', '25', '12', '4', '2', '11', '38', '56',
        '64', '56', '57', '52', '51', '66', '85', '85', '63', '46', '36',
        '42', '42'],
       ['26', '39', '36', '35', '31', '28', '25', '20', '19', '30', '41',
        '44', '33', '37', '36', '45', '42', '49', '45', '44', '41', '30',
        '24', '13'],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0],
       ['77', '68', '67', '74', '72', '73', '74', '73', '66', '56', '45',
        '37', '40', '42', '47', '49', '56', '67', '72', '69', '70', '70',
        '70', '69'],
       ['1.8', '2', '1.7', '1.6', '1.9', '1.4', '1.5', '1.6', '5.1',
        '15', '4.5', '2.7', '3.5', '3.6', '3.9', '4.4', '9.9', '5.1',
        '3.4', '2.3', '2', '1.9', '1.9', '1.9'],
       ['2', '2', '2', '1.9', '1.9', '1.8', '1.9', '1.9', '2.1', '2',
        '2', '2', '1.9', '1.9', '1.9', '1.9', '1.9', '2.1', '2', '1.9',
        '1.9', '1.9', '1.9', '1.9'],
       ['37', '80', '57', '76', '110', '106', '101', '104', '124', '46',
        '241', '280', '297', '305', '307', '304', '307', '124', '118',
        '121', '113', '112', '106', '110'],
       ['35', '79', '2.4', '55', '94', '116', '106', '94', '232', '153',
        '283', '269', '290', '316', '313', '305', '291', '124', '119',
        '118', '114', '108', '102', '111'],
       ['1.4', '1.8', '1', '0.6', '1.7', '2.5', '2.5', '2', '0.6', '0.8',
        '1.6', '1.9', '2.1', '3.3', '2.5', '2.2', '1.4', '2.2', '2.8',
        '3', '2.6', '2.7', '2.1', '2.1'],
       ['0.5', '0.9', '0.6', '0.3', '0.6', '1.9', '2', '2', '0.5', '0.3',
        '0.8', '1.2', '2', '2.6', '2.1', '2.1', '1.9', '1', '2.5', '2.5',
        '2.8', '2.6', '2.4', '2.3']], dtype=object)

Extract Features (1)

在这里插入图片描述

將原始 4320 * 18 的資料依照每個月分重組成 12 個 18 (features) * 480 (hours) 的資料。 480 = 24 * 20(每个月取20天)

raw_data[0 : 18, :][0]# 这种记录会有480条，组成一个月的sample，但是sample[0]，是第一个features的汇总

array(['14', '14', '14', '13', '12', '12', '12', '12', '15', '17', '20',
       '22', '22', '22', '22', '22', '21', '19', '17', '16', '15', '15',
       '15', '15'], dtype=object)

month_data = {}
for month in range(12):
    sample = np.empty([18, 480])
    for day in range(20):
        sample[:, day * 24 : (day + 1) * 24] = raw_data[18 * (20 * month + day) : 18 * (20 * month + day + 1), :]
    month_data[month] = sample
month_data[0][0][:24]# 第一月WS_HR汇总（顺序反了WS_HR排到最前面）

array([14., 14., 14., 13., 12., 12., 12., 12., 15., 17., 20., 22., 22.,
       22., 22., 22., 21., 19., 17., 16., 15., 15., 15., 15.])

Extract Features (2)

在这里插入图片描述

每個月會有 480hrs，每 9 小時形成一個 data，每個月會有 471 個 data，故總資料數為 471 * 12 筆，而每筆 data 有 9 * 18 的 features (一小時 18 個 features * 9 小時)。

對應的 target 則有 471 * 12 個(第 10 個小時的 PM2.5)

np.set_printoptions(suppress = True)
x = np.empty([12 * 472, 18 * 8], dtype = float)
y = np.empty([12 * 472, 1], dtype = float)
for month in range(12):
    for day in range(20):
        for hour in range(24):
            if day == 19 and hour > 14:
                continue
            x[month * 472 + day * 24 + hour, :] = month_data[month][:,day * 24 + hour : day * 24 + hour + 8].reshape(1, -1) #vector dim:18*9
            y[month * 472 + day * 24 + hour, 0] = month_data[month][9, day * 24 + hour + 8] #value
print(x)
print(y)

[[14.  14.  14.  ...  1.9  2.   2. ]
 [14.  14.  13.  ...  2.   2.   0.5]
 [14.  13.  12.  ...  2.   0.5  0.3]
 ...
 [18.  19.  18.  ...  1.1  1.4  1.3]
 [19.  18.  17.  ...  1.4  1.3  1.6]
 [ 0.   0.   0.  ...  0.   0.   0. ]]
[[19.]
 [30.]
 [41.]
 ...
 [17.]
 [24.]
 [ 0.]]

x[25]#第一笔的第一个features(8小时)AMB_TEMP  9 * 18

array([ 15.  ,  15.  ,  14.  ,  14.  ,  15.  ,  16.  ,  16.  ,  17.  ,
         1.8 ,   1.8 ,   1.8 ,   1.8 ,   1.8 ,   1.8 ,   1.8 ,   1.8 ,
         0.25,   0.28,   0.27,   0.24,   0.26,   0.34,   0.56,   0.79,
         0.05,   0.06,   0.05,   0.05,   0.07,   0.09,   0.19,   0.31,
         1.1 ,   1.3 ,   1.  ,   1.2 ,   1.1 ,   1.6 ,   8.4 ,  17.  ,
         3.2 ,   3.3 ,   3.1 ,   3.1 ,   4.3 ,   9.4 ,  19.  ,  26.  ,
         4.3 ,   4.7 ,   4.1 ,   4.3 ,   5.5 ,  11.  ,  27.  ,  43.  ,
        38.  ,  39.  ,  39.  ,  34.  ,  31.  ,  30.  ,  18.  ,  17.  ,
        34.  ,  31.  ,  16.  ,  18.  ,   8.  ,  16.  ,  24.  ,  37.  ,
        23.  ,  30.  ,  30.  ,  22.  ,  18.  ,  13.  ,  13.  ,  11.  ,
         0.  ,   0.  ,   0.  ,   0.  ,   0.  ,   0.  ,   0.  ,   0.  ,
        66.  ,  70.  ,  71.  ,  70.  ,  67.  ,  59.  ,  60.  ,  55.  ,
         0.8 ,   0.9 ,   0.8 ,   0.8 ,   0.7 ,   0.8 ,   1.2 ,   3.3 ,
         1.8 ,   1.8 ,   1.8 ,   1.8 ,   1.8 ,   1.8 ,   2.  ,   2.1 ,
       117.  , 113.  , 110.  , 100.  ,  64.  ,  80.  ,  88.  ,  50.  ,
       113.  , 115.  , 102.  ,  87.  ,  79.  , 252.  ,  90.  , 216.  ,
         2.7 ,   2.9 ,   2.3 ,   1.5 ,   1.  ,   0.8 ,   4.  ,   1.  ,
         2.4 ,   2.8 ,   2.7 ,   2.  ,   0.5 ,   0.8 ,   1.5 ,   0.7 ])

Normalize (1)

mean_x = np.mean(x, axis = 0) #18 * 9 
std_x = np.std(x, axis = 0) #18 * 9 
for i in range(len(x)): #12 * 471
    for j in range(len(x[0])): #18 * 9 
        if std_x[j] != 0:
            x[i][j] = (x[i][j] - mean_x[j]) / std_x[j]

len(x)# 5652 = 12 * 471

len(x[0])# 162 = 9 * 18

tag = x[0]

#Split Training Data Into “train_set” and "validation_set"
這部分是針對作業中 report 的第二題、第三題做的簡單示範，以生成比較中用來訓練的 train_set 和不會被放入訓練、只是用來驗證的 validation_set。

import math
x_train_set = x[: math.floor(len(x) * 0.8), :]
y_train_set = y[: math.floor(len(y) * 0.8), :]
x_validation = x[math.floor(len(x) * 0.8): , :]
y_validation = y[math.floor(len(y) * 0.8): , :]
print(x_train_set)
print(y_train_set)
print(x_validation)
print(y_validation)
print(len(x_train_set))
print(len(y_train_set))
print(len(x_validation))
print(len(y_validation))

[[-1.33404075 -1.33462207 -1.33500832 ...  0.17629941  0.26946316
   0.26863726]
 [-1.33404075 -1.33462207 -1.49203094 ...  0.26985884  0.26946316
  -1.13432695]
 [-1.33404075 -1.49171996 -1.64905356 ...  0.26985884 -1.1339323
  -1.32138884]
 ...
 [ 0.70895574  0.39345479  0.07819527 ...  1.39257204  0.26946316
  -0.38607937]
 [ 0.39464859  0.07925899  0.07819527 ...  0.26985884 -0.38545472
  -0.38607937]
 [ 0.08034144  0.07925899  0.07819527 ... -0.38505719 -0.38545472
  -0.85373411]]
[[19.]
 [30.]
 [41.]
 ...
 [ 7.]
 [ 5.]
 [14.]]
[[ 0.08034144  0.07925899  0.23521789 ... -0.38505719 -0.85325321
  -0.57314126]
 [ 0.08034144  0.23635689  0.23521789 ... -0.85285436 -0.57257412
   0.5492301 ]
 [ 0.23749501  0.23635689 -0.07882735 ... -0.57217606  0.55014225
  -0.10548653]
 ...
 [-0.70542645 -0.54913259 -0.70691783 ... -0.57217606 -0.29189502
  -0.38607937]
 [-0.54827287 -0.70623048 -0.86394045 ... -0.29149776 -0.38545472
  -0.10548653]
 [-3.53419082 -3.53399261 -3.53332501 ... -1.60132983 -1.60173079
  -1.60198169]]
[[13.]
 [24.]
 [22.]
 ...
 [17.]
 [24.]
 [ 0.]]
4531
4531
1133
1133

Training

在这里插入图片描述

在这里插入图片描述
(和上圖不同處: 下面的 code 採用 Root Mean Square Error)

因為常數項的存在，所以 dimension (dim) 需要多加一欄；eps 項是避免 adagrad 的分母為 0 而加的極小數值。

每一個 dimension (dim) 會對應到各自的 gradient, weight (w)，透過一次次的 iteration (iter_time) 學習。

dim = 18 * 8 + 1
w = np.zeros([dim, 1])
x_ = np.concatenate((np.ones([12 * 472, 1]), x), axis = 1).astype(float)
print(len(x_[0]))
print(len(w))

145
145

这里补了为1的一列，用于常数项使用，即 $x_{np} * w_p = w_p$



learning_rate = 200
iter_time = 20000
adagrad = np.zeros([dim, 1])
eps = 0.0000000001
for t in range(iter_time):
    loss = np.sqrt(np.sum(np.power(np.dot(x_, w) - y, 2))/472/12)#rmse
    if(t%100==0):
        print(str(t) + ":" + str(loss))
    gradient = 2 * np.dot(x_.transpose(), np.dot(x_, w) - y) #dim*1
    adagrad += gradient ** 2
    w = w - learning_rate * gradient / np.sqrt(adagrad + eps)
np.save('weight.npy', w)
w

0:27.04269876504886
100:42.316253745299576
200:31.074186243413685
300:25.537280583796065
400:22.165757929684172
500:19.75254567886226

…
…
98300:5.688854049407231
98400:5.688854025502277
98500:5.68885400177681
98600:5.6888539782294805
98700:5.6888539548589465
98800:5.688853931663879
98900:5.688853908642958
99000:5.688853885794873
99100:5.688853863118321
99200:5.688853840612015
99300:5.688853818274671
99400:5.68885379610502
99500:5.688853774101799
99600:5.688853752263754
99700:5.6888537305896465
99800:5.68885370907824
99900:5.688853687728311

array([[21.32627119],
       [-0.59770122],
       [ 1.19618126],
       [-1.71854272],
       [-0.00412458],
       [ 1.06928241],
       [-0.36250559],
       [-1.81932088],
       [ 2.10286256],
       [ 0.12625626],
       [ 0.10239455],
       [ 0.04337644],
       [ 0.05290508],
       [-0.0207053 ],
       [-0.25997183],
       [ 0.15053346],
       [ 0.58737417],
       [ 0.01482249],
       [ 0.06264181],
       [-0.16256801],
       [ 0.14738827],
       [-0.02307035],
       [-0.16147156],
       [ 0.07498768],
       [ 0.40381712],
       [ 0.31507161],
       [-0.24065449],
       [ 0.23966163],
       [ 0.38239296],
       [-0.52786859],
       [ 0.0581854 ],
       [ 0.16077347],
       [ 0.06112612],
       [ 0.11976572],
       [ 0.23739248],
       [-1.04772983],
       [-0.47176797],
       [-0.55928703],
       [ 0.0159055 ],
       [-0.54088112],
       [-0.68640924],
       [-0.06704892],
       [-0.1638642 ],
       [-2.79479872],
       [-1.46477859],
       [-0.90830161],
       [-0.4895747 ],
       [-2.71160536],
       [ 0.13711248],
       [-0.38775076],
       [ 0.12351668],
       [ 3.22246928],
       [ 1.65049504],
       [ 1.25627249],
       [ 0.14705583],
       [ 2.61715673],
       [ 1.9780861 ],
       [ 0.1623215 ],
       [-0.16639199],
       [-0.36364337],
       [ 0.01706278],
       [-0.56897905],
       [-0.42067013],
       [-0.17519396],
       [ 1.8326884 ],
       [ 0.36716326],
       [-0.81083076],
       [ 1.0596038 ],
       [-0.10137155],
       [-1.05794694],
       [ 1.02079922],
       [-0.05240129],
       [ 0.91110234],
       [-0.82112217],
       [ 3.55240057],
       [-3.51869125],
       [-0.69883282],
       [ 8.19423616],
       [-9.11851928],
       [ 0.28782482],
       [15.86037477],
       [ 0.00647815],
       [-0.0904938 ],
       [ 0.00927876],
       [-0.1027838 ],
       [ 0.10683582],
       [ 0.03512709],
       [-0.08788882],
       [-0.15754034],
       [ 0.12550395],
       [ 0.37267731],
       [-0.85912138],
       [-0.58034475],
       [ 1.18502834],
       [-1.32860977],
       [ 0.05769156],
       [ 0.6495768 ],
       [ 0.22441773],
       [-0.08086894],
       [-0.12145987],
       [-0.03584999],
       [ 0.10052267],
       [-0.2054299 ],
       [ 0.28685996],
       [ 0.22959199],
       [-0.32587512],
       [ 0.12923094],
       [-0.40514046],
       [-0.175628  ],
       [ 0.37578853],
       [ 0.15293797],
       [-0.35953503],
       [-0.1440169 ],
       [ 0.17412323],
       [-0.06657272],
       [ 0.18199105],
       [ 0.05810032],
       [ 0.16962856],
       [-0.24043043],
       [ 0.10899224],
       [ 0.04145799],
       [-0.09666332],
       [ 0.09489148],
       [-0.18459977],
       [ 0.03905182],
       [-0.00918566],
       [ 0.00566057],
       [-0.24082881],
       [ 0.02100525],
       [-0.14123642],
       [ 0.19429907],
       [ 0.0552319 ],
       [-0.11646291],
       [-0.08053281],
       [-0.0443063 ],
       [-0.0314302 ],
       [-0.13354943],
       [ 0.0625202 ],
       [-0.10887952],
       [-0.21179497],
       [-0.02862832],
       [ 0.34464294],
       [-0.02245315],
       [-0.29397586],
       [ 0.18813077]])

Testing

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-WTIvSWaZ-1594744658937)(https://drive.google.com/uc?id=1165ETzZyE6HStqKvgR0gKrJwgFLK6-CW)]

載入 test data，並且以相似於訓練資料預先處理和特徵萃取的方式處理，使 test data 形成 240 個維度為 18 * 9 + 1 的資料。

# testdata = pd.read_csv('gdrive/My Drive/hw1-regression/test.csv', header = None, encoding = 'big5')
testdata = pd.read_csv('data/test.csv', header = None, encoding = 'big5')
test_data = testdata.iloc[:, 2:]
testdata[:20]

	0	1	2	3	4	5	6	7	8	9	10
0	id_0	AMB_TEMP	21	21	20	20	19	19	19	18	17
1	id_0	CH4	1.7	1.7	1.7	1.7	1.7	1.7	1.7	1.7	1.8
2	id_0	CO	0.39	0.36	0.36	0.4	0.53	0.55	0.34	0.31	0.23
3	id_0	NMHC	0.16	0.24	0.22	0.27	0.27	0.26	0.27	0.29	0.1
4	id_0	NO	1.3	1.3	1.3	1.3	1.4	1.6	1.2	1.1	0.9
5	id_0	NO2	17	14	13	14	18	21	8.9	9.4	5
6	id_0	NOx	18	16	14	15	20	23	10	10	5.8
7	id_0	O3	32	31	31	26	16	12	27	20	26
8	id_0	PM10	62	50	44	39	38	32	48	36	25
9	id_0	PM2.5	33	39	39	25	18	18	17	9	4
10	id_0	RAINFALL	NR	NR	NR	NR	NR	NR	NR	NR	NR
11	id_0	RH	83	85	87	87	86	85	78	81	80
12	id_0	SO2	2	1.8	1.8	1.8	2.1	2.6	2	2.3	2.4
13	id_0	THC	1.8	1.9	1.9	2	2	2	2	2	1.9
14	id_0	WD_HR	58	53	67	59	59	73	79	82	104
15	id_0	WIND_DIREC	57	44	73	44	56	115	45	107	103
16	id_0	WIND_SPEED	1.4	1.3	1.5	1.4	1.6	1.6	1.2	1.8	2.3
17	id_0	WS_HR	1	0.9	0.9	0.9	1.2	0.7	1	0.6	1.8
18	id_1	AMB_TEMP	14	13	13	13	13	13	13	12	13
19	id_1	CH4	1.8	1.8	1.8	1.8	1.8	1.8	1.7	1.7	1.8

test_data[:20]

	2	3	4	5	6	7	8	9	10
0	21	21	20	20	19	19	19	18	17
1	1.7	1.7	1.7	1.7	1.7	1.7	1.7	1.7	1.8
2	0.39	0.36	0.36	0.4	0.53	0.55	0.34	0.31	0.23
3	0.16	0.24	0.22	0.27	0.27	0.26	0.27	0.29	0.1
4	1.3	1.3	1.3	1.3	1.4	1.6	1.2	1.1	0.9
5	17	14	13	14	18	21	8.9	9.4	5
6	18	16	14	15	20	23	10	10	5.8
7	32	31	31	26	16	12	27	20	26
8	62	50	44	39	38	32	48	36	25
9	33	39	39	25	18	18	17	9	4
10	NR	NR	NR	NR	NR	NR	NR	NR	NR
11	83	85	87	87	86	85	78	81	80
12	2	1.8	1.8	1.8	2.1	2.6	2	2.3	2.4
13	1.8	1.9	1.9	2	2	2	2	2	1.9
14	58	53	67	59	59	73	79	82	104
15	57	44	73	44	56	115	45	107	103
16	1.4	1.3	1.5	1.4	1.6	1.6	1.2	1.8	2.3
17	1	0.9	0.9	0.9	1.2	0.7	1	0.6	1.8
18	14	13	13	13	13	13	13	12	13
19	1.8	1.8	1.8	1.8	1.8	1.8	1.7	1.7	1.8


test_data[test_data == 'NR'] = 0
test_data = test_data.to_numpy()
test_x = np.empty([240, 18*8], dtype = float)
test_y = np.empty([240, 1], dtype = float)
for i in range(240):
    test_x[i, :] = test_data[18 * i: 18* (i + 1), :8].reshape(1, -1)
    test_y[i, :] = test_data[9 * (i+1), 8]
test_x[0]
test_y[0]

/root/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
/root/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py:3414: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._where(-key, value, inplace=True)





array([4.])


#重新计算mean和std
mean_x = np.mean(test_x, axis = 0) #18 * 9 
std_x = np.std(test_x, axis = 0) #18 * 9 

for i in range(len(test_x)):
    for j in range(len(test_x[0])):
        if std_x[j] != 0:
            test_x[i][j] = (test_x[i][j] - mean_x[j]) / std_x[j]

test_x[0]

array([-0.36400913, -0.37580621, -0.58294053, -0.6039988 , -0.79754419,
       -0.77993183, -0.77134213, -0.93924642, -0.23381365, -0.20614316,
       -0.178965  , -0.19738551, -0.20112982, -0.17849306, -0.21800613,
       -0.1462465 , -0.02243175, -0.21006416, -0.19137731,  0.00237043,
        0.64996798,  0.8347128 , -0.24763691, -0.26774851,  0.12113192,
        0.72194031,  0.62642991,  1.1193606 ,  1.07661245,  0.99170142,
        1.14457532,  1.59800227, -0.41421696, -0.37903818, -0.44066817,
       -0.3619202 , -0.40277453, -0.27806221, -0.42701048, -0.42302961,
        0.93257409,  0.38666369,  0.29567325,  0.40431516,  0.932154  ,
        1.38609225, -0.35672738, -0.21797962,  0.66821673,  0.27394276,
        0.11390661,  0.16298197,  0.74432972,  1.04369186, -0.44653418,
       -0.37384103, -0.01811478, -0.05063733, -0.10172555, -0.35003493,
       -0.83314626, -1.02814938, -0.37646249, -0.70842781,  0.35455966,
        0.00370494, -0.20993074, -0.35090948, -0.3798995 , -0.57390832,
       -0.14049582, -0.48849244,  0.27232352,  0.56559302,  0.54673056,
       -0.10839156, -0.44431028, -0.46020247, -0.50975521, -0.91310287,
       -0.11549831, -0.15822222, -0.14537882, -0.19986573, -0.12898544,
       -0.12412819, -0.1111852 , -0.13155091,  0.7839987 ,  0.94178976,
        1.11502694,  1.11585946,  1.03145997,  0.94421659,  0.44932091,
        0.67745778, -0.66448801, -0.69458922, -0.8241389 , -0.76852537,
       -0.60883941, -0.3483424 , -0.81324231, -0.52086313, -0.39850542,
        0.13751952,  0.20242608,  0.79950013,  0.80387309,  0.83182892,
        0.89047392,  0.97892759, -1.25158954, -1.32292147, -1.08841118,
       -1.13878394, -1.23022993, -1.13992527, -1.05895382, -0.98101608,
       -1.25641454, -1.38157623, -1.0202544 , -1.36540596, -1.26620751,
       -0.68274526, -1.44856806, -0.69165696, -0.71738483, -0.74234011,
       -0.60566186, -0.6923032 , -0.59214802, -0.56088831, -0.96087669,
       -0.36666637, -0.6863133 , -0.77984735, -0.76161203, -0.75062533,
       -0.49862637, -1.04516832, -0.70121844, -1.02610681])

tag

len(test_x[0])

test_x = np.concatenate((np.ones([240, 1]), test_x), axis = 1).astype(float)

Prediction

說明圖同上

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-UzSTSYxu-1594744658939)(https://drive.google.com/uc?id=1165ETzZyE6HStqKvgR0gKrJwgFLK6-CW)]

有了 weight 和測試資料即可預測 target。

w = np.load('weight.npy')

array([[21.32627119],
       [-0.59770122],
       [ 1.19618126],
       [-1.71854272],
       [-0.00412458],
       [ 1.06928241],
       [-0.36250559],
       [-1.81932088],
       [ 2.10286256],
       [ 0.12625626],
       [ 0.10239455],
       [ 0.04337644],
       [ 0.05290508],
       [-0.0207053 ],
       [-0.25997183],
       [ 0.15053346],
       [ 0.58737417],
       [ 0.01482249],
       [ 0.06264181],
       [-0.16256801],
       [ 0.14738827],
       [-0.02307035],
       [-0.16147156],
       [ 0.07498768],
       [ 0.40381712],
       [ 0.31507161],
       [-0.24065449],
       [ 0.23966163],
       [ 0.38239296],
       [-0.52786859],
       [ 0.0581854 ],
       [ 0.16077347],
       [ 0.06112612],
       [ 0.11976572],
       [ 0.23739248],
       [-1.04772983],
       [-0.47176797],
       [-0.55928703],
       [ 0.0159055 ],
       [-0.54088112],
       [-0.68640924],
       [-0.06704892],
       [-0.1638642 ],
       [-2.79479872],
       [-1.46477859],
       [-0.90830161],
       [-0.4895747 ],
       [-2.71160536],
       [ 0.13711248],
       [-0.38775076],
       [ 0.12351668],
       [ 3.22246928],
       [ 1.65049504],
       [ 1.25627249],
       [ 0.14705583],
       [ 2.61715673],
       [ 1.9780861 ],
       [ 0.1623215 ],
       [-0.16639199],
       [-0.36364337],
       [ 0.01706278],
       [-0.56897905],
       [-0.42067013],
       [-0.17519396],
       [ 1.8326884 ],
       [ 0.36716326],
       [-0.81083076],
       [ 1.0596038 ],
       [-0.10137155],
       [-1.05794694],
       [ 1.02079922],
       [-0.05240129],
       [ 0.91110234],
       [-0.82112217],
       [ 3.55240057],
       [-3.51869125],
       [-0.69883282],
       [ 8.19423616],
       [-9.11851928],
       [ 0.28782482],
       [15.86037477],
       [ 0.00647815],
       [-0.0904938 ],
       [ 0.00927876],
       [-0.1027838 ],
       [ 0.10683582],
       [ 0.03512709],
       [-0.08788882],
       [-0.15754034],
       [ 0.12550395],
       [ 0.37267731],
       [-0.85912138],
       [-0.58034475],
       [ 1.18502834],
       [-1.32860977],
       [ 0.05769156],
       [ 0.6495768 ],
       [ 0.22441773],
       [-0.08086894],
       [-0.12145987],
       [-0.03584999],
       [ 0.10052267],
       [-0.2054299 ],
       [ 0.28685996],
       [ 0.22959199],
       [-0.32587512],
       [ 0.12923094],
       [-0.40514046],
       [-0.175628  ],
       [ 0.37578853],
       [ 0.15293797],
       [-0.35953503],
       [-0.1440169 ],
       [ 0.17412323],
       [-0.06657272],
       [ 0.18199105],
       [ 0.05810032],
       [ 0.16962856],
       [-0.24043043],
       [ 0.10899224],
       [ 0.04145799],
       [-0.09666332],
       [ 0.09489148],
       [-0.18459977],
       [ 0.03905182],
       [-0.00918566],
       [ 0.00566057],
       [-0.24082881],
       [ 0.02100525],
       [-0.14123642],
       [ 0.19429907],
       [ 0.0552319 ],
       [-0.11646291],
       [-0.08053281],
       [-0.0443063 ],
       [-0.0314302 ],
       [-0.13354943],
       [ 0.0625202 ],
       [-0.10887952],
       [-0.21179497],
       [-0.02862832],
       [ 0.34464294],
       [-0.02245315],
       [-0.29397586],
       [ 0.18813077]])


ans_y = np.dot(test_x, w)
print(ans_y[:10])
print(test_y[:10])

[[ 3.49897912]
 [10.71431377]
 [24.25967974]
 [ 4.60526241]
 [25.49323832]
 [15.20700794]
 [18.56378292]
 [28.2902962 ]
 [22.1783548 ]
 [46.30543741]]
[[ 4.]
 [13.]
 [18.]
 [28.]
 [26.]
 [14.]
 [ 4.]
 [31.]
 [30.]
 [20.]]

sum (abs(ans_y - test_y)) / len(ans_y)

array([16.0961944])

Save Prediction to CSV File

import csv
with open('submit.csv', mode='w', newline='') as submit_file:
    csv_writer = csv.writer(submit_file)
    header = ['id', 'value']
    print(header)
    csv_writer.writerow(header)
    for i in range(240):
        row = ['id_' + str(i), ans_y[i][0]]
        csv_writer.writerow(row)
        print(row)