Hung-yi Lee (李宏毅) Machine Learning, Third Round, Task 3

孤客...

Category: Machine Learning

Article link: https://blog.csdn.net/qq_38364952/article/details/90450149

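Homework 1: Predicting PM2.5
In this homework we predict PM2.5 values with gradient descent.
hw1 requirements:
1. Python 3.5+ is required.
2. Only (1) numpy, (2) scipy, and (3) pandas may be used.
3. Implement linear regression by hand using gradient descent.
4. Beat the public simple baseline.
5. For those who want to load a saved model instead of running the whole training process:
upload the training code and name it train.py;
it is enough that train.py contains the gradient descent code.
hw_best requirements:
1. Python 3.5+ is required.
2. Any library may be used.
3. Reach the higher score you chose on Kaggle.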
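
Requirement 3 of hw1 asks for hand-written linear regression trained by gradient descent. As a rough illustration of what that means with numpy alone, here is a minimal vectorized sketch. The function name `fit_linear_gd`, its default parameters, and the synthetic data are assumptions made for this example; they are not part of the homework or of the script in this post.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, n_iter=500):
    """Fit y ~ [1, X] @ w by batch gradient descent on the cost sum(err^2) / (2m)."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend a bias column of ones
    w = np.zeros(Xb.shape[1])
    costs = []
    for _ in range(n_iter):
        err = Xb @ w - y                   # prediction minus target
        costs.append(float(err @ err) / (2 * m))
        w -= lr * (Xb.T @ err) / m         # step against the gradient of the cost
    return w, costs

# Synthetic data (hypothetical): y = 2 + 3*x0 - x1, no noise
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]
w, costs = fit_linear_gd(X, y)
print(np.round(w, 3))   # should be close to the true coefficients 2, 3, -1
```

The script later in this post performs the same update sample by sample in a Python loop; the matrix form above computes the identical batch gradient in one step per iteration.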

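Data description:
This homework uses observation records from the Fengyuan (豐原) station, split into a train set and a test set. The train set contains all data from the first 20 days of each month at the station; the test set is sampled from the station's remaining data.
train.csv: hourly meteorological data for the first 20 days of each month (18 measurements per hour), 12 months in total.
test.csv: each record is 10 consecutive hours sampled from the remaining data. All observations from the first nine hours are the features, and the PM2.5 value of the tenth hour is the answer. There are 240 distinct test records; predict the PM2.5 value of each record from its features.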
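
The "nine hours of features, tenth hour as answer" layout described above can be built from a single hourly series with a sliding window. The following is a small sketch of that idea; `make_windows` and the toy series are hypothetical names and data chosen for illustration, and only the PM2.5 series is used as a feature here (as in the script below), not all 18 measurements.

```python
import numpy as np

def make_windows(series, n_features=9):
    """Split a 1-D series into (X, y): n_features consecutive values -> the next value."""
    series = np.asarray(series, dtype=np.float64)
    m = len(series) - n_features          # number of complete windows
    # Row j of idx is [j, j+1, ..., j+n_features-1]
    idx = np.arange(n_features)[None, :] + np.arange(m)[:, None]
    X = series[idx]                       # shape (m, n_features)
    y = series[n_features:]               # shape (m,), the value right after each window
    return X, y

# Hypothetical stand-in for hourly PM2.5 readings
hours = np.arange(12, dtype=np.float64)
X, y = make_windows(hours)
print(X.shape, y.shape)
print(X[0], y[0])
```

One caveat when applying this to the flattened training data: since each month contributes only its first 20 days, windows that straddle two months join hours that were not actually consecutive.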

After finishing, refer to the following materials:
Sample_code:https://ntumlta.github.io/2017fall-ml-hw1/code.html
Supplementary_Slide:https://docs.google.com/presentation/d/1WwIQAVI0RRA6tpcieynPVoYDuMmuVKGvVNF_DSKIiDI/edit#slide=id.g1ef6d808f1_2_0
For the answers, see answer.csv.

#coding=utf-8
#Version:python3.6.0
#Tools:Pycharm 2017.3.2
__date__ = '2019/5/22 13:51'
__author__ = 'ranchunfu'

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train = pd.read_csv("./Dataset/train.csv")
# Keep only the PM2.5 rows (one row per day)
train = train[train['observation'] == 'PM2.5']

# Drop the three leading non-numeric columns, keeping the hourly values
train = train.iloc[:,3:]

train = np.array(train, dtype = 'float32')
# The test set is loaded and processed further below, just before prediction

train = train.reshape(1,train.shape[0]*train.shape[1])
PM = train

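# Normalize the data (z-score; approach borrowed from 追风者)
# Note: the mean and standard deviation are truncated to integers here
PM_mean = int(PM.mean())
PM_theta = int(PM.var()**0.5)  # standard deviation
PM = (PM - PM_mean) / PM_theta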

np.random.seed(3)
W = np.random.randn(1,10) * 0.01
# b = np.zeros((1,1))

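# Forward pass and batch gradient descent
costs = []
learn_rate = 0.1
m = PM.shape[1] - 9  # number of 10-hour sliding windows
for i in range(150):
    cost = 0.0
    grad = np.zeros_like(W)  # accumulates error * x, i.e. the negative gradient
    for j in range(m):
        # Features: 9 consecutive hours of PM2.5, with a leading 1 for the bias term
        x = np.insert(PM[0, j:j+9], 0, 1).reshape(10, 1)
        error = (PM[0, j+9] - np.dot(W, x)).item()  # target minus prediction
        cost += error ** 2
        grad += error * x.T
    cost = cost / (2*m)
    costs.append(cost)
    dW = grad/m
    if i % 10 == 0 :
        print(cost)
    # dW is the negative gradient of the cost, so adding it moves W downhill
    W = W + learn_rate*dW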

plt.plot(costs)
plt.xlabel("num of iter")
plt.ylabel("cost")
plt.title("learn = 0.1")
plt.show()

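# Prepare the test data
test = pd.read_csv("./Dataset/test(1).csv")
# pandas apparently takes the first record as the header here, so the column
# named 'AMB_TEMP' holds the measurement name of each row
test = test[test['AMB_TEMP'] == 'PM2.5']
test = np.array(test.iloc[:,2:], dtype = 'float32')  # 9 hourly values per record
test = (test - PM_mean) / PM_theta                   # same normalization as training
test = np.insert(test, 0, 1, axis=1).T               # prepend the bias term -> (10, 240)

test_pred = np.dot(W,test)  # forward pass, in normalized units
test_pred = test_pred * PM_theta + PM_mean  # convert back to PM2.5 units
np.set_printoptions(precision=3)
np.set_printoptions(suppress=True)
# print(test_pred)

answer = pd.read_csv("answer.csv")
answer = answer["value"].values
answer = answer.reshape(1,240)
# Mean squared error over the 240 test records
print(np.sum((test_pred - answer)**2)/240)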

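Cost curve

[Figure: cost curve; image not preserved]

y_pred values

[Figure: predicted values; image not preserved]

Evaluation metric

[Figure: evaluation metric output; image not preserved]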
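
The evaluation metric printed by the script is the mean squared error over the 240 test records. Because the model is trained on normalized values, predictions should be converted back to PM2.5 units before being compared with answer.csv. The sketch below shows both steps on made-up numbers; the helper names `mse` and `denormalize` and all values are assumptions for illustration, not results from this homework.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two arrays of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

def denormalize(z, mean, std):
    """Undo the z-score transform z = (x - mean) / std."""
    return np.asarray(z, dtype=np.float64) * std + mean

# Hypothetical numbers, not taken from the homework data
mean, std = 40.0, 20.0
pred_z = np.array([-0.5, 0.0, 1.0])      # predictions in normalized units
target = np.array([30.0, 42.0, 58.0])    # targets in PM2.5 units
pred = denormalize(pred_z, mean, std)    # 30, 40, 60 in PM2.5 units
print(mse(pred, target))                 # (0 + 4 + 4) / 3
print(mse(pred, target) ** 0.5)          # RMSE, in the same units as the target
```

Taking the square root gives the RMSE, which is easier to read because it is expressed in the same units as PM2.5 itself.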
