Tianchi Industrial Steam Volume Prediction: DAY 1

Code Log

This post is a notebook entry: it records the code I wrote today along with a few thoughts.

#!/usr/bin/python
# coding=UTF-8
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression, ElasticNet

data = pd.read_table('/Users/wangxinjie/Desktop/王新杰-学习/天池-工业蒸汽量预测/zhengqi_train.txt',encoding='gb2312',sep='\t',index_col=None)
print(data.head())
#print(data.info())
#print(data.describe())
#data.describe().to_csv('/Users/wangxinjie/Desktop/王新杰-学习/天池-工业蒸汽量预测/zhengqi_train_describe.csv')
#sns.jointplot(x="V5",y="V1",data=data,kind='scatter')
#sns.pairplot(data)
#plt.show()
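# The last column is the target; the remaining columns (V0..V37) are features
y = data["target"]
X = data.iloc[:, :-1]
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=0.25)
print(x_test.shape)

# Fit PCA on the training split only, then reuse it to transform the test split
pca = PCA(n_components=30)
newX = pca.fit_transform(x_train)
#print(newX)
print(sum(pca.explained_variance_ratio_))  # cumulative explained variance ratio

# Linear-kernel support vector regression
l_svr = SVR(kernel='linear', C=0.8)
l_svr.fit(newX, y_train)
y_predict = l_svr.predict(pca.transform(x_test))
print(mean_squared_error(y_test, y_predict))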

model_br = BayesianRidge()  # Bayesian ridge regression model
model_lr = LinearRegression()  # ordinary linear regression model
model_etc = ElasticNet()  # ElasticNet regression model (default parameters)
model_br.fit(newX, y_train)
model_lr.fit(newX, y_train)
model_etc.fit(newX, y_train)

# Transform the test split once and reuse it for all three models
x_test_pca = pca.transform(x_test)
y_predict_br = model_br.predict(x_test_pca)
y_predict_lr = model_lr.predict(x_test_pca)
y_predict_etc = model_etc.predict(x_test_pca)
print(mean_squared_error(y_test, y_predict_br))
print(mean_squared_error(y_test, y_predict_lr))
print(mean_squared_error(y_test, y_predict_etc))


'''
rfr = RandomForestRegressor()
rfr.fit(newX,y_train)
y_predict = rfr.predict(pca.transform(x_test))
print(mean_squared_error(y_test, y_predict))

from sklearn.ensemble import GradientBoostingRegressor
gbr = GradientBoostingRegressor()
gbr.fit(newX,y_train)
y_predict = gbr.predict(pca.transform(x_test))
print(mean_squared_error(y_test, y_predict))
'''
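
The script above fits PCA by hand and repeats the fit/predict/score steps for every model. A more compact way to express the same comparison is to wrap PCA and each regressor in a scikit-learn pipeline and loop over the models. The following is only a sketch: zhengqi_train.txt is not available here, so it uses a synthetic dataset from make_regression with 38 features as a stand-in (an assumption), and the printed numbers will not match the real data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge, ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

# Synthetic stand-in for zhengqi_train.txt (assumption): 38 features, one target.
X, y = make_regression(n_samples=400, n_features=38, noise=10.0, random_state=33)
y = (y - y.mean()) / y.std()  # rescale the target so the MSE values are easy to read

x_train, x_test, y_train, y_test = train_test_split(
    X, y, random_state=33, test_size=0.25
)

# Same four models and settings as in the script above.
models = {
    "linear SVR": SVR(kernel="linear", C=0.8),
    "Bayesian ridge": BayesianRidge(),
    "linear regression": LinearRegression(),
    "ElasticNet": ElasticNet(),
}

scores = {}
for name, model in models.items():
    # The pipeline fits PCA on the training split and applies it again at predict time.
    pipe = make_pipeline(PCA(n_components=30), model)
    pipe.fit(x_train, y_train)
    scores[name] = mean_squared_error(y_test, pipe.predict(x_test))
    print(f"{name}: {scores[name]:.4f}")
```

To run this on the competition data, replace the make_regression lines with the read_table call and the X/y split from the script above.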

Results

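The output of the code above is shown below. After the data preview, the printed values are, in order: the shape of the test split, the cumulative explained variance ratio of the 30 PCA components, and the test-set MSE of the linear SVR, Bayesian ridge, linear regression, and ElasticNet models.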

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/wangxinjie/PycharmProjects/practice/test.py
      V0     V1     V2     V3     V4  ...    V34    V35    V36    V37  target
0  0.566  0.016 -0.143  0.407  0.452  ... -4.789 -5.101 -2.608 -3.508   0.175
1  0.968  0.437  0.066  0.566  0.194  ...  0.160  0.364 -0.335 -0.730   0.676
2  1.013  0.568  0.235  0.370  0.112  ...  0.160  0.364  0.765 -0.589   0.633
3  0.733  0.368  0.283  0.165  0.599  ... -0.065  0.364  0.333 -0.112   0.206
4  0.684  0.638  0.260  0.209  0.337  ... -0.215  0.364 -0.280 -0.028   0.384

[5 rows x 39 columns]
(722, 38)
0.9923273091379566
0.13928156094830826
0.13621047336942835
0.13628134349739923
0.31792895662176945

Process finished with exit code 0

Summary and Next Steps

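So far, the best result without any parameter tuning comes from Bayesian ridge regression, with an MSE of 0.13621047336942835. My feeling is that tuning parameters alone would bring only marginal gains, so the next step is feature engineering: analyze and reorganize the 38 existing features (V0 to V37), derive new ones with the usual feature-engineering methods, then rerun the predictions and see how the results change.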
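
One model where the "tuning brings little" guess may be worth checking is ElasticNet: the run above used scikit-learn's defaults (alpha=1.0, l1_ratio=0.5) and scored noticeably worse than the other linear models. A small grid search over the regularization strength is a cheap test. The sketch below again uses a synthetic make_regression dataset as a stand-in for the real file, and the grid values are arbitrary illustrative choices, so the outcome says nothing definite about the competition data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for zhengqi_train.txt (assumption): 38 features, one target.
X, y = make_regression(n_samples=400, n_features=38, noise=10.0, random_state=33)
y = (y - y.mean()) / y.std()

# PCA lives inside the pipeline so it is refit on each cross-validation fold.
pipe = Pipeline([
    ("pca", PCA(n_components=30)),
    ("reg", ElasticNet(max_iter=10000)),
])

# Illustrative grid (assumption); 1.0 and 0.5 are the scikit-learn defaults.
param_grid = {
    "reg__alpha": [0.001, 0.01, 0.1, 1.0],
    "reg__l1_ratio": [0.2, 0.5, 0.8],
}

search = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

best_mse = -search.best_score_  # scoring is negated MSE, so flip the sign
print(search.best_params_)
print(best_mse)
```

If the best alpha found on the real data is far below 1.0, the gap between ElasticNet and the other models is mostly a matter of default regularization rather than model choice.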
