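I have CSV data with two feature columns and a third value column, and I want to predict the third column from the first two features.
The code is as follows: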
# coding=utf-8
from keras import models
from keras import layers
import numpy as np
import pandas as pd

# Paths to the training and validation CSV files
train_xl = './t.csv'
val_xl = './v.csv'
data_train = pd.read_csv(train_xl)
data_val = pd.read_csv(val_xl)
#########################train####################
# Work on a copy so the raw DataFrame is left untouched
new_train = data_train.copy()
# Standardize each feature with the training mean and sample std (ddof=1)
arr_mean1 = np.mean(data_train['feature1'])
arr_std1 = np.std(data_train['feature1'], ddof=1)
arr_mean2 = np.mean(data_train['feature2'])
arr_std2 = np.std(data_train['feature2'], ddof=1)
new_train['feature1'] = (data_train['feature1'] - arr_mean1) / arr_std1
new_train['feature2'] = (data_train['feature2'] - arr_mean2) / arr_std2
#########################val######################
new_val = data_val.copy()
# Reuse the training statistics so both sets are on the same scale
new_val['feature1'] = (data_val['feature1'] - arr_mean1) / arr_std1
new_val['feature2'] = (data_val['feature2'] - arr_mean2) / arr_std2
train_data = new_train[['feature1', 'feature2']]
train_targets = new_train[['class']]
test_data = new_val[['feature1', 'feature2']]
test_targets = new_val[['class']]
print('the shape of train data is ',train_data.shape)
print('the shape of test data is ',test_data.shape)
print('the shape of train target is ',train_targets.shape)
train_data = train_data.values
train_targets = train_targets.values
test_data = test_data.values
test_targets = test_targets.values
def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64,activation='relu',input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop',loss='mse',metrics=['mae'])
    return model
model = build_model()
model.fit(train_data,train_targets,epochs=100,batch_size=64)
test_pred = model.predict(test_data)
print(test_pred)
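The script above only prints the raw predictions. To judge how good they are, one option is to compare them against the known targets using mean absolute error (MAE) and root mean squared error (RMSE). The helper below is a minimal sketch of that idea; the name `regression_errors` and the sample numbers are made up for illustration and do not come from the original data.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return (MAE, RMSE) for two arrays with the same number of elements.

    Both inputs are flattened first, so an (n, 1) array such as the output
    of model.predict() can be compared with an (n,) or (n, 1) target array.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")
    diff = y_pred - y_true
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return mae, rmse

# Illustrative values only (not taken from the original CSV files).
example_targets = np.array([[1.0], [2.0], [3.0]])
example_preds = np.array([[1.5], [2.0], [2.0]])
mae, rmse = regression_errors(example_targets, example_preds)
print("MAE: %.4f  RMSE: %.4f" % (mae, rmse))
```

With the script above, the call would presumably be `regression_errors(test_targets, test_pred)`.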
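Note: this is a regression problem, not a classification problem; the naming in my data file may be ambiguous. The code works. You can also try other regression methods for comparison, such as ridge regression or lasso regression, which are easy to find.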
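As a rough sketch of such a comparison, the snippet below fits scikit-learn's `Ridge` and `Lasso` on the same kind of two-feature input. The data here is synthetic (generated with a fixed seed) and merely stands in for the standardized CSV columns; the `alpha` values are arbitrary starting points, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic stand-in for two standardized feature columns and a numeric target.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + rng.normal(scale=0.1, size=200)
X_val = rng.normal(size=(50, 2))
y_val = 3.0 * X_val[:, 0] - 2.0 * X_val[:, 1] + rng.normal(scale=0.1, size=50)

# alpha sets the regularization strength; these values are assumptions to tune.
baselines = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
}

val_mae = {}
for name, reg in baselines.items():
    reg.fit(X_train, y_train)
    pred = reg.predict(X_val)
    # Mean absolute error on the held-out set, comparable to the 'mae' metric in Keras
    val_mae[name] = float(np.mean(np.abs(pred - y_val)))
    print(name, "coefficients:", reg.coef_, "validation MAE: %.4f" % val_mae[name])
```

To try this on the real data, pass `train_data` and `train_targets.ravel()` to `fit`, then compare the validation MAE with the one from the neural network.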
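The way I read the data is a bit clumsy, so treat it only as a reference. I have packaged the data and code together; if you do not need the data, there is no need to download anything. This is just a simple example that I may need later, so I am recording it here.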
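One possibly tidier way to do the preprocessing is to standardize all feature columns in a single step with pandas, using the mean and standard deviation of the training set for both files. The sketch below assumes the column names `feature1`, `feature2`, and `class` from the script; the helper name `standardize` and the tiny in-memory frames are hypothetical stand-ins for the real CSV files.

```python
import pandas as pd

FEATURES = ["feature1", "feature2"]

def standardize(train_df, val_df, columns):
    """Scale `columns` in both frames using the training mean and sample std."""
    mean = train_df[columns].mean()
    std = train_df[columns].std(ddof=1)  # sample standard deviation
    train_out = train_df.copy()
    val_out = val_df.copy()
    train_out[columns] = (train_df[columns] - mean) / std
    val_out[columns] = (val_df[columns] - mean) / std
    return train_out, val_out

# Made-up frames standing in for pd.read_csv('./t.csv') and pd.read_csv('./v.csv').
train_df = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0],
    "feature2": [10.0, 20.0, 30.0],
    "class": [0.1, 0.2, 0.3],
})
val_df = pd.DataFrame({
    "feature1": [2.0, 4.0],
    "feature2": [20.0, 40.0],
    "class": [0.2, 0.4],
})

train_scaled, val_scaled = standardize(train_df, val_df, FEATURES)
print(train_scaled)
print(val_scaled)
```

The target column is left unscaled, and the input frames are not modified because the helper works on copies.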