Hands-on: a regression model
I. Building the model
1. Imports
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
print(sys.version_info)
for module in mpl, np, pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)
2. Load the data (California housing prices)
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
print(housing.DESCR)
print(housing.data.shape)
print(housing.target.shape)
3. Print a few rows of the data
import pprint
pprint.pprint(housing.data[0:5])
pprint.pprint(housing.target[0:5])
4. Split the samples
# train_test_split uses test_size=0.25 by default, i.e. a 1:3 test-to-train split
from sklearn.model_selection import train_test_split
x_train_all, x_test, y_train_all, y_test = train_test_split(
    housing.data, housing.target, random_state=7)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train_all, y_train_all, random_state=11)
print(x_train.shape, y_train.shape) # 11610
print(x_test.shape, y_test.shape) # 5160
print(x_valid.shape, y_valid.shape) # 3870
print(x_train_all.shape, y_train_all.shape) #15480
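The row counts in the comments above follow directly from the default test_size=0.25 applied twice. As a rough cross-check that does not need scikit-learn, the sketch below reproduces the arithmetic. `split_sizes` is a hypothetical helper written for this post, and it assumes the test part is rounded up with the training part taking the remainder, which matches the numbers printed above.

```python
import math

def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper: assumes the test part is rounded up and the
    training part gets whatever is left.
    """
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

n_total = 20640  # rows in the California housing data
n_train_all, n_test = split_sizes(n_total)    # first split: train_all / test
n_train, n_valid = split_sizes(n_train_all)   # second split: train / valid
print(n_train, n_valid, n_test)  # 11610 3870 5160
```

So the final proportions are roughly 56% training, 19% validation and 25% test.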
5. Feature scaling (standardization)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)
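Note that the scaler is fitted on the training set only, and the same mean and standard deviation are then reused for the validation and test sets. Below is a minimal pure-Python sketch of that idea for a single feature column. The helpers `fit_standardizer` and `standardize` are made up for illustration and are not scikit-learn API; the sketch assumes the population standard deviation, which as far as I know is what StandardScaler uses.

```python
import statistics

def fit_standardizer(column):
    """Learn the mean and population standard deviation of one column."""
    return statistics.mean(column), statistics.pstdev(column)

def standardize(column, mean, std):
    """Apply previously learned statistics to any column."""
    return [(x - mean) / std for x in column]

train = [2.0, 4.0, 6.0, 8.0]   # toy training column
test = [5.0, 10.0]             # toy test column

mean, std = fit_standardizer(train)            # statistics come from train only
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)     # reuse them, never refit on test
```

After this the training column has mean 0 and standard deviation 1, while the test column generally does not, and that is expected.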
6. Build the model
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu",
                       input_shape=x_train.shape[1:]),
    keras.layers.Dense(1),
])
model.summary()
model.compile(loss="mean_squared_error", optimizer="sgd")
callbacks = [keras.callbacks.EarlyStopping(
    patience=5, min_delta=1e-2)]
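As a sanity check on what model.summary() reports, the parameter counts can be worked out by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes the 8 input features of this dataset; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 8                          # columns in housing.data
hidden = dense_params(n_features, 30)   # 8 * 30 + 30
output = dense_params(30, 1)            # 30 * 1 + 1
print(hidden, output, hidden + output)  # 270 31 301
```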
7. Train the model
history = model.fit(x_train_scaled, y_train,
                    validation_data=(x_valid_scaled, y_valid),
                    epochs=100, callbacks=callbacks)
# Because of the EarlyStopping thresholds, training ran for only 30 of the 100 epochs
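To see why training ends well before epoch 100, here is a simplified sketch of patience-based early stopping. It is not the Keras source, only my reading of how patience and min_delta interact: an epoch counts as an improvement only if the validation loss beats the best value so far by more than min_delta, and training stops once patience epochs in a row fail to do so. The function name and the loss values are made up.

```python
def epochs_before_stop(val_losses, patience=5, min_delta=1e-2):
    """Return how many epochs run before a patience-based stop.

    Simplified sketch, not the Keras implementation.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # real improvement: remember it, reset the counter
            wait = 0
        else:
            wait += 1     # too small an improvement (or none at all)
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses that plateau after the third epoch.
losses = [1.0, 0.8, 0.7, 0.695, 0.694, 0.693, 0.692, 0.691, 0.690, 0.6]
print(epochs_before_stop(losses))  # 8
```

With min_delta=1e-2, the tiny gains from epoch 4 onward do not count, so the run stops at epoch 8 and never reaches the better value at epoch 10. A smaller min_delta or a larger patience would let training continue longer.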
8. Plot the learning curves
def plot_learning_curves(history):
    # history.history maps each metric name to its per-epoch values
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 1)
    plt.show()
plot_learning_curves(history)
9. Evaluate on the test set
model.evaluate(x_test_scaled,y_test)
The test loss (mean squared error) comes out at only 0.44.
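Since the model was compiled with a loss and no extra metrics, the number returned by model.evaluate is the mean squared error on the test set. MSE is in squared target units, so taking the square root gives a figure that is easier to read. The sketch below shows the computation on toy numbers and then converts the 0.44 above; it assumes the target is expressed in units of $100,000, as the dataset description states.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0

rmse = math.sqrt(0.44)   # back to the units of the target
print(round(rmse, 2))    # 0.66
```

An RMSE of about 0.66 means the predictions are off by roughly $66,000 in a typical case.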