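Sooner or later I had to learn TensorFlow 2.0, and it turns out 2.0 really is much better than the 1.x releases. It feels leaner, and you implement the training details yourself, which is better practice than Keras, where almost everything is wrapped up for you. Also, perhaps because I have not learned much TF 2.0 yet, so far I have not needed to write a dataset generator, which saves some work compared with Keras and PyTorch.
The dataset used below comes from UCI: MPG dataset link.
I only implemented data preparation and training. Testing is not very interesting here; the point is to learn the TensorFlow 2.0 syntax.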
# -*- coding: utf-8 -*-
'''
Author: Unlabel
Date: 2020-10-09 09:50:44
LastEditTime: 2020-10-09 16:20:06
Description: Hands-on example that uses a fully connected network to predict a car's
fuel efficiency, MPG (miles per gallon).
FilePath: /tf2-learning/Auto_MPG.py
'''
import os
import tensorflow as tf
import pandas as pd
def norm(x):
    """Standardize x with the training-set mean and std (reads the global train_stats)."""
    return (x - train_stats['mean']) / train_stats['std']
class Net(tf.keras.Model):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = tf.keras.layers.Dense(64, activation='relu', input_dim=9)
        self.fc2 = tf.keras.layers.Dense(64, activation='relu')
        self.fc3 = tf.keras.layers.Dense(1, activation=None)

    def call(self, inputs):
        x = self.fc1(inputs)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
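def train(train_db, epochs):
    """Custom training loop; uses the global model and optimizer."""
    for epoch in range(epochs):
        for step, (x, y) in enumerate(train_db):
            # Reshape labels from (batch,) to (batch, 1) so they line up with y_hat
            # instead of being broadcast against it inside the loss
            y = tf.reshape(y, (-1, 1))
            with tf.GradientTape() as tape:
                y_hat = model(x)
                loss = tf.reduce_mean(tf.keras.losses.mse(y, y_hat))
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            print("epoch: %d / step: %d --------- loss: %f" % (epoch, step, loss))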
if __name__ == "__main__":
    data_path = os.path.join(os.getcwd(), 'data', 'auto-mpg.data')
    column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                    'Acceleration', 'Model Year', 'Origin']
    raw_dataset = pd.read_csv(data_path, names=column_names, na_values="?",
                              comment="\t", sep=" ", skipinitialspace=True)
    dataset = raw_dataset.copy()
    # Count missing values
    # print(dataset.isna().sum())
    # Drop rows with missing values
    dataset = dataset.dropna()
    # One-hot encode Origin (1 = USA, 2 = Europe, 3 = Japan)
    origin = dataset.pop("Origin")
    dataset["USA"] = (origin == 1) * 1.0
    dataset["Europe"] = (origin == 2) * 1.0
    dataset["Japan"] = (origin == 3) * 1.0
    # Split into training and test sets
    train_data = dataset.sample(frac=0.8, random_state=0)
    test_data = dataset.drop(train_data.index)
    # Remove the MPG column and use it as the ground-truth label Y
    # pop can only remove columns; drop can remove columns as well as rows
    train_label = train_data.pop("MPG")
    test_label = test_data.pop("MPG")
    # describe() reports descriptive statistics such as count, mean and std
    train_stats = train_data.describe()
    train_stats = train_stats.transpose()
    # Standardize both sets with the training-set statistics
    normed_train_data = norm(train_data)
    normed_test_data = norm(test_data)
    train_db = tf.data.Dataset.from_tensor_slices((normed_train_data.values, train_label.values))
    train_db = train_db.shuffle(100).batch(16)
    # Build the model and train
    model = Net()
    # 9 input features; None leaves the batch dimension open, since the last
    # batch can be smaller than the batch size of 16 set in shuffle().batch()
    model.build(input_shape=(None, 9))
    # model.summary()
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
    train_epochs = 200
    train(train_db, train_epochs)
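
One thing worth checking in a hand-written loop like `train` is that the labels and the predictions have the same shape. Here the labels come out of `train_db` with shape `(batch,)`, while the last `Dense(1)` layer returns `(batch, 1)`. Depending on the TensorFlow/Keras version, the loss function may or may not reconcile the two; if it does not, the difference is broadcast to `(batch, batch)` and the loss compares every prediction with every label. The following is a minimal sketch of that effect in plain NumPy (the toy arrays are made up for illustration, not taken from the MPG data):

```python
import numpy as np

# Toy labels with shape (3,) and "perfect" predictions with shape (3, 1)
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([[1.0], [2.0], [3.0]])

# Mismatched shapes: (3, 1) - (3,) is broadcast to a (3, 3) matrix of
# all prediction/label pairs, so the mean is not the per-sample MSE
broadcast_sq = (y_hat - y) ** 2

# Aligned shapes: reshape the labels to (3, 1) before subtracting
aligned_sq = (y_hat - y.reshape(-1, 1)) ** 2

print(broadcast_sq.shape, broadcast_sq.mean())
print(aligned_sq.shape, aligned_sq.mean())
```

Even though every prediction equals its label, only the aligned version gives a mean squared error of zero. Reshaping the labels (or squeezing the predictions) before computing the loss avoids relying on version-specific behavior.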
Comments and corrections are welcome. Thank you!