Building a neural network model with the Keras Model class
Dataset download:
Link: https://pan.baidu.com/s/19IQ6BHmT_zNyJoaB_pPa3g
Extraction code: 1nfc
import tensorflow as tf
import pandas as pd
import numpy as np
Read the data with pandas
data = pd.read_csv("creditcard.csv")
data.head()
| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
5 rows × 31 columns
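Dataset description
The dataset contains 284,807 credit card transactions made by European cardholders over two days in September 2013. Of these, 492 are fraudulent, which is 0.172% of the total. The original features were transformed with PCA into the numeric attributes V1, V2, …, V28; only the transaction time and the amount were left untransformed. The output variable is binary: 1 for fraud, 0 for a normal transaction.
Time (transaction time, in seconds; needs converting to hh-mm-ss form)
V1~V28 (numeric variables produced by the PCA transformation)
Amount (transaction amount)
Class (transaction type: 1 for fraud, 0 for a normal transaction)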
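The Time column stores seconds, and the note above asks for an hh-mm-ss form. A minimal sketch of that conversion follows. The helper name `seconds_to_hms` is hypothetical and not part of the original notebook, which simply drops the Time column in the next step. Because the data covers two days, the hour field can exceed 23.

```python
def seconds_to_hms(seconds):
    """Format a non-negative number of seconds as hh-mm-ss (hypothetical helper)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)   # whole hours, leftover seconds
    minutes, secs = divmod(remainder, 60)    # whole minutes, leftover seconds
    return f"{hours:02d}-{minutes:02d}-{secs:02d}"


print(seconds_to_hms(0))       # 00-00-00
print(seconds_to_hms(3661))    # 01-01-01
print(seconds_to_hms(172792))  # 47-59-52
```

With pandas, the same function could be applied to the column with `data["Time"].map(seconds_to_hms)`.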
df = data.drop("Time",axis=1)
df.head()
| | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | 0.090794 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
1 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | -0.166974 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
2 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | 0.207643 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
3 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | -0.054952 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
4 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | 0.753074 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
5 rows × 30 columns
Task description:
Given the values of V1…V28 and Amount, predict the transaction type Class (1 or 0).
Convert the pandas DataFrame to a NumPy array
all_data = np.array(df)
Shuffle the data
np.random.shuffle(all_data)
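`np.random.shuffle` reorders the rows of a 2-D array in place, so each row keeps its features and label together. It gives a different order on every run unless a seed is set. The sketch below shows a reproducible variant using NumPy's `Generator` API on a small toy array. The seed value 42 and the array `toy` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Toy stand-in for all_data: 6 rows, 3 columns.
toy = np.arange(18, dtype=float).reshape(6, 3)

rng = np.random.default_rng(seed=42)  # the seed value is an arbitrary choice
shuffled = toy.copy()
rng.shuffle(shuffled)                 # in place, along axis 0: rows move as whole units

# Every original row is still present, only the order has changed.
same_rows = sorted(map(tuple, shuffled)) == sorted(map(tuple, toy))
print(same_rows)  # True
```

Fixing the seed makes the train/test split below repeatable between runs.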
Get the training data
train_data = all_data[:200000,:-1]
train_label = all_data[:200000,-1]
train_data.shape
(200000, 29)
Get the test data
test_data = all_data[200000:,:-1]
test_label = all_data[200000:,-1]
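The slicing above takes the first 200,000 shuffled rows for training and the remaining 84,807 for testing, with the last column as the label. Since only 0.172% of the rows are fraudulent, it is worth checking that both splits contain positive examples. The following is a minimal sketch on a toy array; the helper `split_features_labels` is hypothetical.

```python
import numpy as np


def split_features_labels(arr, n_train):
    """Split a 2-D array into train/test features and labels (label = last column)."""
    train, test = arr[:n_train], arr[n_train:]
    return train[:, :-1], train[:, -1], test[:, :-1], test[:, -1]


# Toy data: 10 rows, 29 feature columns plus 1 label column.
toy = np.zeros((10, 30))
toy[[2, 7], -1] = 1.0  # mark two rows as "fraud"

train_x, train_y, test_x, test_y = split_features_labels(toy, n_train=7)
print(train_x.shape, test_x.shape)            # (7, 29) (3, 29)
print(int(train_y.sum()), int(test_y.sum()))  # 1 1
```

On the real data, `train_label.sum()` and `test_label.sum()` give the number of fraud cases in each split.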
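Build the model
class cred_model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Hidden layer: 100 ReLU units over the 29 input features (V1..V28, Amount)
        self.dense1 = tf.keras.layers.Dense(
            input_shape=(29,),
            units=100,
            activation=tf.nn.relu,
        )
        # Output layer: one sigmoid unit giving the predicted probability of fraud
        self.dense2 = tf.keras.layers.Dense(
            units=1,
            activation=tf.nn.sigmoid,
        )

    def call(self, inputs):
        # Forward pass: inputs -> hidden layer -> output probability
        x = self.dense1(inputs)
        output = self.dense2(x)
        return output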
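To make the forward pass concrete, here is a rough NumPy sketch of what the two Dense layers compute: an affine map followed by ReLU, then an affine map followed by a sigmoid. The weights are random placeholders, not the trained values, and the function `forward` is a hypothetical illustration rather than the Keras implementation.

```python
import numpy as np


def forward(x, w1, b1, w2, b2):
    """Two-layer forward pass: ReLU hidden layer, sigmoid output."""
    hidden = np.maximum(0.0, x @ w1 + b1)   # Dense(100, relu)
    logits = hidden @ w2 + b2               # Dense(1) before the activation
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid maps to (0, 1)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 29))                # a batch of 4 samples, 29 features each
w1 = rng.normal(scale=0.1, size=(29, 100))
b1 = np.zeros(100)
w2 = rng.normal(scale=0.1, size=(100, 1))
b2 = np.zeros(1)

probs = forward(x, w1, b1, w2, b2)
print(probs.shape)  # (4, 1)
```

Each output is a value between 0 and 1 that can be read as the predicted probability of Class = 1.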
model = cred_model()
model.summary()
Model: "cred_model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) multiple 3000
_________________________________________________________________
dense_3 (Dense) multiple 101
=================================================================
Total params: 3,101
Trainable params: 3,101
Non-trainable params: 0
_________________________________________________________________
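Depending on the TensorFlow version, calling `summary()` on a subclassed model that has not yet seen any input may raise an error saying the model has not been built. One way around this, sketched below under that assumption, is to pass a dummy batch through the model first so that the weights are created. The class `CredModelSketch` is a stand-in with the same layers as `cred_model`, defined here only to keep the example self-contained.

```python
import tensorflow as tf


class CredModelSketch(tf.keras.Model):
    """Stand-in with the same two Dense layers as cred_model."""

    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(100, activation="relu")
        self.dense2 = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self.dense2(self.dense1(inputs))


sketch = CredModelSketch()
_ = sketch(tf.zeros((1, 29)))      # one forward pass on a dummy batch creates the weights
sketch.summary()
n_params = sketch.count_params()   # total number of weights and biases
```

After the dummy call, the parameter count matches the total shown in the summary above.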
Configure the model
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss="binary_crossentropy",
              metrics=["acc"])
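The `binary_crossentropy` loss is the mean of -(y·log(p) + (1 - y)·log(1 - p)) over the samples, where y is the true label and p the predicted probability. The NumPy sketch below illustrates the formula; it is not the Keras implementation, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(per_sample.mean())


# Both predictions are fairly confident and correct, so the loss is small.
loss = binary_crossentropy([1, 0], [0.9, 0.1])
print(round(loss, 4))  # 0.1054
```

A confident wrong prediction is penalised much more heavily: `binary_crossentropy([1], [0.01])` is about 4.6.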
Train the model
model.fit(train_data, train_label, epochs=10, batch_size=100,
          validation_data=(test_data, test_label))
Train on 200000 samples, validate on 84807 samples
Epoch 1/10
200000/200000 [==============================] - 4s 18us/sample - loss: 0.0240 - acc: 0.9988 - val_loss: 0.0056 - val_acc: 0.9994
Epoch 2/10
200000/200000 [==============================] - 3s 16us/sample - loss: 0.0112 - acc: 0.9992 - val_loss: 0.0075 - val_acc: 0.9994
Epoch 3/10
200000/200000 [==============================] - 4s 18us/sample - loss: 0.0091 - acc: 0.9992 - val_loss: 0.0503 - val_acc: 0.9987
Epoch 4/10
200000/200000 [==============================] - 3s 17us/sample - loss: 0.0152 - acc: 0.9993 - val_loss: 0.0097 - val_acc: 0.9993
Epoch 5/10
200000/200000 [==============================] - 3s 16us/sample - loss: 0.0108 - acc: 0.9994 - val_loss: 0.0052 - val_acc: 0.9994
Epoch 6/10
200000/200000 [==============================] - 3s 17us/sample - loss: 0.0064 - acc: 0.9994 - val_loss: 0.0076 - val_acc: 0.9992
Epoch 7/10
200000/200000 [==============================] - 3s 16us/sample - loss: 0.0101 - acc: 0.9994 - val_loss: 0.0077 - val_acc: 0.9994
Epoch 8/10
200000/200000 [==============================] - 3s 16us/sample - loss: 0.0057 - acc: 0.9994 - val_loss: 0.0070 - val_acc: 0.9994
Epoch 9/10
200000/200000 [==============================] - 3s 17us/sample - loss: 0.0075 - acc: 0.9994 - val_loss: 0.0044 - val_acc: 0.9993
Epoch 10/10
200000/200000 [==============================] - 3s 16us/sample - loss: 0.0070 - acc: 0.9994 - val_loss: 0.0053 - val_acc: 0.9994
<tensorflow.python.keras.callbacks.History at 0x16f5b4d4d30>
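The accuracy figures above should be read with the class imbalance in mind. With 492 fraud cases in 284,807 transactions, a model that always predicts 0 already scores about 99.83%. The sketch below computes that baseline, then shows precision and recall on a small made-up example. The helper `precision_recall` and the toy labels are illustrative assumptions, not results from the model trained here.

```python
total = 284807
fraud = 492

# Accuracy of a trivial model that labels every transaction as normal.
baseline_acc = (total - fraud) / total
print(round(baseline_acc, 4))  # 0.9983


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels: 3 fraud cases, of which the predictions catch only 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, round(recall, 3))  # 0.5 0.333
```

For the trained model, thresholding `model.predict(test_data)` at 0.5 and passing the result to such a function would show how many fraud cases are actually caught.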
model.summary()
Model: "cred_model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) multiple 3000
_________________________________________________________________
dense_3 (Dense) multiple 101
=================================================================
Total params: 3,101
Trainable params: 3,101
Non-trainable params: 0
_________________________________________________________________
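The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical function that does this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer."""
    return n_inputs * n_units + n_units  # weights + biases


hidden = dense_params(29, 100)   # 29 features into 100 hidden units
output = dense_params(100, 1)    # 100 hidden units into 1 output unit
print(hidden, output, hidden + output)  # 3000 101 3101
```

These values match the 3000, 101 and 3,101 reported by `model.summary()`.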