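TF2.0 Hands-On Project 1: Car Quality Evaluation

1. Get the dataset

car-evaluation download link: https://archive.ics.uci.edu/ml/datasets/Car+Evaluation

The download page describes the data in detail; process the data according to that description.

2. Process the data

The data is organized into 7 columns, one of which is the classification result, Result.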

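import pandas as pd

column_names = ['Buying', 'Maint', 'Doors', 'Persons', 'Lug_boot', 'Safety', 'Result']
## skiprows=1 drops the first line of the file; remove it if your copy of car.data has no header row.
raw_datasets = pd.read_csv('./data/PRACTICE/car.data', names=column_names,
                           na_values='?', sep=',', skipinitialspace=True, skiprows=1)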

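Note that five of these columns (Buying, Maint, Lug_boot, Safety, and Result) hold string categories rather than numbers, so they must be converted to numeric form before they can be fed to the model for training. Doors and Persons are mostly numeric, but according to the dataset description they also contain the open-ended values 5-more and more, which need to be mapped to numbers as well.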


## Check for missing values (NaN)
# print(raw_datasets.isna().sum())
## Simply drop any rows that contain NaN
# raw_datasets = raw_datasets.dropna()

## get_dummies(): convert a categorical variable into indicator (one-hot) variables
price = pd.get_dummies(raw_datasets.Buying, prefix='Buying')
## Convert the other string columns in the same way.
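
As a minimal sketch of the remaining conversions, the snippet below one-hot encodes all five string columns in a single get_dummies call and maps the open-ended Doors and Persons values to integers. The inline sample rows, the value spellings (5more, more), and the choice of mapping them to 5 and 6 are assumptions for illustration; the post does not show how these two columns were converted.

```python
import pandas as pd

# A few hand-written rows in the car.data layout (illustrative sample, not the real file).
sample = pd.DataFrame(
    [
        ["vhigh", "vhigh", "2", "2", "small", "low", "unacc"],
        ["low", "med", "5more", "more", "big", "high", "vgood"],
        ["med", "low", "4", "4", "med", "med", "acc"],
        ["high", "high", "3", "4", "med", "high", "good"],
    ],
    columns=["Buying", "Maint", "Doors", "Persons", "Lug_boot", "Safety", "Result"],
)

# Assumed mapping for the open-ended categories: "5more" -> 5 doors, "more" -> 6 persons.
sample["Doors"] = sample["Doors"].replace({"5more": "5", "5-more": "5"}).astype(int)
sample["Persons"] = sample["Persons"].replace({"more": "6"}).astype(int)

# One-hot encode every remaining string column; each value becomes a "<column>_<value>" column.
categorical = ["Buying", "Maint", "Lug_boot", "Safety", "Result"]
encoded = pd.get_dummies(sample, columns=categorical, dtype=float)

print(encoded.columns.tolist())
```

The resulting column names (Buying_high, Result_acc, and so on) follow the same naming pattern as the ones used in the later steps of this post.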

3. Build a conversion function that turns DataFrame data into tf.data

def dataframe_to_tfdata(dataframe, shuffle=True, batch_size=32):
    ## Work on a copy so the caller's DataFrame is left unchanged.
    c_dataframe = dataframe.copy()
    ## Pop the four one-hot label columns; what remains are the features.
    labels_tmp0 = c_dataframe.pop('Result_acc')
    labels_tmp1 = c_dataframe.pop('Result_good')
    labels_tmp2 = c_dataframe.pop('Result_unacc')
    labels_tmp3 = c_dataframe.pop('Result_vgood')
    labels_tensor = pd.concat([labels_tmp0, labels_tmp1, labels_tmp2, labels_tmp3], axis=1)
    ## Features are passed as a dict of columns, labels as an (N, 4) array.
    tfdata = tf.data.Dataset.from_tensor_slices((dict(c_dataframe), labels_tensor.values))
    if shuffle:
        ## A buffer as large as the dataset gives a full shuffle.
        tfdata = tfdata.shuffle(buffer_size=len(c_dataframe))
    tfdata = tfdata.batch(batch_size)
    return tfdata

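4. Split the data and add feature columns

## The more completely the feature columns cover the inputs, the more informative the feature layer is, which generally improves model accuracy.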
feature_column_data=[]

for col in ['Buying_high',  'Buying_low',  'Buying_med',  'Buying_vhigh',
            'Maint_high',  'Maint_low',  'Maint_med',  'Maint_vhigh',
            'Doors',  'Persons',  'Lug_boot_big',
            'Lug_boot_med',  'Lug_boot_small',  'Safety_high',  'Safety_low',  'Safety_med']:
    feature_column_data.append(tf.feature_column.numeric_column(col))
    
batch = 16
train_ds = dataframe_to_tfdata(train_data, batch_size=batch)
#train_ds = train_ds.shuffle(10000).repeat().batch(batch)
val_ds = dataframe_to_tfdata(val_data, shuffle=False, batch_size=batch)
test_ds = dataframe_to_tfdata(test_data, shuffle=False, batch_size=batch)
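
The calls above use train_data, val_data, and test_data, but the split itself is not shown in this post. One possible way to produce them with pandas is sketched below; the helper name split_dataframe, the 64/16/20 proportions, and the fixed seed are assumptions, not the author's settings.

```python
import pandas as pd

def split_dataframe(df, val_frac=0.16, test_frac=0.20, seed=42):
    """Shuffle df once, then cut it into train/validation/test parts (hypothetical helper)."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test_data = shuffled.iloc[:n_test]
    val_data = shuffled.iloc[n_test:n_test + n_val]
    train_data = shuffled.iloc[n_test + n_val:]
    return train_data, val_data, test_data

# Stand-in frame with as many rows as the Car Evaluation dataset (1728).
demo = pd.DataFrame({"row_id": range(1728)})
train_data, val_data, test_data = split_dataframe(demo)
print(len(train_data), len(val_data), len(test_data))
```

Shuffling before slicing matters because the rows in the original file are ordered by attribute value, so an unshuffled split would give the three parts different distributions.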

5. Build the model and start training

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.DenseFeatures(feature_column_data),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(4, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds,
          validation_data=val_ds,
          # steps_per_epoch=23,
          # validation_steps=5,
          epochs=100)
## If you are unsure about steps_per_epoch and validation_steps, simply leave them out.
## Each epoch then runs over the whole dataset, and the trained model's accuracy is not affected.
loss, accuracy = model.evaluate(test_ds)
print("accuracy=", accuracy)
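
Because dataframe_to_tfdata concatenates the label columns in the order Result_acc, Result_good, Result_unacc, Result_vgood, index i of the softmax output corresponds to the i-th name in that list. A small sketch of turning predicted probabilities back into class names follows; the helper name is hypothetical and the probability rows are made up, standing in for the output of model.predict.

```python
CLASS_NAMES = ["acc", "good", "unacc", "vgood"]  # same order as the label columns

def decode_predictions(prob_rows):
    """Map each row of class probabilities to the name of its most likely class."""
    decoded = []
    for row in prob_rows:
        best_index = max(range(len(row)), key=lambda i: row[i])
        decoded.append(CLASS_NAMES[best_index])
    return decoded

# Made-up probabilities standing in for model.predict(test_ds).
fake_probs = [
    [0.05, 0.02, 0.90, 0.03],
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.15, 0.05, 0.70],
]
print(decode_predictions(fake_probs))
```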

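BaseCollectiveExecutor::StartAbort Out of range: End of sequence is a very common message. It generally means the dataset ran out of batches, which happens when batch, epochs, steps_per_epoch, and validation_steps are not consistent with one another.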
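
A quick way to sanity-check these parameters is to count the batches a dataset can supply. The sketch below is illustrative only; the helper names and the training-set size are assumptions. It relies on two facts: one pass over n rows yields ceil(n / batch_size) batches, and when steps_per_epoch is set on a dataset that does not call .repeat(), Keras expects the dataset to supply steps_per_epoch * epochs batches in total.

```python
import math

def batches_per_pass(n_rows, batch_size):
    """Number of batches a non-repeating dataset yields in one full pass."""
    return math.ceil(n_rows / batch_size)

def runs_out_of_data(n_rows, batch_size, steps_per_epoch, epochs):
    """True if a non-repeating dataset cannot supply steps_per_epoch * epochs batches."""
    return steps_per_epoch * epochs > batches_per_pass(n_rows, batch_size)

n_train = 1107   # assumed training-set size, for illustration only
batch = 16       # batch size used in this post

print(batches_per_pass(n_train, batch))
print(runs_out_of_data(n_train, batch, steps_per_epoch=23, epochs=100))
print(runs_out_of_data(n_train, batch, steps_per_epoch=70, epochs=1))
```

Leaving steps_per_epoch unset, as the training call in this post does, sidesteps the question: each epoch is then one full pass over the data.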

The best training accuracy reached 0.9928, with a loss of 0.0163.
The accuracy on the test set was 0.9798.

"Model training tips"

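## Adding the two statements below stops training early once val_loss has not improved for 10 consecutive epochs (patience=10).
## normed_train_data, train_labels, EPOCHS, and PrintDot are assumed to be defined elsewhere; they are not part of the code above.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])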
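
The patience rule can be illustrated without Keras. The function below is a simplified sketch of the idea (it ignores options such as min_delta and is not Keras's actual implementation): training stops once the monitored value has failed to improve for patience consecutive epochs. The function name and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best value: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.48, 0.44]
print(early_stop_epoch(losses, patience=3))
print(early_stop_epoch(losses, patience=10))
```

With patience=3 the run ends before the late improvement to 0.44 is ever seen, which shows the trade-off: a small patience saves time but can stop on a temporary plateau.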

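## Lessons learned
1. When numeric input features have different value ranges, scale each feature independently to the same range.
2. If there is not much training data, prefer a small network with few hidden layers to avoid overfitting.
3. Early stopping is an effective technique for preventing overfitting.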
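
The first point can be made concrete with min-max scaling, one common way (standardization is another) to bring features into the same range. This is a minimal sketch; the helper name and the Price_usd column are hypothetical and not part of the dataset.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly into [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Two features with very different ranges (values for illustration).
features = {
    "Doors": [2, 3, 4, 5],
    "Price_usd": [8000, 12000, 20000, 40000],  # hypothetical column, not in the dataset
}

# Each feature is scaled on its own, using only its own minimum and maximum.
scaled = {name: min_max_scale(column) for name, column in features.items()}
print(scaled["Doors"])
print(scaled["Price_usd"])
```

In practice the minimum and maximum should be computed on the training split only and then reused for the validation and test splits.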
UCI car evaluation database

1. Title: Car Evaluation Database

2. Sources:
   (a) Creator: Marko Bohanec
   (b) Donors: Marko Bohanec (marko.bohanec@ijs.si), Blaz Zupan (blaz.zupan@ijs.si)
   (c) Date: June, 1997

3. Past Usage:
   The hierarchical decision model, from which this dataset is derived, was first presented in M. Bohanec and V. Rajkovic: Knowledge acquisition and explanation for multi-attribute decision making. In 8th Intl Workshop on Expert Systems and their Applications, Avignon, France. pages 59-78, 1988.
   Within machine-learning, this dataset was used for the evaluation of HINT (Hierarchy INduction Tool), which was proved to be able to completely reconstruct the original hierarchical model. This, together with a comparison with C4.5, is presented in B. Zupan, M. Bohanec, I. Bratko, J. Demsar: Machine learning by function decomposition. ICML-97, Nashville, TN. 1997 (to appear)

4. Relevant Information Paragraph:
   Car Evaluation Database was derived from a simple hierarchical decision model originally developed for the demonstration of DEX (M. Bohanec, V. Rajkovic: Expert system for decision making. Sistemica 1(1), pp. 145-157, 1990.). The model evaluates cars according to the following concept structure:

   CAR                 car acceptability
   . PRICE             overall price
   . . buying          buying price
   . . maint           price of the maintenance
   . TECH              technical characteristics
   . . COMFORT         comfort
   . . . doors         number of doors
   . . . persons       capacity in terms of persons to carry
   . . . lug_boot      the size of luggage boot
   . . safety          estimated safety of the car

   Input attributes are printed in lowercase. Besides the target concept (CAR), the model includes three intermediate concepts: PRICE, TECH, COMFORT. Every concept is in the original model related to its lower level descendants by a set of examples (for these examples sets see http://www-ai.ijs.si/BlazZupan/car.html).
   The Car Evaluation Database contains examples with the structural information removed, i.e., directly relates CAR to the six input attributes: buying, maint, doors, persons, lug_boot, safety. Because of known underlying concept structure, this database may be particularly useful for testing constructive induction and structure discovery methods.

5. Number of Instances: 1728 (instances completely cover the attribute space)

6. Number of Attributes: 6

7. Attribute Values:
   buying      v-high, high, med, low
   maint       v-high, high, med, low
   doors       2, 3, 4, 5-more
   persons     2, 4, more
   lug_boot    small, med, big
   safety      low, med, high

8. Missing Attribute Values: none

9. Class Distribution (number of instances per class)
   class      N       N[%]
   -----------------------------
   unacc      1210    (70.023 %)
   acc        384     (22.222 %)
   good       69      ( 3.993 %)
   v-good     65      ( 3.762 %)
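
The statement that the 1728 instances completely cover the attribute space can be checked by counting the combinations of the attribute values listed in the description. A short sketch, using the value spellings from the description (the data file itself may spell some of them differently):

```python
from itertools import product

ATTRIBUTE_VALUES = {
    "buying": ["v-high", "high", "med", "low"],
    "maint": ["v-high", "high", "med", "low"],
    "doors": ["2", "3", "4", "5-more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Every possible car is one combination of attribute values: 4 * 4 * 4 * 3 * 3 * 3.
combinations = list(product(*ATTRIBUTE_VALUES.values()))
print(len(combinations))

# Class counts from the description; the percentages are recomputed from them.
class_counts = {"unacc": 1210, "acc": 384, "good": 69, "v-good": 65}
total = sum(class_counts.values())
for name, count in class_counts.items():
    print(f"{name}: {100 * count / total:.3f} %")
```

Since about 70% of the rows are unacc, a model that always predicts unacc would already reach roughly 0.70 accuracy, which is a useful baseline when reading the 0.9798 test accuracy reported in this post.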