Hands-On Machine Learning with Scikit-Learn & TensorFlow Exercise Q&A Chapter03

Leonardo Liu

Article link: https://blog.csdn.net/leowinbow/article/details/88624003

Q1. Try to build a classifier for the MNIST dataset that achieves over 97% accuracy on the test set. Hint: the KNeighborsClassifier works quite well for this task; you just need to find good hyperparameter values (try a grid search on the weights and n_neighbors hyperparameters).

A1:

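First, run a grid search over the weights and n_neighbors hyperparameters with GridSearchCV (X_train and y_train are assumed to already hold the MNIST training images and labels):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Candidate values for the two hyperparameters named in the hint
param_grid = [{'weights': ["uniform", "distance"], 'n_neighbors': [3, 4, 5]}]

knn_clf = KNeighborsClassifier()
# 5-fold cross-validation, using all available CPU cores
grid_search = GridSearchCV(knn_clf, param_grid, cv=5, verbose=3, n_jobs=-1)
grid_search.fit(X_train, y_train)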

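Once the search finishes, we can read the best parameters and the best cross-validation score:

grid_search.best_params_
grid_search.best_score_

Then we measure the accuracy of the best model on the test set:

from sklearn.metrics import accuracy_score

y_pred = grid_search.predict(X_test)
accuracy_score(y_test, y_pred)

accuracy_score returns the test-set accuracy, which the exercise asks to be above 97%.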
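
The full MNIST grid search takes a long time, so for a quick end-to-end check of the same workflow, here is a minimal sketch on a much smaller stand-in dataset. It assumes scikit-learn's bundled load_digits data (8x8 digit images, not MNIST), and the names search, X_train_small and so on are only for illustration. The numbers it prints are not the MNIST results.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for MNIST (assumption): the small 8x8 digits set shipped with scikit-learn
X, y = load_digits(return_X_y=True)
X_train_small, X_test_small, y_train_small, y_test_small = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Same grid as in the answer above
param_grid = [{"weights": ["uniform", "distance"], "n_neighbors": [3, 4, 5]}]
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train_small, y_train_small)

print(search.best_params_)   # best combination found by cross-validation
print(search.best_score_)    # mean cross-validation accuracy of that combination

# GridSearchCV refits the best model on the whole training set by default,
# so it can predict directly
test_accuracy = accuracy_score(y_test_small, search.predict(X_test_small))
print(test_accuracy)
```

Because this runs in seconds, it is a convenient way to test the code before starting the search on the 60,000 MNIST training images.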

Q2. Write a function that can shift an MNIST image in any direction (left, right, up, or down) by one pixel. Then, for each image in the training set, create four shifted copies (one per direction) and add them to the training set. Finally, train your best model on this expanded training set and measure its accuracy on the test set. You should observe that your model performs even better now! This technique of artificially growing the training set is called data augmentation or training set expansion.

A2:

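First, we need a function that shifts the pixels of an image. We can use the shift function from the scipy.ndimage module (the older scipy.ndimage.interpolation import path is deprecated in recent SciPy releases).

from scipy.ndimage import shift

def shift_image(image, dx, dy):
    # Restore the 28x28 layout: axis 0 is rows (y), axis 1 is columns (x)
    image = image.reshape((28, 28))
    # Pixels shifted in from outside the frame are filled with 0
    shifted_image = shift(image, [dy, dx], cval=0, mode="constant")
    # Flatten back to a 784-element vector
    return shifted_image.reshape([-1])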
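
Since the exercise only needs whole-pixel shifts, the same effect can also be sketched with plain NumPy slicing. This is an alternative to the SciPy call, not the method used in this answer. The name shift_image_numpy and the size parameter are hypothetical, and the sketch assumes |dx| and |dy| are smaller than the image size.

```python
import numpy as np

def shift_image_numpy(image, dx, dy, size=28):
    """Shift a flattened size x size image by whole pixels, filling vacated cells with 0.

    Positive dx moves the content right, positive dy moves it down.
    Assumes abs(dx) < size and abs(dy) < size.
    """
    img = image.reshape(size, size)
    out = np.zeros_like(img)
    # Rows/columns of the source that remain inside the frame after the shift
    src_rows = slice(max(0, -dy), min(size, size - dy))
    src_cols = slice(max(0, -dx), min(size, size - dx))
    # Where those rows/columns land in the output
    dst_rows = slice(max(0, dy), min(size, size + dy))
    dst_cols = slice(max(0, dx), min(size, size + dx))
    out[dst_rows, dst_cols] = img[src_rows, src_cols]
    return out.reshape(-1)

# Single bright pixel at row 10, column 10
demo = np.zeros(28 * 28)
demo[10 * 28 + 10] = 1.0
demo_down = shift_image_numpy(demo, 0, 1).reshape(28, 28)
demo_left = shift_image_numpy(demo, -1, 0).reshape(28, 28)
print(np.argwhere(demo_down == 1.0), np.argwhere(demo_left == 1.0))
```

The slicing version copies values without interpolation, so it keeps the original dtype and pixel values exactly.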

Next, we take one training image (index 1000 here) as a demo, shift it down and left by 5 pixels, and plot the results:

import matplotlib.pyplot as plt

image = X_train[1000]
shifted_image_down = shift_image(image, 0, 5)
shifted_image_left = shift_image(image, -5, 0)

# Show the original and the two shifted copies side by side
plt.figure(figsize=(12,3))
plt.subplot(131)
plt.title("Original", fontsize=14)
plt.imshow(image.reshape(28, 28), interpolation="nearest", cmap="Greys")
plt.subplot(132)
plt.title("Shifted down", fontsize=14)
plt.imshow(shifted_image_down.reshape(28, 28), interpolation="nearest", cmap="Greys")
plt.subplot(133)
plt.title("Shifted left", fontsize=14)
plt.imshow(shifted_image_left.reshape(28, 28), interpolation="nearest", cmap="Greys")
plt.show()

Once the plots confirm that shift_image works as intended, we can apply it to every training image, one pixel in each of the four directions:

import numpy as np

# Start from copies of the original images and labels
X_train_augmented = [image for image in X_train]
y_train_augmented = [label for label in y_train]

# Right, left, down, up: one shifted copy of every image per direction
for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    for image, label in zip(X_train, y_train):
        X_train_augmented.append(shift_image(image, dx, dy))
        y_train_augmented.append(label)

X_train_augmented = np.array(X_train_augmented)
y_train_augmented = np.array(y_train_augmented)

This gives us the augmented training set, X_train_augmented and y_train_augmented, which contains the original images plus four shifted copies of each (five times the original size).
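
As a sanity check on the loop above, the sketch below wraps it in a hypothetical helper, augment_with_shifts, and runs it on a few random "images" instead of MNIST. The shuffle at the end is an addition that is not part of the answer above: the loop leaves the copies grouped by direction, and some learners are sensitive to instance order.

```python
import numpy as np
from scipy.ndimage import shift

def shift_image(image, dx, dy):
    # Same function as in the answer: shift a flattened 28x28 image, fill with 0
    image = image.reshape((28, 28))
    return shift(image, [dy, dx], cval=0, mode="constant").reshape([-1])

def augment_with_shifts(X, y, seed=42):
    """Hypothetical helper: originals plus four one-pixel-shifted copies, shuffled."""
    X_aug = [image for image in X]
    y_aug = [label for label in y]
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        for image, label in zip(X, y):
            X_aug.append(shift_image(image, dx, dy))
            y_aug.append(label)
    X_aug = np.array(X_aug)
    y_aug = np.array(y_aug)
    # Assumption: shuffle so the copies are not grouped by shift direction
    order = np.random.default_rng(seed).permutation(len(X_aug))
    return X_aug[order], y_aug[order]

# Tiny synthetic stand-in for the training set: 6 random 784-pixel vectors
rng = np.random.default_rng(0)
X_demo = rng.random((6, 784))
y_demo = np.arange(6)

X_demo_aug, y_demo_aug = augment_with_shifts(X_demo, y_demo)
print(X_demo_aug.shape, y_demo_aug.shape)
```

With the real data, the last step of the exercise would then be to fit a new classifier, for example KNeighborsClassifier(**grid_search.best_params_), on the augmented set and score it on X_test as in A1.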