[Machine Learning Crash Course] Study Summary (Part 2)

Latest recommended article published 2022-04-07 09:21:13

leSerein_

Tags: machine learning, neural networks, artificial intelligence

Article link: https://blog.csdn.net/a_piece_of_ppx/article/details/112003544

Machine Learning Crash Course with TensorFlow APIs
https://developers.google.cn/machine-learning/crash-course

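XIII. Regularization for Sparsity

This unit introduces L1 regularization.

Sparse vectors often contain many dimensions, and creating feature crosses adds even more. With such high-dimensional feature vectors, the model can become very large and require a lot of RAM.

In a high-dimensional sparse vector, it is best to drive weights to exactly 0 wherever possible. A weight of exactly 0 essentially removes the corresponding feature from the model. Zeroing out features saves RAM and can reduce noise in the model.

L1 regularization drives many uninformative coefficients to exactly 0, which saves RAM at inference time. L2 regularization makes weights smaller but cannot drive them to exactly 0.0.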
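The contrast between the two penalties can be sketched numerically. The snippet below is a minimal illustration, not the course's code: it applies only the penalty part of each update (no data loss) to a made-up weight vector, and it uses a proximal soft-thresholding step for L1, which is one common way to get exact zeros. The names `l1_step`, `l2_step`, `lam`, and `lr` are hypothetical.

```python
import numpy as np

def l1_step(w, lam, lr):
    """One soft-thresholding (proximal) update for the penalty lam * |w|.

    Each weight moves toward 0 by lr * lam and is clamped at exactly 0.
    """
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

def l2_step(w, lam, lr):
    """One gradient update for the penalty (lam / 2) * w**2.

    Each weight is scaled by a constant factor, so it shrinks but never hits 0.
    """
    return w * (1.0 - lr * lam)

# Made-up weights: two informative ones and three near-zero ones.
weights = np.array([0.8, -0.05, 0.02, -0.6, 0.01])
w_l1 = weights.copy()
w_l2 = weights.copy()
for _ in range(20):
    w_l1 = l1_step(w_l1, lam=0.1, lr=0.1)
    w_l2 = l2_step(w_l2, lam=0.1, lr=0.1)

print("L1:", w_l1)
print("L2:", w_l2)
print("exact zeros with L1:", int(np.sum(w_l1 == 0.0)))
print("exact zeros with L2:", int(np.sum(w_l2 == 0.0)))
```

Under these assumptions the three small weights end at exactly 0 with L1, while every L2 weight stays non-zero, only smaller.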

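[Image]

The three figures above show L1 regularization shrinking the learned weights and narrowing the gap between training loss and test loss, which reduces overfitting. Note that a regularization rate that is too high prevents the model from learning anything.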

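XIV. Neural Networks

1. Basic structure

The standard components of a "neural network":

A set of nodes, analogous to neurons, organized in layers.
A set of weights representing the connections between each neural network layer and the layer beneath it. The layer beneath may be another neural network layer or some other kind of layer.
A set of biases, one for each node.
An activation function that transforms the output of each node in a layer.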
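The components listed above can be sketched as a tiny forward pass in NumPy. This is an illustrative sketch only: the layer sizes, the random weights, and the helper names `relu` and `dense` are assumptions, not part of the course material.

```python
import numpy as np

def relu(z):
    """Activation function: transforms each node's weighted sum."""
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    """One layer: weighted sum of the layer beneath, plus one bias per node."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                     # 4 examples, 3 input features

w1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # hidden layer with 5 nodes
w2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # output layer with 1 node

hidden = dense(x, w1, b1, activation=relu)
output = dense(hidden, w2, b2)                  # no activation on the output
print(hidden.shape, output.shape)
```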

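The structure is as follows:
[Image]
Results from a neural network with 3 hidden layers.

Initialization matters a great deal for non-convex optimization: the same model can produce very different results under different initial values. The test loss here is 0.17, for example, but some initializations give a result like the following:

An appropriate initialization needs to be chosen.

2. Corresponding code

The following code is from the neural network exercise.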

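import numpy as np
import pandas as pd
import tensorflow as tf

train_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv")
train_df = train_df.reindex(np.random.permutation(train_df.index)) # shuffle the examples
test_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_test.csv")

# Convert each column to its Z-score. train_df_norm and test_df_norm are used
# below but are never defined in this post; this normalization is an assumed
# reconstruction of that missing step.
train_df_norm = (train_df - train_df.mean()) / train_df.std()
test_df_norm = (test_df - test_df.mean()) / test_df.std()
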
# Create an empty list that will eventually hold all created feature columns.
feature_columns = []

# We scaled all the columns, including latitude and longitude, into their
# Z scores. So, instead of picking a resolution in degrees, we're going
# to use resolution_in_Zs.  A resolution_in_Zs of 1 corresponds to 
# a full standard deviation. 
resolution_in_Zs = 0.3  # 3/10 of a standard deviation.

# Create a bucket feature column for latitude.
latitude_as_a_numeric_column = tf.feature_column.numeric_column("latitude")
latitude_boundaries = list(np.arange(int(min(train_df_norm['latitude'])), 
                                     int(max(train_df_norm['latitude'])), 
                                     resolution_in_Zs))
latitude = tf.feature_column.bucketized_column(latitude_as_a_numeric_column, latitude_boundaries)

# Create a bucket feature column for longitude.
longitude_as_a_numeric_column = tf.feature_column.numeric_column("longitude")
longitude_boundaries = list(np.arange(int(min(train_df_norm['longitude'])), 
                                      int(max(train_df_norm['longitude'])), 
                                      resolution_in_Zs))
longitude = tf.feature_column.bucketized_column(longitude_as_a_numeric_column, 
                                                longitude_boundaries)

# Create a feature cross of latitude and longitude.
latitude_x_longitude = tf.feature_column.crossed_column([latitude, longitude], hash_bucket_size=100)
crossed_feature = tf.feature_column.indicator_column(latitude_x_longitude)
feature_columns.append(crossed_feature)  

# Represent median_income as a floating-point value.
median_income = tf.feature_column.numeric_column("median_income")
feature_columns.append(median_income)

# Represent population as a floating-point value.
population = tf.feature_column.numeric_column("population")
feature_columns.append(population)

# Convert the list of feature columns into a layer that will later be fed into
# the model. 
my_feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

# Build a linear regression model as a baseline
# The following variables are the hyperparameters.
learning_rate = 0.01
epochs = 15
batch_size = 1000
label_name = "median_house_value"

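# Establish the model's topography.
# NOTE: at this point create_model and train_model are assumed to be the
# linear-regression versions from the course notebook, which are not shown in
# this post; that train_model takes (model, dataset, epochs, batch_size, label_name).
# plot_the_loss_curve is likewise a notebook helper that is not shown here.
my_model = create_model(learning_rate, my_feature_layer)

# Train the model on the normalized training set.
epochs, mse = train_model(my_model, train_df_norm, epochs, batch_size, label_name)
plot_the_loss_curve(epochs, mse)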

test_features = {name:np.array(value) for name, value in test_df_norm.items()}
test_label = np.array(test_features.pop(label_name)) # isolate the label
print("\n Evaluate the linear regression model against the test set:")
my_model.evaluate(x = test_features, y = test_label, batch_size=batch_size)

def create_model(my_learning_rate, my_feature_layer):
  """Create and compile a simple deep neural network."""
  # Most simple tf.keras models are sequential.
  model = tf.keras.models.Sequential()

  # Add the layer containing the feature columns to the model.
  model.add(my_feature_layer)

  # Describe the topography of the model by calling the tf.keras.layers.Dense
  # method once for each layer. We've specified the following arguments:
  #   * units specifies the number of nodes in this layer.
  #   * activation specifies the activation function (Rectified Linear Unit).
  #   * name is just a string that can be useful when debugging.

  # Define the first hidden layer with 20 nodes.   
  model.add(tf.keras.layers.Dense(units=20, 
                                  activation='relu', 
                                  name='Hidden1'))
  
  # Define the second hidden layer with 12 nodes. 
  model.add(tf.keras.layers.Dense(units=12, 
                                  activation='relu', 
                                  name='Hidden2'))
  
  # Define the output layer.
  model.add(tf.keras.layers.Dense(units=1,  
                                  name='Output'))                              
  
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.MeanSquaredError()])

  return model


def train_model(model, dataset, epochs, label_name,
                batch_size=None):
  """Train the model by feeding it data."""

  # Split the dataset into features and label.
  features = {name:np.array(value) for name, value in dataset.items()}
  label = np.array(features.pop(label_name))
  history = model.fit(x=features, y=label, batch_size=batch_size,
                      epochs=epochs, shuffle=True) 

  # The list of epochs is stored separately from the rest of history.
  epochs = history.epoch
  
  # To track the progression of training, gather a snapshot
  # of the model's mean squared error at each epoch. 
  hist = pd.DataFrame(history.history)
  mse = hist["mean_squared_error"]

  return epochs, mse


# The following variables are the hyperparameters.
learning_rate = 0.01
epochs = 20
batch_size = 1000

# Specify the label
label_name = "median_house_value"

# Establish the model's topography.
my_model = create_model(learning_rate, my_feature_layer)

# Train the model on the normalized training set. We're passing the entire
# normalized training set, but the model will only use the features
# defined by the feature_layer.
epochs, mse = train_model(my_model, train_df_norm, epochs, 
                          label_name, batch_size)
plot_the_loss_curve(epochs, mse)

# After building a model against the training set, test that model
# against the test set.
test_features = {name:np.array(value) for name, value in test_df_norm.items()}
test_label = np.array(test_features.pop(label_name)) # isolate the label
print("\n Evaluate the new model against the test set:")
my_model.evaluate(x = test_features, y = test_label, batch_size=batch_size)

Comparison of the linear model and neural network results:

Linear model:
3/3 [==============================] - 0s 5ms/step - loss: 0.3951 - mean_squared_error: 0.3951
[Image]

Neural network:
3/3 [==============================] - 0s 6ms/step - loss: 0.3628 - mean_squared_error: 0.3628
[Image]
Regularization could also be added later as a further experiment.

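XV. Training Neural Networks

Several common failure cases can cause backpropagation to go wrong.

1. Vanishing gradients

The gradients for the lower layers (closer to the input) can become very small. In deep networks, computing these gradients can involve taking the product of many small terms.

When the gradients for the lower layers vanish toward 0, those layers train very slowly or stop training altogether.

The ReLU activation function can help prevent vanishing gradients.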
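The "product of many small terms" can be made concrete with a rough calculation. The sketch below is a simplification under stated assumptions: it ignores the weights entirely, assumes the same pre-activation value `z` in every layer, and compares only the activation derivatives of sigmoid and ReLU across a hypothetical depth of 10 layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

depth = 10   # hypothetical number of layers
z = 0.5      # assumed pre-activation value, identical in every layer

# Backpropagation multiplies one activation derivative per layer.
sigmoid_product = sigmoid_grad(z) ** depth
relu_product = relu_grad(z) ** depth

print(f"sigmoid: {sigmoid_product:.3e}")
print(f"relu:    {relu_product:.3e}")
```

With sigmoid the product collapses toward 0 as depth grows, while for an active ReLU unit each factor is 1, so the product does not shrink.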

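2. Exploding gradients

If the weights in a network are very large, the gradients for the lower layers involve products of many large terms. In this case the gradients explode: they become too large for training to converge.

Batch normalization can help prevent exploding gradients, as can lowering the learning rate.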

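3. Dead ReLU units

Once the weighted sum for a ReLU unit falls below 0, the unit can get stuck. It outputs 0 activation, contributing nothing to the network's output, and gradients can no longer flow through it during backpropagation. With the source of gradients cut off, the input to the ReLU may never change enough to bring the weighted sum back above 0.

Lowering the learning rate can help keep ReLU units from dying.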
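A stuck unit can be reproduced in a few lines. This is a minimal sketch with made-up numbers: a single ReLU unit whose weights have been pushed so far negative that its weighted sum is below 0 for every example, so both its output and its weight gradient are 0.

```python
import numpy as np

# Made-up inputs (all positive) and weights that were pushed strongly negative,
# for example by one oversized gradient step.
x = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [2.0, 0.1]])
w = np.array([-1.0, -1.0])
b = -0.5

z = x @ w + b                         # weighted sum for each example
activation = np.maximum(z, 0.0)       # ReLU output
relu_grad = (z > 0).astype(float)     # d(activation)/dz, 0 wherever z <= 0

# Gradient of the summed activations with respect to w (upstream gradient of 1).
grad_w = (relu_grad[:, None] * x).sum(axis=0)

print("weighted sums:", z)
print("activations:  ", activation)
print("grad wrt w:   ", grad_w)
```

Because `grad_w` is all zeros, gradient descent has nothing to update `w` with, which is why the unit cannot recover on these inputs.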

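4. Dropout regularization

Another form of regularization, called dropout, is useful for neural networks. It works by randomly dropping out units in the network for a single gradient step. The more you drop out, the stronger the regularization:

0.0 = no dropout regularization.
1.0 = drop out everything. The model learns nothing.
Values between 0.0 and 1.0 are more useful.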
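The dropout step can be sketched in NumPy as follows. This is a hedged illustration using the "inverted dropout" convention, where surviving activations are scaled by 1 / (1 - rate) so their expected value is unchanged; the function name and the restriction to rates below 1.0 (to avoid dividing by zero) are assumptions of this sketch.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Randomly zero a fraction `rate` of activations for one training step."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    # True where the unit is kept, False where it is dropped.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale the kept units so the expected activation stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
a = np.ones((1000, 32))                      # dummy activations
dropped = dropout(a, rate=0.2, rng=rng)

zero_fraction = float(np.mean(dropped == 0.0))
print("fraction dropped:", zero_fraction)
print("mean activation: ", float(dropped.mean()))
```

A fresh mask is drawn on every call, which matches the idea of dropping a different random set of units at each gradient step. At inference time the layer is simply skipped.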

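XVI. Multi-Class Neural Networks

A significantly more efficient one-vs.-all model can be built with a deep neural network in which each output node represents a different class. The figure below illustrates this approach:
[Image]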
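The idea of one output node per class can be sketched with a softmax over the output layer's raw scores, followed by the cross-entropy loss on integer labels. This is an illustrative NumPy sketch with made-up logits for 3 classes; the function names are hypothetical and this is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Turn one row of raw scores per example into class probabilities."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true (integer) class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Made-up output-layer scores for 2 examples and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])            # true class index for each example

probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, labels)

print(probs)
print("predicted classes:", probs.argmax(axis=1))
print("loss:", loss)
```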
The following applies a neural network to the MNIST dataset:

import pandas as pd
import tensorflow as tf

(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()

x_train_normalized = x_train / 255.0
x_test_normalized = x_test / 255.0
print(x_train_normalized[2900][12]) # Output a normalized row

def create_model(my_learning_rate):
  """Create and compile a deep neural net."""
  
  # All models in this course are sequential.
  model = tf.keras.models.Sequential()

  # The features are stored in a two-dimensional 28X28 array. 
  # Flatten that two-dimensional array into a a one-dimensional 
  # 784-element array.
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

  # Define the first hidden layer.   
  model.add(tf.keras.layers.Dense(units=32, activation='relu'))
  
  # Define a dropout regularization layer. 
  model.add(tf.keras.layers.Dropout(rate=0.2))

  # Define the output layer. The units parameter is set to 10 because
  # the model must choose among 10 possible output values (representing
  # the digits from 0 to 9, inclusive).
  #
  # Don't change this layer.
  model.add(tf.keras.layers.Dense(units=10, activation='softmax'))     
                           
  # Construct the layers into a model that TensorFlow can execute.  
  # Notice that the loss function for multi-class classification
  # is different than the loss function for binary classification.  
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])
  
  return model    


def train_model(model, train_features, train_label, epochs,
                batch_size=None, validation_split=0.1):
  """Train the model by feeding it data."""

  history = model.fit(x=train_features, y=train_label, batch_size=batch_size,
                      epochs=epochs, shuffle=True, 
                      validation_split=validation_split)
 
  # To track the progression of training, gather a snapshot
  # of the model's metrics at each epoch. 
  epochs = history.epoch
  hist = pd.DataFrame(history.history)

  return epochs, hist


# The following variables are the hyperparameters.
learning_rate = 0.003
epochs = 50
batch_size = 4000
validation_split = 0.2

# Establish the model's topography.
my_model = create_model(learning_rate)

# Train the model on the normalized training set.
epochs, hist = train_model(my_model, x_train_normalized, y_train, 
                           epochs, batch_size, validation_split)

# Plot a graph of the metric vs. epochs.
# plot_curve is a plotting helper from the course notebook; it is not defined in this post.
list_of_metrics_to_plot = ['accuracy']
plot_curve(epochs, hist, list_of_metrics_to_plot)

# Evaluate against the test set.
print("\n Evaluate the new model against the test set:")
my_model.evaluate(x=x_test_normalized, y=y_test, batch_size=batch_size)

The results are as follows:
3/3 [==============================] - 0s 7ms/step - loss: 0.1406 - accuracy: 0.9574
[Image]
Experimenting with 256 neurons in the first hidden layer, 128 in the second, and a dropout rate of 0.2 reaches 98.1% accuracy:

def create_model(my_learning_rate):
  """Create and compile a deep neural net."""
  
  # All models in this course are sequential.
  model = tf.keras.models.Sequential()

  # The features are stored in a two-dimensional 28X28 array. 
  # Flatten that two-dimensional array into a a one-dimensional 
  # 784-element array.
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

  # Define the first hidden layer (256 nodes) and the second hidden layer (128 nodes).
  model.add(tf.keras.layers.Dense(units=256, activation='relu'))
  model.add(tf.keras.layers.Dense(units=128, activation='relu'))
  # Define a dropout regularization layer. 
  model.add(tf.keras.layers.Dropout(rate=0.2))

  # Define the output layer. The units parameter is set to 10 because
  # the model must choose among 10 possible output values (representing
  # the digits from 0 to 9, inclusive).
  #
  # Don't change this layer.
  model.add(tf.keras.layers.Dense(units=10, activation='softmax'))     
                           
  # Construct the layers into a model that TensorFlow can execute.  
  # Notice that the loss function for multi-class classification
  # is different than the loss function for binary classification.  
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])
  
  return model

3/3 [==============================] - 0s 28ms/step - loss: 0.0900 - accuracy: 0.9810
[Image]

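XVII. Embeddings

An embedding is a relatively low-dimensional space into which high-dimensional vectors can be mapped. Embeddings make it easier to do machine learning on large inputs, such as sparse vectors representing words.

1. Collaborative filtering

Collaborative filtering is the task of predicting a user's interests based on the interests of many other users. Take movie recommendation as an example, where the goal is to recommend movies to users.

Arranging movies in a two-dimensional space:
[Image]
The distance between points can be used to predict how similar two movies are. (In practice two dimensions cannot capture everything about users' preferences, so a higher-dimensional space is normally used.)

Input representation: the most efficient way to represent categorical data is a sparse tensor (a tensor with very few non-zero elements). For example, to build a movie recommendation model, assign a unique ID to each possible movie and then record only the movies the user has watched, as shown below:
[Image]
The last row corresponds to the sparse tensor [1, 3, 999999]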
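The saving from the sparse representation can be sketched directly. The snippet assumes a vocabulary of 1,000,000 movie IDs (the smallest size consistent with an index of 999999; the real size is not given in the text) and compares a dense multi-hot vector with the list of watched IDs.

```python
import numpy as np

num_movies = 1_000_000        # assumed vocabulary size
watched = [1, 3, 999999]      # sparse representation: IDs of watched movies only

# Dense equivalent: one slot per movie, 1.0 where the user watched it.
dense = np.zeros(num_movies, dtype=np.float32)
dense[watched] = 1.0

print("dense entries stored: ", dense.size)
print("dense bytes:          ", dense.nbytes)
print("sparse entries stored:", len(watched))
```

Both forms carry the same information, but the dense vector stores a million values to record three movies.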

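2. Obtaining embeddings

An embedding can be learned as part of the neural network for the target task. This gives an embedding tailored to your particular system, although it may take longer than training the embedding separately.

In general, when you have sparse data (or dense data that you want to embed), you can create an embedding unit, which is just a special type of hidden unit of size d. This embedding layer can be combined with any other features and hidden layers. As in any DNN, the final layer is the loss that is being optimized. For example, suppose we are doing collaborative filtering, where the goal is to predict a user's interests from the interests of other users. We can model this as a supervised learning problem by randomly setting aside (holding out) a small number of the movies the user has watched as the positive labels, and then optimizing a softmax loss.

[Image]
The figure above is a sample DNN architecture for learning movie embeddings from collaborative filtering data.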
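What an embedding layer computes on a sparse input can be sketched in NumPy. The sketch assumes a tiny vocabulary of 10 movies, an embedding size d = 3, and a randomly initialized (untrained) embedding matrix; in a real model these weights would be learned by backpropagation along with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(0)
num_movies, d = 10, 3                      # assumed vocabulary size and embedding size
embeddings = rng.normal(size=(num_movies, d))   # one d-dimensional row per movie

watched = [1, 3, 9]                        # sparse input: IDs of watched movies

# Option 1: multiply the multi-hot vector by the embedding matrix.
multi_hot = np.zeros(num_movies)
multi_hot[watched] = 1.0
via_matmul = multi_hot @ embeddings

# Option 2: look up only the needed rows and sum them (same result, less work).
via_lookup = embeddings[watched].sum(axis=0)

def cosine_similarity(a, b):
    """Similarity of two embedding vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(via_lookup)
print("similarity of movies 1 and 3:", cosine_similarity(embeddings[1], embeddings[3]))
```

The lookup form is why sparse inputs are practical: only the rows for the watched movies are ever touched.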

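XVIII. ML Engineering

There is much more to machine learning than implementing an ML algorithm. A production ML system contains a large number of components.

1. Static vs. dynamic training

Broadly speaking, there are two ways to train a model:

A static model is trained offline. That is, we train the model exactly once and then use that trained model for a while.
A dynamic model is trained online. That is, data continually enters the system and we incorporate it into the model through continuous updates.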

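2. Static vs. dynamic inference

You can choose either of the following inference strategies:

Offline inference: make all possible predictions in a batch using MapReduce or something similar, write the predictions to an SSTable or Bigtable, and then feed them to a cache/lookup table.
Online inference: predict on demand, using a server.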
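The two strategies can be contrasted in a toy sketch. Everything here is hypothetical: `predict` stands in for a trained model, and a plain dict stands in for the SSTable/Bigtable-backed lookup table.

```python
def predict(x):
    """Hypothetical stand-in for a trained model."""
    return 0.5 * x + 1.0

def build_offline_table(model, inputs):
    """Offline inference: precompute predictions for every known input in one batch."""
    return {x: model(x) for x in inputs}

def serve_offline(table, x):
    """Serving is a lookup; inputs that were not precomputed have no answer."""
    return table.get(x)

def serve_online(model, x):
    """Online inference: run the model at request time, for any input."""
    return model(x)

table = build_offline_table(predict, range(5))

print(serve_offline(table, 3))     # found in the precomputed table
print(serve_offline(table, 10))    # not precomputed, so no prediction
print(serve_online(predict, 10))   # computed on demand
```

The sketch shows the trade-off: the offline path is cheap to serve but covers only inputs seen at batch time, while the online path handles any input at the cost of running the model per request.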

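3. Data dependencies

This unit covers data dependencies in production ML systems, that is, the questions to ask about input data, such as reliability and versioning.

4. Fairness

Be aware of common human biases that ML algorithms can inadvertently reproduce. Proactively explore the data before training a model to identify sources of bias, and evaluate the model's predictions for bias.

Evaluating a machine learning model takes more than calculating loss metrics. Before deploying a model to production, it is essential to audit the training data and evaluate the predictions for bias.

This unit covers the different types of human bias that can appear in training data and offers strategies for identifying bias and evaluating its effects.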

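XIX. ML Systems in the Real World

Cancer Prediction (introduces "label leakage", where some training labels leak into the features and make the model unreliable; this should be avoided wherever possible)
Literature (predictions on the test data looked very good, with suspiciously high accuracy. This was because the data had been split by sentence; when the data is split by author, such results are hard to reach. When randomizing data to split it into training and test sets, it is crucial to consider how the split is made.)
Guidelines (introduces some guidelines for effective machine learning)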
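The difference between splitting by sentence and splitting by author, as in the Literature case, can be sketched as a group-aware split. The data and the function name below are made up for illustration; the point is only that every sentence from a given author lands on the same side of the split.

```python
import random

def split_by_group(examples, group_of, test_fraction=0.2, seed=0):
    """Split so that no group (e.g. author) appears in both train and test."""
    groups = sorted({group_of(e) for e in examples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [e for e in examples if group_of(e) not in test_groups]
    test = [e for e in examples if group_of(e) in test_groups]
    return train, test

# Made-up (author, sentence) pairs: 5 authors with 3 sentences each.
sentences = [(f"author_{a}", f"sentence {i} by author_{a}")
             for a in "abcde" for i in range(3)]

train, test = split_by_group(sentences, group_of=lambda e: e[0])

train_authors = {author for author, _ in train}
test_authors = {author for author, _ in test}
print("train authors:", sorted(train_authors))
print("test authors: ", sorted(test_authors))
```

A per-sentence random split would instead scatter each author's sentences across both sets, letting the model recognize an author's style rather than generalize to unseen authors.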

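XX. Conclusion

This unit lists some follow-up practical courses on machine learning.

Personal summary: The course's biggest strength is that it is intuitive and easy to get started with. It involves very few mathematical formulas and uses Playground exercises to show how each hyperparameter affects model performance. After signing in with a Google account you can program directly on Colab with no local environment setup, which is friendly to beginners. The course is not long, so it should be efficient to work through. It does have shortcomings, though. It does not introduce supervised versus unsupervised models and leans toward engineering practice, yet that theory is exactly what a deeper understanding of machine learning depends on. Topics such as regularization and backpropagation in neural networks also come without mathematical explanation, so understanding stays at the surface. Finishing the course counts as an introduction, but learning the principles behind machine learning requires further material. For a crash course, I think it is already very good.

Course link
https://developers.google.cn/machine-learning/crash-course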