How to Reduce Overfitting With Dropout Regularization

A simple and powerful regularization technique for neural networks and deep learning models is dropout. In this lesson you will discover the dropout regularization technique and how to apply it to your models in Python with Keras. After completing this lesson you will know:

  • How the dropout regularization technique works.
  • How to use dropout on your input (visible) layer.
  • How to use dropout on your hidden layers.

1.1 Dropout Regularization For Neural Networks

Dropout is a regularization technique for neural network models proposed by Srivastava et al. in their 2014 paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Dropout is a technique where randomly selected neurons are ignored during training: they are dropped out at random. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to those neurons on the backward pass.

As a neural network learns, neuron weights settle into their context within the network. Weights of neurons are tuned for specific features, providing some specialization. Neighboring neurons come to rely on this specialization, which, if taken too far, can result in a fragile model too specialized to the training data. This reliance on context for a neuron during training is referred to as complex co-adaptation. You can imagine that if neurons are randomly dropped out of the network during training, other neurons will have to step in and handle the representation required to make predictions for the missing neurons. This is believed to result in multiple independent internal representations being learned by the network.

The effect is that the network becomes less sensitive to the specific weights of individual neurons. This in turn results in a network that is capable of better generalization and is less likely to overfit the training data.
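The mechanics can be sketched outside of any framework. The toy NumPy example below (illustrative only, not part of the Keras workflow used later in this lesson) applies a random binary mask to a layer's activations during training and rescales the surviving activations by 1/(1 - rate), so that the expected magnitude of the activations is unchanged when dropout is switched off at prediction time. This so-called inverted dropout is also how the Keras Dropout layer behaves.

# Toy illustration of (inverted) dropout applied to one layer's activations
import numpy as np

rng = np.random.default_rng(7)

def dropout_forward(activations, rate=0.2, training=True):
    # During training, randomly zero a fraction `rate` of the activations and
    # scale the survivors by 1/(1 - rate); at prediction time, pass through unchanged.
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True for neurons that are kept
    return activations * mask / keep_prob

a = np.ones(10)  # stand-in for the activations of a 10-neuron hidden layer
print(dropout_forward(a, training=True))   # some entries zeroed, the rest scaled to 1.25
print(dropout_forward(a, training=False))  # unchanged: dropout is a no-op at prediction time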

1.2 Dropout Regularization in Keras

Dropout is easily implemented by randomly selecting nodes to be dropped out with a given probability (e.g. 20%) each weight update cycle. This is how dropout is implemented in Keras. Dropout is only used during the training of a model and is not used when evaluating the skill of the model. Next we will explore a few different ways of using dropout in Keras. The examples will use the Sonar binary classification dataset (learn more in Section 11.1). We will evaluate the developed models using scikit-learn with 10-fold cross-validation in order to better tease out differences in the results. There are 60 input values and a single output value, and the input values are standardized before being used in the network. The baseline neural network model has two hidden layers, the first with 60 units and the second with 30. Stochastic gradient descent is used to train the model with a relatively low learning rate and momentum. The full baseline model is listed below.

# Baseline Neural Network For The Sonar Dataset
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
# note: on recent Keras releases the scikit-learn wrapper has moved to the separate SciKeras package
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# load dataset
dataframe = pd.read_csv("sonar.csv",header=None)
dataset = dataframe.values

# split into input(X) and output(Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# baseline
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # compile model
    sgd = SGD(learning_rate=0.01, momentum=0.8, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

np.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running the example for the baseline model, without dropout, generates an estimated classification accuracy of 84.62%.

Baseline: 84.62% (6.45%)
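
As noted above, the Dropout layer is only active while the model is being trained; when making predictions it simply passes values through unchanged. A quick way to confirm this (an illustrative check that is separate from the Sonar pipeline) is to call a Dropout layer directly with and without the training flag:

# The Keras Dropout layer is active only during training
import numpy as np
from tensorflow.keras.layers import Dropout

layer = Dropout(0.2)
data = np.ones((1, 10), dtype='float32')
print(layer(data, training=True))   # roughly 20% of the values zeroed, the rest scaled by 1/0.8
print(layer(data, training=False))  # all ones: dropout does nothing at prediction time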

1.3 Using Dropout on the Visible Layer

Dropout can be applied to the input neurons, called the visible layer. In the example below we add a new Dropout layer between the input (or visible layer) and the first hidden layer. The dropout rate is set to 20%, meaning one in five inputs will be randomly excluded from each update cycle.

Additionally, as recommended in the original paper on dropout, a constraint is imposed on the weights for each hidden layer, ensuring that the maximum norm of the weights does not exceed a value of 3. This is done by setting the kernel_constraint argument on the Dense class when constructing the layers. The learning rate was lifted by one order of magnitude to 0.1 and the momentum was increased to 0.9; these increases were also recommended in the original dropout paper. Continuing on from the baseline example above, the code below exercises the same network with dropout on the input layer.

# Example of Dropout on the Sonar Dataset: Visible (Input) Layer
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# load dataset
dataframe = pd.read_csv("sonar.csv", header=None)
dataset = dataframe.values

# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# dropout on the input (visible) layer with weight constraint
def create_model():
    # create model
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(60,)))
    model.add(Dense(60, kernel_initializer='normal', activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dense(30, kernel_initializer='normal', activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # compile model with the larger learning rate and momentum recommended for dropout
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

np.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Visible: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running the example with dropout on the visible layer generates an estimated classification accuracy of 83.14%, a small drop compared to the baseline above.

Visible: 83.14% (8.08%)


1.4 Using Dropout on Hidden Layers

Dropout can be applied to hidden neurons in the body of your network model. In the example below, dropout is applied between the two hidden layers and between the last hidden layer and the output layer. Again, a dropout rate of 20% is used, as is a weight constraint on those layers.

# Example of Dropout on the Sonar Dataset: Hidden Layers
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# load dataset
dataframe = pd.read_csv("sonar.csv",header=None)
dataset = dataframe.values

# split into input(X) and output(Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# dropout in the hidden layers with weight constraint
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(30, kernel_initializer='normal', activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # compile model with the larger learning rate and momentum recommended for dropout
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

np.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Hidden: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

We can see that, for this problem and the chosen network configuration, using dropout in the hidden layers did not lift performance. In fact, performance was slightly worse than the baseline. It is possible that additional training epochs are required or that further tuning of the learning rate is needed.

Hidden: 83.09% (7.63%)

1.5 Tips For Using Dropout

The original paper on dropout provides experimental results on a suite of standard machine learning problems. As a result, the authors provide a number of useful heuristics to consider when using dropout in practice:

  • Generally, use a small dropout value of 20%-50% of neurons, with 20% providing a good starting point. A probability too low has minimal effect and a value too high results in under-learning by the network.
  • Use a larger network. You are likely to get better performance when dropout is used on a larger network, giving the model more of an opportunity to learn independent representations.
  • Use dropout on input (visible) as well as hidden layers. Application of dropout at each layer of the network has shown good results.
  • Use a large learning rate with decay and a large momentum. Increase your learning rate by a factor of 10 to 100 and use a high momentum value of 0.9 or 0.99.
  • Constrain the size of network weights. A large learning rate can result in very large network weights. Imposing a constraint on the size of network weights, such as max-norm regularization with a size of 4 or 5, has been shown to improve results (a sketch combining several of these heuristics is shown below).
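
As an illustrative, untested starting point that combines several of these heuristics, the model-building function below uses wider hidden layers, dropout on both the visible and hidden layers, a larger decaying learning rate with high momentum, and a max-norm constraint of 4. The specific layer sizes, dropout rates, and decay schedule are assumptions to be tuned, not values taken from the paper; the function could be dropped into the same scikit-learn pipeline used in the examples above.

# Illustrative starting point combining the dropout heuristics above (not a tuned result)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers.schedules import ExponentialDecay

def create_larger_dropout_model(n_inputs=60):
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(n_inputs,)))  # dropout on the visible layer
    model.add(Dense(120, activation='relu', kernel_constraint=MaxNorm(4)))  # wider hidden layer
    model.add(Dropout(0.4))  # heavier dropout between hidden layers
    model.add(Dense(60, activation='relu', kernel_constraint=MaxNorm(4)))
    model.add(Dropout(0.4))
    model.add(Dense(1, activation='sigmoid'))
    # large learning rate that decays over training, with a high momentum value
    lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.96)
    sgd = SGD(learning_rate=lr_schedule, momentum=0.9)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model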

1.6 Summary

In this lesson you discovered the dropout regularization technique for deep learning models. You learned:

  • What dropout is and how it works.
  • How you can use dropout on your own deep learning models.
  • Tips for getting the best results from dropout on your own models.

1.6.1 Next

Another important technique for improving the performance of your neural network models is to adapt the learning rate during training. In the next lesson you will discover different learning rate schedules and how you can apply them with Keras to your own problems.
