L2 regularization (weight decay)
input dropout
mask dropout
weight dropout
DropConnect (Wan et al. 2013) applied to the RNN hidden-to-hidden matrix
activation regularization (AR)
temporal activation regularization (TAR)
adversarial dropout, fraternal dropout
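Two of the items above are easy to sketch concretely. The following is a minimal NumPy illustration, not a reference implementation: `dropconnect` masks individual weights of a hidden-to-hidden matrix (rather than activations), and `ar_tar` computes the AR penalty (scaled L2 of the activations) and the TAR penalty (scaled L2 of the difference between consecutive timesteps). Function names, defaults, and the elementwise-mean convention are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect(W_hh, p=0.5):
    # DropConnect: zero out individual WEIGHTS of the
    # hidden-to-hidden matrix; rescale survivors by 1/(1-p)
    # so the expected value of each weight is unchanged.
    mask = rng.random(W_hh.shape) > p
    return W_hh * mask / (1 - p)

def ar_tar(hidden, alpha=2.0, beta=1.0):
    # hidden: (seq_len, batch, n_hidden) RNN outputs.
    # AR penalizes large activations; in AWD-LSTM it is
    # applied to the dropout-masked outputs.
    ar = alpha * np.mean(hidden ** 2)
    # TAR penalizes large changes between consecutive timesteps.
    tar = beta * np.mean((hidden[1:] - hidden[:-1]) ** 2)
    return ar, tar
```

Both penalties are simply added to the training loss; alpha and beta are tuned per task.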
Fraternal Dropout trains two identical copies of an RNN (with shared parameters) under different dropout masks while minimizing the difference between their (pre-softmax) predictions.
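The Fraternal Dropout idea can be sketched as follows: run the same input through the same weights twice with independent dropout masks, and add a penalty on the disagreement of the two pre-softmax outputs. This is a toy single-layer NumPy sketch with hypothetical names (`forward`, `fraternal_penalty`, `kappa`), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, drop_mask):
    # One forward pass with SHARED weights W and a given
    # input-dropout mask; returns pre-softmax logits.
    return (x * drop_mask) @ W

def fraternal_penalty(x, W, p=0.5, kappa=1.0):
    # Two passes over the SAME input and SAME weights,
    # each with an independent inverted-dropout mask.
    m1 = (rng.random(x.shape) > p) / (1 - p)
    m2 = (rng.random(x.shape) > p) / (1 - p)
    z1 = forward(x, W, m1)
    z2 = forward(x, W, m2)
    # Penalize disagreement between the two predictions;
    # this term is added to the usual training loss.
    return kappa * np.mean((z1 - z2) ** 2)
```

Because the copies share parameters, the penalty only adds a second forward/backward pass per step, and it pushes the model to be invariant to the particular dropout mask drawn.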