https://github.com/pytorch/translate/tree/master/pytorch_translate/attention
  base attention
  dot attention
  mlp attention
  multihead attention
  no attention
  pooling attention
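Most of the variants above differ only in how the attention score between a decoder query and the encoder states is computed. As a framework-agnostic illustration (a minimal NumPy sketch, not the repo's actual implementation), dot attention scores each encoder state by its dot product with the query, normalizes with a softmax, and returns the weighted sum of values:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(query, keys, values):
    # query: (d,), keys/values: (seq, d)
    scores = keys @ query        # (seq,) raw dot-product scores
    weights = softmax(scores)    # (seq,) attention distribution over positions
    context = weights @ values   # (d,) weighted sum of values
    return context, weights

# With a zero query all scores tie, so the weights come out uniform.
context, weights = dot_product_attention(
    np.zeros(4), np.random.randn(3, 4), np.random.randn(3, 4)
)
```

The mlp (additive) and multihead variants swap this scoring function for a small feed-forward network or several parallel projected dot products, respectively.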
https://github.com/allenai/allennlp/tree/master/allennlp/modules/attention
  attention
  bilinear attention
  cosine attention
  dot product attention
  legacy attention
  linear attention
  intra sentence attention
  multi head self attention
  stacked self attention
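The self-attention entries in this group attend within a single sequence rather than from a decoder to an encoder. A minimal NumPy sketch of multi-head self-attention (an illustration under assumed projection matrices, not AllenNLP's actual module) splits the model dimension into heads, runs scaled dot-product attention per head, and concatenates the results:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq, d_model); w_q/w_k/w_v/w_o: (d_model, d_model) projections.
    seq, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq, d_model) -> (num_heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores)                            # per-head attention
    heads = weights @ v                                  # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ w_o                                  # final output projection

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
w = rng.standard_normal((4, 8, 8)) * 0.1
out = multi_head_self_attention(x, w[0], w[1], w[2], w[3], num_heads=2)
```

Stacked self-attention simply layers several such blocks (with feed-forward sublayers in between), Transformer-style.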