1 Introduction
I build a basic Bi-LSTM model, a Bi-LSTM + attention model, the multi-task learning model described in the paper, and a model that incorporates gloss information, using a memory network to encode the relationship between the input sentence and the candidate glosses of a target word. In addition, I try to incorporate word-sense embeddings for the gloss input and ELMo embeddings for the input data into the GAS model. I also hope to train GAS in a sequence-to-sequence way instead of disambiguating one target word at a time: intuitively, knowing the sense of one target word helps determine the senses of the other target words in the same sentence.
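The gloss-memory idea above can be sketched as a single memory-network hop: score each candidate gloss embedding against the target word's context vector, normalize the scores with a softmax, and read out the attention-weighted sum of the gloss embeddings. The function and variable names below are illustrative assumptions, not the exact GAS formulation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gloss_attention(context, gloss_memory):
    """One memory hop: attend over candidate glosses for a target word.

    context      -- (dim,) context vector for the target word
    gloss_memory -- (n_glosses, dim) embeddings of candidate glosses
    Returns the attention-weighted gloss summary and the weights.
    (Sketch under assumed shapes; not the paper's exact equations.)
    """
    scores = gloss_memory @ context      # (n_glosses,) relevance scores
    weights = softmax(scores)            # attention distribution over glosses
    summary = weights @ gloss_memory     # (dim,) weighted gloss read-out
    return summary, weights

# toy example: 3 candidate glosses with 4-dimensional embeddings
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 4))
c = rng.normal(size=4)
summary, w = gloss_attention(c, M)
```

In a full model, `summary` would be combined with the Bi-LSTM hidden state of the target word (e.g. by concatenation) before the sense-classification layer.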
2 Preprocessing
2.1 How to organize the input data
2.1.1 Sequence-to-sequence learning
For the bi-lstm model, in o