Project_1: Replace the Embedding layer in Transformer or BERT
Because only a subset of the entries in an embedding layer is updated during each training batch, it is practical to keep just the actively updated entries in memory and offload the remaining entries to other storage (e.g., host memory or disk).
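The idea above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the class `OffloadedEmbedding` and all its method names are hypothetical, and a plain dict stands in for the slow backing store (host memory or disk). Only the rows referenced by the current batch are pulled into a fast in-memory cache, gradients are applied to those rows alone, and the cache is written back afterwards.

```python
import numpy as np

class OffloadedEmbedding:
    """Hypothetical sketch: keep only the embedding rows touched by the
    current batch in a fast cache; leave all other rows in a slower
    backing store (simulated here by a plain dict)."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Backing store: simulates disk / host memory holding the full table.
        self.store = {i: rng.standard_normal(dim) for i in range(vocab_size)}
        self.cache = {}  # fast memory: only rows needed by the current batch

    def fetch(self, ids):
        """Move the rows needed by this batch into the fast cache,
        then return them stacked in batch order."""
        for i in ids:
            if i not in self.cache:
                self.cache[i] = self.store.pop(i)
        return np.stack([self.cache[i] for i in ids])

    def apply_grads(self, ids, grads, lr=0.1):
        """SGD step that touches only the cached (active) rows."""
        for i, g in zip(ids, grads):
            self.cache[i] -= lr * g

    def evict(self):
        """Write updated rows back to the backing store, freeing fast memory."""
        self.store.update(self.cache)
        self.cache.clear()

emb = OffloadedEmbedding(vocab_size=1000, dim=4)
batch_ids = [3, 17, 3]                      # only these rows are active
rows = emb.fetch(batch_ids)                 # shape (3, 4)
emb.apply_grads(batch_ids, np.ones_like(rows))
print(len(emb.cache))                       # 2 distinct rows in fast memory
emb.evict()
print(len(emb.store))                       # 1000: full table back in store
```

Note the memory saving: regardless of vocabulary size, fast memory only ever holds the distinct rows referenced by the current batch.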