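OCR is a sequence-to-sequence task, and so are NMT and the general-purpose tasks the Transformer is used for. The classic OCR recognition paper is CRNN, whose architecture is CNN + RNN + softmax; the RNN can be an LSTM, a GRU, or another variant. OCR can also be treated as end-to-end sequence recognition, in the same way as machine translation.
This article analyzes the Transformer in the context of the OCR task and tries replacing the LSTM in CRNN with a Transformer.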
N-Grams
An N-gram is a contiguous sequence of N neighboring words combined into a single unit for representation purposes, where N is the number of words combined.
- For example, consider the sentence “Natural Language Processing is essential to Computer Science.”
- A 1-gram (unigram) model tokenizes the sentence into single words, so the output is “Natural, Language, Processing, is, essential, to, Computer, Science”
- A bigram model, on the other hand, tokenizes it into overlapping pairs of adjacent words, so the output is “Natural Language, Language Processing, Processing is, is essential, essential to, to Computer, Computer Science”
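The unigram and bigram outputs above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `ngrams` is hypothetical, and it assumes whitespace tokenization with surrounding punctuation stripped from each word, which is a simplification rather than how a real tokenizer works.

```python
def ngrams(sentence, n):
    """Return the word-level n-grams of a sentence as a list of strings.

    Hypothetical helper: splits on whitespace and strips surrounding
    punctuation from each word (an assumption made for this example).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Tokenize on whitespace and drop punctuation such as the final period.
    words = [w.strip(".,!?;:") for w in sentence.split()]
    words = [w for w in words if w]
    # Slide a window of n words over the sentence, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "Natural Language Processing is essential to Computer Science."
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
print(unigrams)
print(bigrams)
```

A sentence of 8 words yields 8 unigrams and 7 bigrams; in general a sentence of W words has W - N + 1 N-grams when W is at least N, and none otherwise.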