ERNIE Text Classification
“Sir Rabindranath Tagore wrote Chokher Bali in 1903 and the Indian national anthem in 1911.”
This simple sentence hides some useful information: implicitly, it tells us that Rabindranath Tagore was both a writer and a poet. Pre-trained language models achieve state-of-the-art results on various NLP applications such as named entity recognition, question answering, and text classification. However, if you ask a language model such as BERT, “Is Sir Rabindranath Tagore a poet or a writer?”, the answer might be a tangled tale. A model like BERT is pre-trained on plain text alone, with no access to structured world knowledge, so it will not reliably capture hidden relations such as “poet” or “author.”
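To see the gap concretely, we can probe BERT with a fill-mask query. The snippet below is a minimal sketch, assuming the Hugging Face transformers library is installed; the probe sentence and the choice of bert-base-uncased are illustrative, not from the original article.

```python
# Minimal sketch: probe BERT for Tagore's occupation with a masked token.
# The model must infer the answer from co-occurrence statistics alone.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The sentence gives no explicit cue about the occupation.
results = fill_mask("Rabindranath Tagore was a famous [MASK].")
for r in results:
    print(f"{r['token_str']:>12}  {r['score']:.3f}")
```

Whatever BERT answers here, it comes from surface-level text statistics rather than from an explicit, entity-level fact linking Tagore to Chokher Bali or to the national anthem.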
To address this, researchers from Tsinghua University and Huawei Noah’s Ark Lab recently proposed a model that incorporates knowledge graphs (KGs) into pre-training on large-scale corpora for language representation, named “Enhanced Language RepresentatioN with Informative Entities (ERNIE)” [1].
How ERNIE achieves this:
Rich knowledge in text can lead to better language understanding and accordingly benefits various knowledge-driven applications. To achieve this, ERNIE tackles two main challenges in incorporating external knowledge into language representation: Structured Knowledge Encoding and Heterogeneous Information Fusion.
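As a concrete illustration of the fusion challenge, the sketch below combines a contextual token embedding with an aligned KG entity embedding. This is a minimal sketch of the idea, not ERNIE’s actual code: the FusionLayer module, its dimensions, and its layer names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Illustrative sketch of heterogeneous information fusion: a token
    embedding and its aligned entity embedding (e.g., learned with TransE)
    live in different vector spaces, so each is projected into a shared
    space before they are combined. Hypothetical module, not ERNIE's code."""
    def __init__(self, token_dim=768, entity_dim=100, hidden_dim=768):
        super().__init__()
        self.w_token = nn.Linear(token_dim, hidden_dim)
        self.w_entity = nn.Linear(entity_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, token_emb, entity_emb):
        # Map both embedding spaces into one hidden state and mix them.
        return self.act(self.w_token(token_emb) + self.w_entity(entity_emb))

# Example: fuse the token "Tagore" with its knowledge-graph entity vector.
fusion = FusionLayer()
token_emb = torch.randn(1, 768)   # contextual token embedding (e.g., BERT)
entity_emb = torch.randn(1, 100)  # KG entity embedding (e.g., TransE)
print(fusion(token_emb, entity_emb).shape)  # torch.Size([1, 768])
```

The point of the sketch is only that two heterogeneous embedding spaces must be mapped into a shared one before they can interact; in the paper, this kind of fusion is interleaved with attention over both tokens and entities inside ERNIE’s aggregator layers.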