Haven't finished reading this yet; posting it anyway so this messy draft note sits here and reminds me to get through it soon....
GPT features
Architecture: a large transformer-based language model.
Training objective: predict the next word, given all of the previous words within some text.
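To make the objective concrete, here is a minimal PyTorch sketch (not GPT-2's actual training code; the tensors are random stand-ins for real model outputs and text) of how the next-token cross-entropy is computed by shifting logits against targets:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50257, 8
token_ids = torch.randint(0, vocab_size, (1, seq_len))  # toy "text" as token ids
logits = torch.randn(1, seq_len, vocab_size)            # stand-in for model outputs

# Shift by one: the logits at position i are scored against token i+1,
# i.e. "predict the next word given all of the previous words".
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),  # predictions at positions 0..n-2
    token_ids[:, 1:].reshape(-1),               # targets: tokens 1..n-1
)
print(loss.item())  # average negative log-likelihood of the next token
```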
Although GPT-2 performs poorly on question answering, reading comprehension, summarization, and translation, the results suggest that, given enough data and compute, these tasks can be learned directly through unsupervised training.
GPT-2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient (unlabeled) data and compute.
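As a quick way to poke at this zero-shot behaviour yourself, here is a rough sketch using the Hugging Face transformers GPT-2 checkpoint (not part of the original note); the "TL;DR:" prompt and top-k=2 sampling follow the summarization setup described in the GPT-2 paper, while everything else is illustrative:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "Some long news article text goes here."  # any raw text
prompt = article + "\nTL;DR:"                       # the paper's summarization cue
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=60,                    # length budget for the "summary"
    do_sample=True,
    top_k=2,                              # top-k sampling as in the paper
    pad_token_id=tokenizer.eos_token_id,  # silence the pad-token warning
)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```

No task-specific training data is involved anywhere above: the model only ever saw the raw-text next-word objective, and the task is induced purely by the prompt.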