implement OpenAI gpt
papers
Gaussian Error Linear Units
translate to chinese
Attention Is All You Need
translate to chinese
Improving Language Understanding by Generative Pre-Training
translate to chinese
Language Models are Unsupervised Multitask Learners
translate to chinese
Dataset
use the dataset ROCStories
output
my device
31.4 GiB
Intel® Core™ i7-8700K CPU @ 3.70GHz × 12
GeForce GTX 1080 Ti/PCIe/SSE2
64-bit
run three epochs
with 40 seconds an epoch for train and 23 seconds an epoch for eval,
so we need less than 3 minutes to get the results below.
show.py line:42 ***** Eval results *****
show.py line:44 eval_accuracy = 0.874933190807055
show.py line:44 eval_loss = 0.432198545569156
show.py line:44 train_loss = 2.201771383611565
run one epoch
with 40 seconds an epoch for train and 23 seconds an epoch for eval,
so we need about 1 minutes to get the results below.
show.py line:42 ***** Eval results *****
show.py line:44 eval_accuracy = 0.863174772848744
show.py line:44 eval_loss = 0.31887995107815814
show.py line:44 train_loss = 3.087455103540013
run 10 epochs
with 40 seconds an epoch for train and 23 seconds an epoch for eval,
so we need about 7 minutes to get the results below.
show.py line:42 ***** Eval results *****
show.py line:44 eval_accuracy = 0.8786745056119722
show.py line:44 eval_loss = 0.5693538990389142
show.py line:44 train_loss = 1.2477980831749418
run 30 epochs
with 40 seconds an epoch for train and 23 seconds an epoch for eval,
so we need about 21 minutes to get the results below.
show.py line:42 ***** Eval results *****
show.py line:44 eval_accuracy = 0.8727952966328166
show.py line:44 eval_loss = 0.6764590177271101
show.py line:44 train_loss = 0.23714334345780885
run directly
with 23 seconds an epoch for eval,
so we need about half a minutes to get the results below.
show.py line:42 ***** Eval results *****
show.py line:44 eval_accuracy = 0.5611972207375735
show.py line:44 eval_loss = 0.6895335352318919
show.py line:44 train_loss = 0.0
github
https://github.com/darr/gpt