A comparison with Google's model files shows that the parameters and model structure differ slightly. For example:
Parameters of the albert_large_zh model in albert_zh:
{'bert/embeddings/word_embeddings': [21128, 128],
'bert/embeddings/word_embeddings_2': [128, 1024],
'bert/embeddings/token_type_embeddings': [2, 1024],
'bert/embeddings/position_embeddings': [512, 1024],
'bert/embeddings/LayerNorm/beta': [1024],
'bert/embeddings/LayerNorm/gamma': [1024],
'bert/encoder/layer_shared/attention/self/query/kernel': [1024, 1024],
'bert/encoder/layer_shared/attention/self/query/bias': [1024],
'bert/encoder/layer_shared/attention/self/key/kernel': [1024, 1024],
'bert/encoder/layer_shared/attention/self/key/bias': [1024],
'bert/encoder/layer_shared/attention/self/value/kernel': [1024, 1024],
'bert/encoder/layer_shared/attention/self