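Every so often the NLP world gets the news that some new model has beaten BERT across the board, and today is no exception. This time the model filling everyone's feed is XLNet, which builds on BERT with some modifications. The network has 24 layers, and the model is roughly 4 times the size of the Chinese BERT model. Let's see how to use it. The original code uses sentencepiece for English tokenization, so that package has to be installed before anything will run. First, download the pretrained model: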
curl -O "https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip"
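Once the archive is unzipped, a quick check that the expected files are present can save a failed run later. The following is a minimal sketch, not part of the XLNet repository: `missing_model_files` is a hypothetical helper, and it assumes the unzipped folder holds `xlnet_config.json`, `spiece.model`, and checkpoint files whose names start with `xlnet_model.ckpt` (the names used in the commands later in this post).

```python
import glob
import os


def missing_model_files(model_dir):
    """Return the names of expected model files that are absent from model_dir.

    Hypothetical helper: the expected names are taken from the paths used
    in this post, not from any official manifest.
    """
    missing = []
    for name in ("xlnet_config.json", "spiece.model"):
        if not os.path.isfile(os.path.join(model_dir, name)):
            missing.append(name)
    # A TensorFlow checkpoint is usually several files sharing one prefix,
    # so match on the prefix instead of one exact file name.
    if not glob.glob(os.path.join(model_dir, "xlnet_model.ckpt*")):
        missing.append("xlnet_model.ckpt*")
    return missing
```

An empty list means all three pieces were found; anything else names what still needs to be downloaded or unzipped.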
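The downloaded model is about 1.2 GB. It contains the model configuration xlnet_config.json, the English tokenization model spiece.model, and the XLNet checkpoint files. These three pieces are similar to what ships with the Chinese BERT model. Next, download an English corpus for text classification: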
curl -O "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
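After extracting the dataset, the train folder contains two subfolders, neg and pos, which are used for binary classification of English text. Then download the XLNet source code: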
git clone https://github.com/zihangdai/xlnet.git
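With the data and the source in place, it can be worth a quick look at the extracted IMDB folder before training. The sketch below is a hypothetical helper, not the repository's own data processor. It assumes the layout described above (a split folder such as train with neg and pos subfolders of .txt reviews), and the 0/1 label encoding is my own choice for illustration.

```python
import os


def load_imdb_split(data_dir, split="train"):
    """Read one split of the aclImdb folder into (text, label) pairs.

    Assumption for illustration: label 0 means neg and label 1 means pos.
    """
    examples = []
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = os.path.join(data_dir, split, label_name)
        for file_name in sorted(os.listdir(folder)):
            if not file_name.endswith(".txt"):
                continue  # skip anything that is not a review file
            path = os.path.join(folder, file_name)
            with open(path, encoding="utf-8") as f:
                examples.append((f.read().strip(), label))
    return examples
```

Calling `load_imdb_split("/path/to/aclImdb")` and printing the length and the first few pairs is usually enough to confirm the download and extraction went well.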
After downloading the source, run_classifier.py needs some configuration. The command line is as follows:
python run_classifier.py \
--use_tpu=False \
--tpu="" \
--do_train=True \
--do_eval=False \
--eval_all_ckpt=False \
--task_name=imdb \
--data_dir=/Users/shuubiasahi/Downloads/aclImdb \
--output_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb \
--model_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb \
--uncased=False \
--spiece_model_file=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model \
--model_config_path=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json \
--init_checkpoint=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
--max_seq_length=512 \
--train_batch_size=32 \
--eval_batch_size=8 \
--num_hosts=1 \
--num_core_per_host=8 \
--learning_rate=2e-5 \
--train_steps=4000 \
--warmup_steps=500 \
--save_steps=500 \
--iterations=500
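Because most of these flags are just paths built from three folders, the command can also be assembled in Python so the machine-specific paths live in one place. This is a minimal sketch under that assumption: `build_classifier_args` is a hypothetical helper, it covers only a subset of the flags shown above, and it uses the same flag names and values as the command in this post.

```python
import os


def build_classifier_args(pretrained_dir, data_dir, work_dir, **overrides):
    """Build the argument list for run_classifier.py from three base folders.

    Hypothetical helper: flag names and values mirror the command above;
    keyword arguments replace or add individual flags.
    """
    flags = {
        "use_tpu": False,
        "do_train": True,
        "do_eval": False,
        "task_name": "imdb",
        "data_dir": data_dir,
        "output_dir": os.path.join(work_dir, "proc_data", "imdb"),
        "model_dir": os.path.join(work_dir, "exp", "imdb"),
        "uncased": False,
        "spiece_model_file": os.path.join(pretrained_dir, "spiece.model"),
        "model_config_path": os.path.join(pretrained_dir, "xlnet_config.json"),
        "init_checkpoint": os.path.join(pretrained_dir, "xlnet_model.ckpt"),
        "max_seq_length": 512,
        "train_batch_size": 32,
        "learning_rate": 2e-5,
        "train_steps": 4000,
        "warmup_steps": 500,
        "save_steps": 500,
    }
    flags.update(overrides)
    return ["python", "run_classifier.py"] + [
        f"--{name}={value}" for name, value in flags.items()
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)` from inside the cloned xlnet folder. On a machine with limited memory, something like `build_classifier_args(..., max_seq_length=128, train_batch_size=8)` is an easy way to try smaller settings.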
To run it directly from an IDE instead, set the flag defaults in run_classifier.py as follows:
# Model
flags.DEFINE_string("model_config_path", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json",
help="Model config path.")
flags.DEFINE_float("dropout", default=0.1,
help="Dropout rate.")
flags.DEFINE_float("dropatt", default=0.1,
help="Attention dropout rate.")
flags.DEFINE_integer("clamp_len", default=-1,
help="Clamp length")
flags.DEFINE_string("summary_type", default="last",
help="Method used to summarize a sequence into a compact vector.")
flags.DEFINE_bool("use_summ_proj", default=True,
help="Whether to use projection for summarizing sequences.")
flags.DEFINE_bool("use_bfloat16", False,
help="Whether to use bfloat16.")
# Parameter initialization
flags.DEFINE_enum("init", default="normal",
enum_values=["normal", "uniform"],
help="Initialization method.")
flags.DEFINE_float("init_std", default=0.02,
help="Initialization std when init is normal.")
flags.DEFINE_float("init_range", default=0.1,
help="Initialization std when init is uniform.")
# I/O paths
flags.DEFINE_bool("overwrite_data", default=False,
help="If False, will use cached data if available.")
flags.DEFINE_string("init_checkpoint", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt",
help="checkpoint path for initializing the model. "
"Could be a pretrained model or a finetuned model.")
flags.DEFINE_string("output_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb",
help="Output dir for TF records.")
flags.DEFINE_string("spiece_model_file", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model",
help="Sentence Piece model path.")
flags.DEFINE_string("model_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb",
help="Directory for saving the finetuned model.")
flags.DEFINE_string("data_dir", default="/Users/shuubiasahi/Downloads/aclImdb",
help="Directory for input data.")
# TPUs and machines
flags.DEFINE_bool("use_tpu", default=False, help="whether to use TPU.")
flags.DEFINE_integer("num_hosts", default=1, help="How many TPU hosts.")
flags.DEFINE_integer("num_core_per_host", default=8,
help="8 for TPU v2 and v3-8, 16 for larger TPU v3 pod. In the context "
"of GPU training, it refers to the number of GPUs used.")
flags.DEFINE_string("tpu_job_name", default=None, help="TPU worker job name.")
flags.DEFINE_string("tpu", default=None, help="TPU name.")
flags.DEFINE_string("tpu_zone", default=None, help="TPU zone.")
flags.DEFINE_string("gcp_project", default=None, help="gcp project.")
flags.DEFINE_string("master", default=None, help="master")
flags.DEFINE_integer("iterations", default=1000,
help="number of iterations per TPU training loop.")
# training
flags.DEFINE_bool("do_train", default=True, help="whether to do training")
flags.DEFINE_integer("train_steps", default=10000,
help="Number of training steps")
flags.DEFINE_integer("warmup_steps", default=0, help="number of warmup steps")
flags.DEFINE_float("learning_rate", default=1e-5, help="initial learning rate")
flags.DEFINE_float("lr_layer_decay_rate", 1.0,
"Top layer: lr[L] = FLAGS.learning_rate."
"Low layer: lr[l-1] = lr[l] * lr_layer_decay_rate.")
flags.DEFINE_float("min_lr_ratio", default=0.0,
help="min lr ratio for cos decay.")
flags.DEFINE_float("clip", default=1.0, help="Gradient clipping")
flags.DEFINE_integer("max_save", default=0,
help="Max number of checkpoints to save. Use 0 to save all.")
flags.DEFINE_integer("save_steps", default=100,
help="Save the model for every save_steps. "
"If None, not to save any model.")
flags.DEFINE_integer("train_batch_size", default=8,
help="Batch size for training")
flags.DEFINE_float("weight_decay", default=0.00, help="Weight decay rate")
flags.DEFINE_float("adam_epsilon", default=1e-8, help="Adam epsilon")
flags.DEFINE_string("decay_method", default="poly", help="poly or cos")
# evaluation
flags.DEFINE_bool("do_eval", default=False, help="whether to do eval")
flags.DEFINE_bool("do_predict", default=False, help="whether to do prediction")
flags.DEFINE_float("predict_threshold", default=0,
help="Threshold for binary prediction.")
flags.DEFINE_string("eval_split", default="dev", help="could be dev or test")
flags.DEFINE_integer("eval_batch_size", default=128,
help="batch size for evaluation")
flags.DEFINE_integer("predict_batch_size", default=128,
help="batch size for prediction.")
flags.DEFINE_string("predict_dir", default=None,
help="Dir for saving prediction files.")
flags.DEFINE_bool("eval_all_ckpt", default=False,
help="Eval all ckpts. If False, only evaluate the last one.")
flags.DEFINE_string("predict_ckpt", default=None,
help="Ckpt path for do_predict. If None, use the last one.")
# task specific
flags.DEFINE_string("task_name", default="imdb", help="Task name")
flags.DEFINE_integer("max_seq_length", default=128, help="Max sequence length")
flags.DEFINE_integer("shuffle_buffer", default=2048,
help="Buffer size used for shuffle.")
flags.DEFINE_integer("num_passes", default=1,
help="Num passes for processing training data. "
"This is use to batch data without loss for TPUs.")
flags.DEFINE_bool("uncased", default=False,
help="Use uncased.")
flags.DEFINE_string("cls_scope", default=None,
help="Classifier layer scope.")
flags.DEFINE_bool("is_regression", default=False,
help="Whether it's a regression task.")
FLAGS = flags.FLAGS
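Of the training flags above, learning_rate, warmup_steps, train_steps, min_lr_ratio, and decay_method together decide how the learning rate changes over a run. As far as I can tell the "poly" setting means a linear warmup followed by a polynomial decay, but I have not traced the repository code line by line, so treat the sketch below as an assumption: a plain-Python illustration with power 1 (linear decay), not the repository's TensorFlow implementation. `learning_rate_at` is a hypothetical name, and its defaults are the values from the command line earlier in this post.

```python
def learning_rate_at(step, base_lr=2e-5, warmup_steps=500,
                     train_steps=4000, min_lr_ratio=0.0):
    """Learning rate at a given step: linear warmup, then linear decay.

    Assumed schedule for illustration only ("poly" with power 1).
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: grow linearly from 0 to base_lr.
        return base_lr * step / warmup_steps
    end_lr = base_lr * min_lr_ratio
    decay_steps = max(train_steps - warmup_steps, 1)  # avoid division by zero
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    # Decay: shrink linearly from base_lr to end_lr.
    return (base_lr - end_lr) * (1.0 - progress) + end_lr
```

With these defaults the rate peaks at 2e-5 at step 500 and reaches zero at step 4000, which is why a short run with a long warmup spends a noticeable share of its steps below the nominal learning rate.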
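The main things to configure are the model location and the config file location. Rewriting this yourself is fairly simple, and it is much the same as with BERT. If you run into errors, setting run_config=None in the code is enough, since there is no TPU after all.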
The run looks like this: