Fine-tuning the XLNet model: English text classification

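This post looks at XLNet, an NLP network that builds on BERT with a number of modifications. The model has 24 layers and is roughly four times the size of the Chinese BERT model. The post covers how to use it: downloading the roughly 1.2 GB pretrained model (config, tokenizer model, and checkpoint files) and an English text classification corpus, then downloading the source code and configuring run_classifier.py.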

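Every so often the NLP world gets another headline claiming that some new model beats BERT across the board, and today is no exception: today's viral model is XLNet. It builds on BERT with a number of modifications, has a 24-layer network, and is roughly four times the size of the Chinese BERT model. Let's see how to use it. For English tokenization the original code uses sentencepiece, so you need to install that package (pip install sentencepiece) before running it. First, download the pretrained model: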

curl -O "https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip"
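The original code tokenizes English with sentencepiece, and the spiece.model file in this zip is that tokenizer's model. SentencePiece splits text into subword pieces and marks the start of each word with the character ▁ (U+2581) rather than keeping spaces. The stdlib-only sketch below shows how such pieces map back to plain text; the piece list is made up for illustration and is not actual output of spiece.model. With the real package you would load spiece.model through sentencepiece's SentencePieceProcessor instead.

```python
# SentencePiece marks the beginning of each word with U+2581 ("▁").
WORD_BOUNDARY = "\u2581"


def pieces_to_text(pieces):
    """Join SentencePiece-style pieces back into plain text.

    A piece that starts with the boundary marker begins a new word, so
    replacing the marker with a space restores the original spacing.
    """
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


if __name__ == "__main__":
    # Hypothetical pieces, for illustration only (not real spiece.model output).
    pieces = ["\u2581This", "\u2581movie", "\u2581was", "\u2581un", "believ", "able", "."]
    print(pieces_to_text(pieces))
```

Note how "unbelievable" is split into three pieces, and only the first carries the word-boundary marker.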

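After downloading, the model is about 1.2 GB. It includes the model config xlnet_config.json, the English SentencePiece model spiece.model, and the XLNet checkpoint files; these three parts mirror the Chinese BERT release. Next, download the English corpus for text classification (the IMDB movie reviews):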

curl -O "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

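After extracting it, the train directory contains two subdirectories, neg and pos, which serve as the two classes for English binary text classification. Then download the XLNet source code: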

git clone https://github.com/zihangdai/xlnet.git
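Before wiring the corpus into run_classifier.py, it can help to see the extracted layout in code. The repo has its own processor for task_name=imdb; the sketch below is not that code, just a stdlib illustration assuming the standard aclImdb layout (train/ and test/, each with neg/ and pos/ holding one review per .txt file, plus an unlabeled train/unsup/ that is skipped here). Replacing the HTML line breaks found in the reviews is this sketch's own choice.

```python
import os

LABELS = ("neg", "pos")  # binary sentiment labels, one subdirectory each


def load_imdb_split(data_dir, split="train"):
    """Read (text, label) pairs from <data_dir>/<split>/{neg,pos}/*.txt.

    Other subdirectories, such as the unlabeled train/unsup, are ignored.
    """
    examples = []
    for label in LABELS:
        label_dir = os.path.join(data_dir, split, label)
        for filename in sorted(os.listdir(label_dir)):
            if not filename.endswith(".txt"):
                continue
            with open(os.path.join(label_dir, filename), encoding="utf-8") as f:
                # Reviews contain HTML line breaks; turn them into spaces.
                text = f.read().strip().replace("<br />", " ")
            examples.append((text, label))
    return examples


if __name__ == "__main__":
    # Same location as --data_dir below; adjust to your own path.
    data_dir = os.path.expanduser("~/Downloads/aclImdb")
    if os.path.isdir(os.path.join(data_dir, "train")):
        train = load_imdb_split(data_dir, "train")
        print(len(train), "labeled training reviews")
```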

Once the source is downloaded, run_classifier.py needs some configuration. The command line is as follows:

python run_classifier.py \
  --use_tpu=False \
  --tpu="" \
  --do_train=True \
  --do_eval=False \
  --eval_all_ckpt=False \
  --task_name=imdb \
  --data_dir=/Users/shuubiasahi/Downloads/aclImdb \
  --output_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb \
  --model_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb \
  --uncased=False \
  --spiece_model_file=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model \
  --model_config_path=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json \
  --init_checkpoint=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --learning_rate=2e-5 \
  --train_steps=4000 \
  --warmup_steps=500 \
  --save_steps=500 \
  --iterations=500
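A quick sanity check on what the step-related flags in this command imply. The sketch assumes the IMDB training split has 25,000 labeled reviews, that train_batch_size is the global batch size, and that the default decay_method="poly" behaves as a linear warmup followed by a linear (power-1 polynomial) decay down to min_lr_ratio * learning_rate. It illustrates the idea and is not the repo's implementation. Also note that num_core_per_host=8 is a TPU-style setting: according to the flag's own help text, for GPU training it is the number of GPUs used, so on a machine without 8 GPUs you would likely need to lower it.

```python
def epochs_covered(train_steps, batch_size, num_examples):
    """How many passes over the training set the step budget amounts to."""
    return train_steps * batch_size / num_examples


def learning_rate_at(step, base_lr, warmup_steps, train_steps, min_lr_ratio=0.0):
    """Linear warmup, then linear decay to min_lr_ratio * base_lr (assumed schedule)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    decay_steps = train_steps - warmup_steps
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    end_lr = base_lr * min_lr_ratio
    return (base_lr - end_lr) * (1 - progress) + end_lr


if __name__ == "__main__":
    print("epochs:", epochs_covered(4000, 32, 25000))
    for step in (0, 250, 500, 2250, 4000):
        print(step, learning_rate_at(step, 2e-5, 500, 4000))
```

Under these assumptions, 4000 steps at batch size 32 is 128,000 examples, about 5.12 passes over the 25,000 training reviews, and the learning rate peaks at 2e-5 at step 500 before decaying to 0 at step 4000.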

Alternatively, to run it directly in an IDE, set the flag defaults in run_classifier.py as follows:


from absl import flags  # run_classifier.py imports flags from absl

# Model
flags.DEFINE_string("model_config_path", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json",
      help="Model config path.")
flags.DEFINE_float("dropout", default=0.1,
      help="Dropout rate.")
flags.DEFINE_float("dropatt", default=0.1,
      help="Attention dropout rate.")
flags.DEFINE_integer("clamp_len", default=-1,
      help="Clamp length")
flags.DEFINE_string("summary_type", default="last",
      help="Method used to summarize a sequence into a compact vector.")
flags.DEFINE_bool("use_summ_proj", default=True,
      help="Whether to use projection for summarizing sequences.")
flags.DEFINE_bool("use_bfloat16", False,
      help="Whether to use bfloat16.")

# Parameter initialization
flags.DEFINE_enum("init", default="normal",
      enum_values=["normal", "uniform"],
      help="Initialization method.")
flags.DEFINE_float("init_std", default=0.02,
      help="Initialization std when init is normal.")
flags.DEFINE_float("init_range", default=0.1,
      help="Initialization range when init is uniform.")

# I/O paths
flags.DEFINE_bool("overwrite_data", default=False,
      help="If False, will use cached data if available.")
flags.DEFINE_string("init_checkpoint", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt",
      help="checkpoint path for initializing the model. "
      "Could be a pretrained model or a finetuned model.")
flags.DEFINE_string("output_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb",
      help="Output dir for TF records.")
flags.DEFINE_string("spiece_model_file", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model",
      help="Sentence Piece model path.")
flags.DEFINE_string("model_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb",
      help="Directory for saving the finetuned model.")
flags.DEFINE_string("data_dir", default="/Users/shuubiasahi/Downloads/aclImdb",
      help="Directory for input data.")

# TPUs and machines
flags.DEFINE_bool("use_tpu", default=False, help="whether to use TPU.")
flags.DEFINE_integer("num_hosts", default=1, help="How many TPU hosts.")
flags.DEFINE_integer("num_core_per_host", default=8,
      help="8 for TPU v2 and v3-8, 16 for larger TPU v3 pod. In the context "
      "of GPU training, it refers to the number of GPUs used.")
flags.DEFINE_string("tpu_job_name", default=None, help="TPU worker job name.")
flags.DEFINE_string("tpu", default=None, help="TPU name.")
flags.DEFINE_string("tpu_zone", default=None, help="TPU zone.")
flags.DEFINE_string("gcp_project", default=None, help="gcp project.")
flags.DEFINE_string("master", default=None, help="master")
flags.DEFINE_integer("iterations", default=1000,
      help="number of iterations per TPU training loop.")

# training
flags.DEFINE_bool("do_train", default=True, help="whether to do training")
flags.DEFINE_integer("train_steps", default=10000,
      help="Number of training steps")
flags.DEFINE_integer("warmup_steps", default=0, help="number of warmup steps")
flags.DEFINE_float("learning_rate", default=1e-5, help="initial learning rate")
flags.DEFINE_float("lr_layer_decay_rate", 1.0,
                   "Top layer: lr[L] = FLAGS.learning_rate. "
                   "Low layer: lr[l-1] = lr[l] * lr_layer_decay_rate.")
flags.DEFINE_float("min_lr_ratio", default=0.0,
      help="min lr ratio for cos decay.")
flags.DEFINE_float("clip", default=1.0, help="Gradient clipping")
flags.DEFINE_integer("max_save", default=0,
      help="Max number of checkpoints to save. Use 0 to save all.")
flags.DEFINE_integer("save_steps", default=100,
      help="Save the model every save_steps steps. "
      "If None, do not save any model.")
flags.DEFINE_integer("train_batch_size", default=8,
      help="Batch size for training")
flags.DEFINE_float("weight_decay", default=0.00, help="Weight decay rate")
flags.DEFINE_float("adam_epsilon", default=1e-8, help="Adam epsilon")
flags.DEFINE_string("decay_method", default="poly", help="poly or cos")

# evaluation
flags.DEFINE_bool("do_eval", default=False, help="whether to do eval")
flags.DEFINE_bool("do_predict", default=False, help="whether to do prediction")
flags.DEFINE_float("predict_threshold", default=0,
      help="Threshold for binary prediction.")
flags.DEFINE_string("eval_split", default="dev", help="could be dev or test")
flags.DEFINE_integer("eval_batch_size", default=128,
      help="batch size for evaluation")
flags.DEFINE_integer("predict_batch_size", default=128,
      help="batch size for prediction.")
flags.DEFINE_string("predict_dir", default=None,
      help="Dir for saving prediction files.")
flags.DEFINE_bool("eval_all_ckpt", default=False,
      help="Eval all ckpts. If False, only evaluate the last one.")
flags.DEFINE_string("predict_ckpt", default=None,
      help="Ckpt path for do_predict. If None, use the last one.")

# task specific
flags.DEFINE_string("task_name", default="imdb", help="Task name")
flags.DEFINE_integer("max_seq_length", default=128, help="Max sequence length")
flags.DEFINE_integer("shuffle_buffer", default=2048,
      help="Buffer size used for shuffle.")
flags.DEFINE_integer("num_passes", default=1,
      help="Num passes for processing training data. "
      "This is used to batch data without loss for TPUs.")
flags.DEFINE_bool("uncased", default=False,
      help="Use uncased.")
flags.DEFINE_string("cls_scope", default=None,
      help="Classifier layer scope.")
flags.DEFINE_bool("is_regression", default=False,
      help="Whether it's a regression task.")

FLAGS = flags.FLAGS
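The help text of lr_layer_decay_rate describes layer-wise learning-rate decay: the top layer uses learning_rate, and each lower layer gets the rate of the layer above multiplied by lr_layer_decay_rate. Below is a small sketch of that rule for a 24-layer model, assuming layers are indexed 0 (bottom) to n_layer - 1 (top); the default of 1.0 gives every layer the same rate. The 0.75 in the demo is an arbitrary illustrative value, not a recommendation from the source.

```python
def layer_learning_rates(base_lr, n_layer, decay_rate):
    """Per-layer learning rates following the lr_layer_decay_rate help text.

    The top layer (index n_layer - 1) gets base_lr; each layer below gets
    the rate of the layer above multiplied by decay_rate.
    """
    return [base_lr * decay_rate ** (n_layer - 1 - layer) for layer in range(n_layer)]


if __name__ == "__main__":
    rates = layer_learning_rates(2e-5, 24, 0.75)  # 0.75 is illustrative only
    print("top:", rates[-1], "next:", rates[-2], "bottom:", rates[0])
```

Lower layers, which hold more general pretrained features, thus change more slowly than the top layers when decay_rate is below 1.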

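The main things to configure are the paths: the model checkpoint, the config file, the SentencePiece model, and the data and output directories. The remaining defaults in this listing (for example max_seq_length=128, train_batch_size=8, learning_rate=1e-5) are the script's own and differ from the command-line values above. Adapting the script to your own data is simple and much the same as with BERT. For some of the errors you may run into, setting run_config=None in the code is enough, since there is no TPU anyway.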
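Since most of the setup is about pointing the flags at the right files, a small pre-flight check can save a failed run. This stdlib sketch checks that the model directory holds the three parts described earlier and reads the layer count from the config. Two assumptions: the checkpoint prefix xlnet_model.ckpt is stored with a TensorFlow ".index" file next to it, and xlnet_config.json stores the layer count under the key "n_layer".

```python
import json
import os

# Expected contents of the unzipped model directory. Checking for the
# ".index" file of the xlnet_model.ckpt prefix is an assumption.
REQUIRED_FILES = ("xlnet_config.json", "spiece.model", "xlnet_model.ckpt.index")


def missing_model_files(model_dir):
    """Return the expected files that are not present in model_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]


def read_num_layers(model_dir):
    """Read the layer count from xlnet_config.json (assumes the key "n_layer")."""
    with open(os.path.join(model_dir, "xlnet_config.json"), encoding="utf-8") as f:
        return json.load(f)["n_layer"]


if __name__ == "__main__":
    # The path used in this post; replace it with your own.
    model_dir = "/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16"
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All files present; n_layer =", read_num_layers(model_dir))
```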

Running it gives: