A Boon for Sequence Model Developers: Lingvo, a TensorFlow Framework for Sequence Modeling

王发北

Article link: https://blog.csdn.net/wwangfabei1989/article/details/87911884

Original article: https://medium.com/tensorflow/lingvo-a-tensorflow-framework-for-sequence-modeling-8b1d6ffba5bb?linkId=63952201

GitHub: https://github.com/tensorflow/lingvo

Colab: https://colab.research.google.com/github/tensorflow/lingvo/blob/master/codelabs/introduction.ipynb

Lingvo is the word for “language” in Esperanto, the international language. The name alludes to the roots of the Lingvo framework: it was developed as a general deep learning framework on top of TensorFlow, with a focus on sequence models for language-related tasks such as machine translation, speech recognition, and speech synthesis.

Internally, the framework gained traction and the number of researchers using it ballooned. As a result, there are now dozens of published papers with state-of-the-art results produced using Lingvo, with more to come. Supported architectures range from traditional RNN sequence models to Transformer models and models that include VAE components. To show our support for the research community and to encourage reproducible research, we have open-sourced the framework and are starting to release the models used in our papers.

Lingvo was built with collaborative research in mind, and promotes code reuse by sharing the implementation of common layers across different tasks. In addition, all layers implement the same common interface and are laid out in the same way. Not only does this produce cleaner and more understandable code, it makes it extremely simple to apply improvements someone else made for a different task to your own task. Enforcing this consistency does come at the cost of requiring more discipline and boilerplate, but Lingvo attempts to minimize this to ensure fast iteration time during research.
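The idea of a shared layer interface can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Lingvo's actual code: the `Params` container, `BaseLayer`, `ScaleLayer`, and `StackLayer` below are hypothetical stand-ins written for this example, and they use lists of floats instead of tensors.

```python
import copy


class Params:
    """A tiny hyperparameter container (hypothetical, not Lingvo's class)."""

    def __init__(self, cls, **defaults):
        self.cls = cls
        self.__dict__.update(defaults)

    def Copy(self):
        return copy.deepcopy(self)

    def Instantiate(self):
        # Every layer is built the same way: from its own params object.
        return self.cls(self)


class BaseLayer:
    """Common interface: Params() describes a layer, FProp() runs it."""

    @classmethod
    def Params(cls):
        return Params(cls, name=cls.__name__)

    def __init__(self, params):
        # Keep a private copy so later edits to the config do not leak in.
        self.params = params.Copy()

    def FProp(self, inputs):
        raise NotImplementedError


class ScaleLayer(BaseLayer):
    """Multiplies every input value by a configurable scale."""

    @classmethod
    def Params(cls):
        p = super().Params()
        p.scale = 1.0
        return p

    def FProp(self, inputs):
        return [x * self.params.scale for x in inputs]


class StackLayer(BaseLayer):
    """Composes child layers purely through the shared interface."""

    @classmethod
    def Params(cls):
        p = super().Params()
        p.children = []  # list of child Params objects
        return p

    def __init__(self, params):
        super().__init__(params)
        self.layers = [cp.Instantiate() for cp in self.params.children]

    def FProp(self, inputs):
        for layer in self.layers:
            inputs = layer.FProp(inputs)
        return inputs


double = ScaleLayer.Params()
double.scale = 2.0
stack_p = StackLayer.Params()
stack_p.children = [double, double.Copy()]
stack = stack_p.Instantiate()
result = stack.FProp([1.0, 2.0])
print(result)
```

Because `StackLayer` only relies on `Params()`, `Instantiate()`, and `FProp()`, any layer written for one task can be dropped into another task's stack without changes, which is the kind of reuse the paragraph above describes.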

Another aspect of collaboration is sharing reproducible results. Lingvo provides a centralized location for checked-in model hyperparameter configurations. Not only does this serve to document important experiments, it gives others an easy way to reproduce your results by training an identical model.

# Excerpt from a model-params class; the enclosing class definition is not shown.
@classmethod
def Task(cls):
  p = model.AsrModel.Params()
  p.name = 'librispeech'

  # Initialize encoder params.
  ep = p.encoder
  # Data consists of 240-dimensional frames (80 x 3 frames), which we
  # re-interpret as individual 80-dimensional frames. See also
  # LibrispeechCommonAsrInputParams.
  ep.input_shape = [None, None, 80, 1]
  ep.lstm_cell_size = 1024
  ep.num_lstm_layers = 4
  ep.conv_filter_shapes = [(3, 3, 1, 32), (3, 3, 32, 32)]
  ep.conv_filter_strides = [(2, 2), (2, 2)]
  ep.cnn_tpl.params_init = py_utils.WeightInit.Gaussian(0.001)
  # Disable conv LSTM layers.
  ep.num_conv_lstm_layers = 0

  # Initialize decoder params.
  dp = p.decoder
  dp.rnn_cell_dim = 1024
  dp.rnn_layers = 2
  dp.source_dim = 2048
  # Use functional-while-based unrolling instead of while-loop-based unrolling.
  dp.use_while_loop_based_unrolling = False

  tp = p.train
  tp.learning_rate = 2.5e-4
  tp.lr_schedule = lr_schedule.ContinuousLearningRateSchedule.Params().Set(
      start_step=50000, half_life_steps=100000, min=0.01)

  # Setting p.eval.samples_per_summary to a large value ensures that dev,
  # devother, test, testother are evaluated completely (since num_samples for
  # each of these sets is less than 5000), while train summaries will be
  # computed on 5000 examples.
  p.eval.samples_per_summary = 5000
  p.eval.decoder_samples_per_summary = 0

  # Use variational weight noise to prevent overfitting.
  p.vn.global_vn = True
  p.train.vn_std = 0.075
  p.train.vn_start_step = 20000

  return p

An example of a task configuration in Lingvo. The hyperparameters for each experiment are configured in their own class, separate from the code that builds the network, and checked into version control.
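To make two of the training settings in this configuration more concrete, here is a rough standalone sketch in plain Python. It rests on assumptions that the article does not spell out: that the schedule is a multiplier on `learning_rate` that stays at 1.0 until `start_step`, halves every `half_life_steps` after that, and never drops below `min`; and that weight noise means adding zero-mean Gaussian noise with standard deviation `vn_std` to the weights once training reaches `vn_start_step`. The function names are hypothetical and none of this is Lingvo's implementation.

```python
import random


def lr_multiplier(step, start_step=50000, half_life_steps=100000, minimum=0.01):
    """Assumed schedule: flat, then exponential decay with a floor."""
    if step <= start_step:
        return 1.0
    decay = 0.5 ** ((step - start_step) / half_life_steps)
    return max(decay, minimum)


def effective_lr(step, base_lr=2.5e-4):
    """Learning rate actually applied at a given step under the assumption above."""
    return base_lr * lr_multiplier(step)


def add_weight_noise(weights, step, rng, vn_std=0.075, vn_start_step=20000):
    """Assumed behaviour: no noise early on, Gaussian noise afterwards."""
    if step < vn_start_step:
        return list(weights)
    return [w + rng.gauss(0.0, vn_std) for w in weights]


for step in (0, 50000, 150000, 250000, 1000000):
    print(step, effective_lr(step))

rng = random.Random(0)
print(add_weight_noise([1.0, 2.0], step=10000, rng=rng))
print(add_weight_noise([1.0, 2.0], step=30000, rng=rng))
```

Under these assumptions the learning rate holds at 2.5e-4 for the first 50,000 steps, halves every 100,000 steps afterwards, and bottoms out at 1% of its starting value after roughly 714,000 steps.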

While Lingvo started out with a focus on NLP, it is inherently very flexible, and models for tasks such as image segmentation and point cloud classification have been successfully implemented using the framework. Distillation, GANs, and multi-task models are also supported. At the same time, the framework does not compromise on speed, and features an optimized input pipeline and fast distributed training. Finally, Lingvo was put together with an eye towards easy productionization, and there is even a well-defined path towards porting models for mobile inference.

To jump straight into the code, check out our GitHub page and the codelab. To learn more about Lingvo or some of the advanced features it supports, see our paper.

Zhihu: https://zhuanlan.zhihu.com/albertwang
