TensorFlow Programming (1)

 Seq2Seq Model

Mini-Batch and Buckets

buckets=[2,4,1] is best written as buckets=[1,2,4], i.e. sorted in ascending order

Designing a Complete AI System

Development tool: PyCharm

Programming language: Python > 3.5

[zhang@localhost ~]$ cat tensorflow_test.py 
#!/usr/bin/env python
import tensorflow as tf
print("tensorflow version = ",tf.__version__)
[zhang@localhost ~]$ python tensorflow_test.py 
('tensorflow version = ', '1.11.0')
[zhang@localhost ~]$ 

 

Check the NumPy version:

pip3 show numpy 

[zhang@localhost ~]$ pip show numpy
Name: numpy
Version: 1.15.1
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: None
License: BSD
Location: /usr/local/anaconda2/lib/python2.7/site-packages
Requires: 
Required-by: tensorflow, tensorboard, tables, seaborn, PyWavelets, patsy, pandas, odo, numexpr, numba, mkl-random, mkl-fft, matplotlib, Keras-Preprocessing, Keras-Applications, h5py, datashape, Bottleneck, bokeh, bkcharts, astropy
[zhang@localhost ~]$ 

Other packages will also be used during development; they can simply be imported when needed.

 

3. Code Structure of the Seq2Seq Model

Python is object-oriented by nature and has class structures.

class Seq2SeqModel(object):   # a collection of parameters and methods; instantiate it to use it

    def __init__(...):        # initializes the class's parameters; the model is ready to use once this completes

    def sampled_loss(...):    # the loss function

    def seq2seq_f(...):       # the seq2seq function

    def step(...):            # runs one training step

    def get_batch(...):       # assembles a batch of data

 

For most AI engineers or researchers, much of the work goes into tuning hyperparameters, because these hyperparameters determine a model's training results and effectiveness. So let's spend some time explaining and discussing them.

source_vocab_size: determines the size of the vocabulary used by the encoder. Together with target_vocab_size below, this parameter mainly affects how well the model fits. In effect it says how many words the model recognizes during training: if the model fully recognizes every word in the training set, the fit will be good.

However, this can cause overfitting, because when the model encounters unknown words in real use it cannot produce good output. If the vocabulary is too small, on the other hand, the training set yields too few features and the model underfits.

You should therefore adjust these hyperparameters to your actual situation so the model performs better.
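To make this trade-off concrete, here is a minimal sketch (not from the original code; the special-token ids and helper names are illustrative assumptions) of capping a vocabulary at a given size and mapping every out-of-vocabulary word to UNK:

from collections import Counter

PAD, GO, EOS, UNK = 0, 1, 2, 3                  # illustrative special-token ids

def build_vocab(sentences, vocab_size):
    # Keep the most frequent tokens, reserving 4 slots for the special tokens.
    counts = Counter(tok for sent in sentences for tok in sent.split())
    most_common = [tok for tok, _ in counts.most_common(vocab_size - 4)]
    return {tok: i + 4 for i, tok in enumerate(most_common)}

def to_ids(sentence, vocab):
    # Any token outside the vocabulary becomes UNK -- the "unknown word" case above.
    return [vocab.get(tok, UNK) for tok in sentence.split()]

vocab = build_vocab(["the cat sat", "the dog sat"], vocab_size=10)
print(to_ids("the bird sat", vocab))            # "bird" maps to UNK (3)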

 

buckets: a bucket can be understood as a container with a filter; its purpose is to group input/output sentences within a certain length range together.

The reason is that inputs and outputs of different lengths would require different network models to train, and real-world sentences come in all kinds of lengths. If every distinct length got its own bucket, we would need to build a huge number of neural networks, and such a model could never converge. So the natural approach is to classify by length: put data within a certain length range together, and pad the shorter sequences with PAD.

Since our data has both inputs and outputs, each bucket is defined as a pair (I, O), where I is the maximum length of input sentences in the bucket and O is the maximum length of output sentences. A sketch follows below.
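Here is a minimal sketch of the idea (PAD_ID, the bucket sizes, and the helper name are assumptions for illustration; the actual tutorial code additionally reverses encoder inputs and adds GO/EOS symbols):

PAD_ID = 0
buckets = [(5, 10), (10, 15), (20, 25)]          # each bucket is (I, O)

def assign_and_pad(source_ids, target_ids):
    # Pick the smallest bucket that fits both sentences, then pad with PAD_ID.
    for bucket_id, (I, O) in enumerate(buckets):
        if len(source_ids) <= I and len(target_ids) <= O:
            src = source_ids + [PAD_ID] * (I - len(source_ids))
            tgt = target_ids + [PAD_ID] * (O - len(target_ids))
            return bucket_id, src, tgt
    raise ValueError("sentence pair too long for every bucket")

print(assign_and_pad([4, 7, 9], [5, 6, 8, 2]))
# -> (0, [4, 7, 9, 0, 0], [5, 6, 8, 2, 0, 0, 0, 0, 0, 0])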

size: the number of neurons in each layer of the neural network. The number of neurons and the number of layers have a decisive effect on the model's total computation; generally, more neurons means a stronger ability to capture the training set's features and a higher degree of fit.

num_layers: the number of layers in the neural network (see the cell-stacking sketch under use_lstm below).

max_gradient_norm: this value is critical; it is used to address exploding gradients. Gradient-descent training has two extreme problems to deal with: exploding gradients and vanishing gradients. This value caps the norm of the gradients, preventing gradient explosion, as in the sketch below.
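A hedged sketch of how this is typically wired up in TF 1.x (the toy variable and loss exist only to produce gradients): tf.clip_by_global_norm rescales all gradients together so their global norm does not exceed max_gradient_norm.

import tensorflow as tf

max_gradient_norm = 5.0
w = tf.Variable([2.0, -3.0])
loss = tf.reduce_sum(tf.square(w))                # toy loss, just to have gradients

opt = tf.train.GradientDescentOptimizer(learning_rate=0.5)
params = tf.trainable_variables()
gradients = tf.gradients(loss, params)
# Rescale the gradients so their global norm is at most max_gradient_norm.
clipped_gradients, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)
train_op = opt.apply_gradients(zip(clipped_gradients, params))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([norm, train_op])[0])          # global norm before clipping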

batch_size: the size of each batch when training batch by batch. To elaborate a little: training involves a very large amount of data, and we cannot pour it all into the computation graph at once, which would be neither necessary nor realistic.

learning_rate: the initial learning rate. To explain briefly, the learning rate is really a coefficient: with a learning rate of 1, the next state equals the current state plus the gradient. In other words, the learning rate determines how fast the model learns.

learning_rate_decay_factor: sets the decay rate of the learning rate. A higher learning rate trains faster but fits less well, so we can tune the learning-rate decay to help gradient descent converge, as sketched below.
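A small sketch in the style of the classic TF seq2seq tutorial: the learning rate lives in a non-trainable Variable, and a decay op multiplies it by learning_rate_decay_factor whenever training stops improving (the trigger condition is up to you):

import tensorflow as tf

learning_rate = 0.5
learning_rate_decay_factor = 0.99

lr = tf.Variable(float(learning_rate), trainable=False)
lr_decay_op = lr.assign(lr * learning_rate_decay_factor)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(lr))        # 0.5
    sess.run(lr_decay_op)      # run when, e.g., perplexity stops decreasing
    print(sess.run(lr))        # 0.495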

use_lstm: whether the recurrent units use LSTM or GRU. GRU is a variant of LSTM; because it merges some of the gates, it is computationally more efficient. As for quality, the two trade wins depending on the dataset. A cell-construction sketch follows below.
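A minimal sketch (TF 1.x cell classes; the variable values are illustrative) of the use_lstm switch, which also shows how num_layers stacks the cells:

import tensorflow as tf

size, num_layers, use_lstm = 128, 2, False

def single_cell():
    # GRU by default; LSTM if use_lstm is set.
    if use_lstm:
        return tf.nn.rnn_cell.BasicLSTMCell(size)
    return tf.nn.rnn_cell.GRUCell(size)

cell = single_cell()
if num_layers > 1:
    cell = tf.nn.rnn_cell.MultiRNNCell([single_cell() for _ in range(num_layers)])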

num_samples: softmax is a generalization of logistic regression, typically used for multi-class classification. When there are very many labels, however, the computation becomes too expensive, so a sampling approach is used: classes are randomly sampled from the vocabulary to stand in for the full label set, as sketched below.
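A hedged sketch of sampled softmax in TF 1.x (all shapes and variable names here are illustrative assumptions): instead of normalizing over the full target vocabulary, tf.nn.sampled_softmax_loss draws num_samples random classes per step.

import tensorflow as tf

target_vocab_size, size, num_samples, batch = 40000, 128, 512, 32

w = tf.get_variable("proj_w", [target_vocab_size, size])   # output projection weights
b = tf.get_variable("proj_b", [target_vocab_size])

def sampled_loss(labels, inputs):
    labels = tf.reshape(labels, [-1, 1])
    return tf.nn.sampled_softmax_loss(
        weights=w, biases=b, labels=labels, inputs=inputs,
        num_sampled=num_samples, num_classes=target_vocab_size)

inputs = tf.random_normal([batch, size])                   # stand-in decoder outputs
labels = tf.random_uniform([batch], maxval=target_vocab_size, dtype=tf.int64)
loss = tf.reduce_mean(sampled_loss(labels, inputs))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(loss))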

forward_only: a flag controlling whether error backpropagation is performed (when True, only the forward pass runs).
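A toy illustration of what the flag controls (the variable and loss are made up): in training mode we build and run an update op; in forward-only mode no gradients are ever computed.

import tensorflow as tf

forward_only = False            # set True for decoding/inference

w = tf.Variable(1.0)
loss = tf.square(w - 3.0)       # toy forward computation

if not forward_only:
    update = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    if not forward_only:
        sess.run(update)        # one backpropagation step
    print(sess.run(loss))       # the forward pass runs in both modes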

Parameters of tf.Variable:
    initial_value: A Tensor, or Python object convertible to a Tensor, which is the initial value for the Variable. The initial value must have a shape specified unless validate_shape is set to False. Can also be a callable with no argument that returns the initial value when called. In that case, dtype must be specified. (Note that initializer functions from init_ops.py must first be bound to a shape before being used here.)
    trainable: If True, the default, also adds the variable to the graph collection GraphKeys.TRAINABLE_VARIABLES. This collection is used as the default list of variables to use by the Optimizer classes.
    collections: List of graph collections keys. The new variable is added to these collections. Defaults to [GraphKeys.GLOBAL_VARIABLES].
    validate_shape: If False, allows the variable to be initialized with a value of unknown shape. If True, the default, the shape of initial_value must be known.
    caching_device: Optional device string describing where the Variable should be cached for reading. Defaults to the Variable's device. If not None, caches on another device. Typical use is to cache on the device where the Ops using the Variable reside, to deduplicate copying through Switch and other conditional statements.
    name: Optional name for the variable. Defaults to 'Variable' and gets uniquified automatically.
    variable_def: VariableDef protocol buffer. If not None, recreates the Variable object with its contents, referencing the variable's nodes in the graph, which must already exist. The graph is not changed. variable_def and the other arguments are mutually exclusive.
    dtype: If set, initial_value will be converted to the given type. If None, either the datatype will be kept (if initial_value is a Tensor), or convert_to_tensor will decide.
    expected_shape: A TensorShape. If set, initial_value is expected to have this shape.
    import_scope: Optional string. Name scope to add to the Variable. Only used when initializing from protocol buffer.
    constraint: An optional projection function to be applied to the variable after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights). The function must take as input the unprojected Tensor representing the value of the variable and return the Tensor for the projected value (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.
    use_resource: if True, a ResourceVariable is created; otherwise an old-style ref-based variable is created. When eager execution is enabled a resource variable is always created.
    synchronization: Indicates when a distributed variable will be aggregated. Accepted values are constants defined in the class tf.VariableSynchronization. By default the synchronization is set to AUTO and the current DistributionStrategy chooses when to synchronize. If synchronization is set to ON_READ, trainable must not be set to True.
    aggregation: Indicates how a distributed variable will be aggregated. Accepted values are constants defined in the class tf.VariableAggregation.
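A small illustrative example exercising a few of the parameters above (all values are arbitrary):

import tensorflow as tf

v = tf.Variable(
    initial_value=tf.zeros([2, 3]),   # shape must be known since validate_shape defaults to True
    trainable=False,                  # keep it out of GraphKeys.TRAINABLE_VARIABLES
    name="my_variable",               # uniquified automatically if the name is taken
    dtype=tf.float32)                 # initial_value is converted to this dtype

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(v))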

 

 

 
