VW Command-Line Arguments

Training

The simplest command for training a model is:

vw train_file --cache_file cache_train -f model_file

Prediction

vw -t --cache_file cache_test -i model_file -p result.txt test_file

  • -t: ignore the label information in the data and only make predictions on the examples.
  • --cache_file: the cache file, as above.
  • -i: the model to load for predicting on unseen data.
  • -p: the file to which the prediction results are written.
  • test_file: the data file with the examples to predict; a sample of the input format is shown below.
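
Both train_file and test_file are expected to be in VW's plain-text input format: a label, a |, and then the features, where each feature may carry an explicit value (features without one default to 1). Two illustrative lines (the feature names here are made up):

1 | price:0.23 sqft:0.25 age:0.05 2006
0 | price:0.18 sqft:0.15 age:0.35 1976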

Other Options in Detail

1 General VW options

  • -h [ --help ]: print help information.
  • --version: print version information.
  • --random_seed arg: seed for the random number generator.
  • --noop: do not learn anything (parse the input only).

2 Input options

  • -d [ --data ] arg: the data file to use.
  • --ring_size arg: size of the example ring.
  • --examples arg: number of examples to parse.
  • --daemon: read data from a TCP/IP port (default 26542).
  • --port arg: the port to listen on.
  • --num_children arg (=10): number of child processes in persistent daemon mode.
  • --pid_file arg: write a pid file in persistent daemon mode.
  • --passes arg (=1): number of training passes over the data (defaults to a single pass).
  • -c [ --cache ]: use a cache; by default the cache file is the data file name with a .cache suffix.
  • --cache_file arg: the cache file to use.
  • --compressed: use gzip where compression is needed; if a cache file is produced, it is stored in compressed format. With automatic detection, input files may mix plain-text and compressed formats.
  • --no_stdin: do not default to reading from stdin.
  • --save_resume: save extra state so learning can be resumed later with new data.

Raw training/testing data (in the proper plain text input format) can be passed to VW in a number of ways:

  • Using the -d or --data option, which expects a file name as an argument (specifying a file name that is not associated with any option also works);
  • Via stdin;
  • Via a TCP/IP port if the --daemon option is specified. The port itself is specified by --port, otherwise the default port 26542 is used. The daemon by default creates 10 child processes which share the model state, allowing multiple simultaneous queries to be answered. The number of child processes can be controlled with --num_children, and you can create a file with the job id using --pid_file, which is later useful for killing the job.
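
A minimal sketch of daemon mode (model_file and the feature names are placeholders, and the netcat utility nc is assumed to be available for talking to the port):

vw --daemon --port 26542 --num_children 5 --pid_file vw.pid -t -i model_file
echo " | price:0.23 sqft:0.25" | nc localhost 26542    # the daemon answers each line with a prediction
kill $(cat vw.pid)                                     # stop the daemon via the pid file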

Parsing raw data is slow so there are options to create or load data in VW’s native format. Files containing data in VW’s native format are called caches. The exact contents of a cache file depend on the input as well as a few options (-b, --affix, --spelling) that are passed to VW during the creation of the cache. This implies that using the cache file with different options might cause VW to rebuild the cache. The easiest way to use a cache is to always specify the -c option. This way, VW will first look for a cache file and create it if it doesn’t exist. To override the default cache file name, use --cache_file followed by the file name.
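
For example (train.txt is a placeholder file name):

vw -d train.txt -c -f model_file                       # creates train.txt.cache on the first run, reuses it afterwards
vw -d train.txt --cache_file my.cache -f model_file    # the same, with an explicit cache file name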

--compressed can be used for reading gzipped raw training data, writing gzipped caches, and reading gzipped caches.
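
For example, training directly on a gzipped file while also keeping the cache compressed (train.txt.gz is a placeholder):

vw -d train.txt.gz --compressed -c -f model_file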

--passes takes as an argument the number of times the algorithm will cycle over the data (epochs).
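
Note that multiple passes require a cache, so combine --passes with -c (or --cache_file). For example, ten passes over the data:

vw -d train.txt -c --passes 10 -f model_file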

3 Weight options

  • -b [ --bit_precision ] arg: number of bits in the feature table.
  • -i [ --initial_regressor ] arg: initial regressor(s) to load into memory (arg is a file name).
  • -f [ --final_regressor ] arg: the file in which to save the trained model.
  • --random_weights arg: make the initial weights random.
  • --initial_weight arg (=0): set all weights to this initial value.
  • --readable_model arg: output the model in a human-readable text format.
  • --invert_hash arg: output a human-readable model with feature names instead of hash indexes.
  • --save_per_pass: save the model after every training pass.
  • --input_feature_regularizer arg: per-feature regularization input file.
  • --output_feature_regularizer_binary arg: per-feature regularization output file.
  • --output_feature_regularizer_text arg: per-feature regularization output file, in text.

VW hashes all features to a predetermined range [0, 2^b-1] and uses a fixed weight vector with 2^b components. The argument of the -b option determines the value of b, which is 18 by default. Hashing the features allows the algorithm to work with very raw data (since there’s no need to assign a unique id to each feature) and has only a negligible effect on generalization performance (see for example Feature Hashing for Large Scale Multitask Learning).
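
For instance, raising b from the default 18 to 24 gives a table of 2^24 weights, which reduces hash collisions on data sets with many features at the cost of a larger model:

vw -d train.txt -b 24 -f model_file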

When training, use -f to specify the file in which to save the model; to retrain a model (i.e., continue from a previous training run), use -i to load the existing model.
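
A minimal sketch of such incremental training (train.txt and new_data.txt are placeholders; adding --save_resume to the first run, as listed among the input options above, preserves extra learning state for a more faithful resumption):

vw -d train.txt --save_resume -f model.vw         # initial training
vw -d new_data.txt -i model.vw -f model2.vw       # continue training from model.vw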


--readable_model serves the same purpose as -f, specifying a file in which to save the model, except that the result is not binary but a more readable text format where each line has the form feature_hash:weight.
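
For example, the following writes the binary model to model.vw and the hash:weight text version to model.txt (both file names are placeholders):

vw -d train.txt -f model.vw --readable_model model.txt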


--invert_hash is similar to --readable_model, but the output is even more readable: each line has the form feature_name:feature_hash:weight, i.e. the feature name is followed by its hash value and then by its weight. Note that using --invert_hash requires considerably more memory and computation time. Feature names are not stored in cache files (so if -c is given and a cache file already exists while you also want --invert_hash, either delete the cache file or pass -k so that the program handles this automatically). For multi-pass learning the -c option must be present; in that case it is recommended to first train the model without --invert_hash, then run the program again with -t added and --invert_hash set. On that second run the program merely reads the previously trained binary model (controlled by -i) and converts it to text format (controlled by --invert_hash).
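
A sketch of the two-run workflow just described (file names are placeholders):

vw -d train.txt -c --passes 5 -f model.vw                 # first run: train normally, without --invert_hash
vw -d train.txt -t -i model.vw --invert_hash model.txt    # second run: read the binary model, write readable text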


--save_per_pass: if this option is set, the model is saved after every pass over the training data.


--input_feature_regularizer, --output_feature_regularizer_binary, and --output_feature_regularizer_text are analogs of -i, -f, and --readable_model for batch optimization where you want to do per-feature regularization. This is advanced, but allows efficient simulation of online learning with a batch optimizer.


By default VW starts with the zero vector as its hypothesis. The --random_weights option initializes with random weights. This is often useful for symmetry breaking in advanced models. It’s also possible to initialize with a fixed value such as the all-ones vector using --initial_weight.
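
For example (--random_weights takes an argument, as shown in the option list above; the exact form may vary between VW versions):

vw -d train.txt --random_weights 1 -f model.vw    # random initial weights
vw -d train.txt --initial_weight 1 -f model.vw    # all weights start at 1 (the all-ones vector)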
