Factorization Machines

Recommended tool: pdftotext (to extract the paper's text from the PDF)

sudo apt-get install poppler-utils   # provides pdftotext
pdftotext xxx.pdf a.txt

Factorization Machines

Steffen Rendle
Department of Reasoning for Intelligence
The Institute of Scientific and Industrial Research
Osaka University, Japan
rendle@ar.sanken.osaka-u.ac.jp

LibFFM

Download: http://www.csie.ntu.edu.tw/~r01922136/libffm/
Build: make
Train: ./ffm-train bigdata.tr.txt model
Predict: ./ffm-predict bigdata.te.txt model output

Data format of bigdata.tr.txt
The data format of LIBFFM is:
<label> <field1>:<index1>:<value1> <field2>:<index2>:<value2> ...
`field' and `index' are non-negative integers.
Run `head -1 bigdata.tr.txt' to inspect the first line: the label is 1, followed by 17 field:index:value triples (field 1 happens not to appear).

1 0:0:0.3651 2:1163:0.3651 3:8672:0.3651 4:2183:0.3651 5:2332:0.3651 6:185:0.3651 7:2569:0.3651 8:8131:0.3651 9:5483:0.3651 10:215:0.3651 11:1520:0.3651 12:1232:0.3651 13:2738:0.3651 14:2935:0.3651 15:5428:0.3651 17:2434:0.50000 16:7755:0.50000
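
To make the format concrete, here is a small, hypothetical C++ sketch (not
part of LIBFFM; all names are invented) that parses one line of this format
into (field, index, value) triples:

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Node { int field; int index; float value; };

    // Parse one line: "<label> <field>:<index>:<value> ..."
    bool parse_line(const std::string &line, float &label,
                    std::vector<Node> &nodes) {
        std::istringstream ss(line);
        if (!(ss >> label)) return false;
        std::string token;
        while (ss >> token) {
            Node n;
            char c1, c2;
            std::istringstream ts(token);
            if (!(ts >> n.field >> c1 >> n.index >> c2 >> n.value)
                || c1 != ':' || c2 != ':')
                return false;
            nodes.push_back(n);
        }
        return true;
    }

    int main() {
        std::string line = "1 0:0:0.3651 2:1163:0.3651 17:2434:0.5";
        float label;
        std::vector<Node> nodes;
        if (parse_line(line, label, nodes))
            std::cout << "label=" << label << ", "
                      << nodes.size() << " triples\n";  // label=1, 3 triples
    }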

Code walkthrough:

// ffm-train.cpp (simplified excerpt)
int train(Option opt) {
    // Read the training set, and the validation set if one was given.
    ffm_problem *tr = ffm_read_problem(opt.tr_path.c_str());
    ffm_problem *va = nullptr;
    if (!opt.va_path.empty())
        va = ffm_read_problem(opt.va_path.c_str());

    ffm_int status = 0;
    if (opt.do_cv) {
        // -v <fold>: cross-validation only; no model is saved.
        ffm_cross_validation(tr, opt.nr_folds, opt.param);
    } else {
        // Train (printing validation logloss each iteration when va is
        // given), then save the model to disk.
        ffm_model *model = ffm_train_with_validation(tr, va, opt.param);
        status = ffm_save_model(model, opt.model_path.c_str());
        ffm_destroy_model(&model);
    }

    ffm_destroy_problem(&tr);
    ffm_destroy_problem(&va);
    return status;
}
The main work happens in ffm_train_with_validation:

// ffm.cpp (simplified excerpt)
ffm_model* ffm_train_with_validation(...) {
    ...
    shared_ptr<ffm_model> model = train(tr, order, param, va);
    ...
}

shared_ptr<ffm_model> train(
    ffm_problem *tr,
    vector<ffm_int> &order,   // order in which instances are visited
    ffm_parameter param,
    ffm_problem *va = nullptr) {

    // Allocate the model and attach a custom deleter so it is released
    // automatically when the shared_ptr goes out of scope.
    shared_ptr<ffm_model> model =
        shared_ptr<ffm_model>(init_model(tr->n, tr->m, param),
            [] (ffm_model *ptr) { ffm_destroy_model(&ptr); });
    ...
}
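
From here, train() runs the stochastic-gradient loop: for each of
param.nr_iters epochs it visits the training instances in `order'
(reshuffled each epoch unless --no-rand) and applies one update per
instance. LIBFFM's actual inner loop is SSE-vectorized and uses AdaGrad,
so the following is only a conceptual sketch of the same idea with plain
SGD and logistic loss; every name in it is invented for illustration. The
FFM prediction it implements is
phi(x) = sum_{j1<j2} <w_{j1,f_{j2}}, w_{j2,f_{j1}}> x_{j1} x_{j2}.

    #include <algorithm>
    #include <cmath>
    #include <random>
    #include <vector>

    struct Node { int f, j; float v; };                // field, feature, value
    struct Instance { float y; std::vector<Node> x; }; // y in {-1, +1}

    // Simplified dense model: w(j, f) is the k-dim latent vector of
    // feature j when it interacts with field f (layout chosen for clarity).
    struct Model {
        int n, m, k;
        std::vector<float> W;  // n * m * k values
        float *w(int j, int f) { return &W[((size_t)j * m + f) * k]; }
    };

    // phi(x) = sum over feature pairs of <w_{j1,f2}, w_{j2,f1}> * v1 * v2
    float predict(Model &md, const std::vector<Node> &x) {
        float phi = 0;
        for (size_t a = 0; a < x.size(); ++a)
            for (size_t b = a + 1; b < x.size(); ++b) {
                float *w1 = md.w(x[a].j, x[b].f);
                float *w2 = md.w(x[b].j, x[a].f);
                for (int d = 0; d < md.k; ++d)
                    phi += w1[d] * w2[d] * x[a].v * x[b].v;
            }
        return phi;
    }

    // One epoch of plain SGD on logistic loss log(1 + exp(-y * phi)).
    void sgd_epoch(Model &md, std::vector<Instance> &data,
                   float eta, float lambda, std::mt19937 &rng) {
        std::shuffle(data.begin(), data.end(), rng);  // random update order
        for (auto &inst : data) {
            float phi = predict(md, inst.x);
            float kappa = -inst.y / (1 + std::exp(inst.y * phi)); // dL/dphi
            for (size_t a = 0; a < inst.x.size(); ++a)
                for (size_t b = a + 1; b < inst.x.size(); ++b) {
                    float *w1 = md.w(inst.x[a].j, inst.x[b].f);
                    float *w2 = md.w(inst.x[b].j, inst.x[a].f);
                    float vv = inst.x[a].v * inst.x[b].v;
                    for (int d = 0; d < md.k; ++d) {
                        float g1 = lambda * w1[d] + kappa * w2[d] * vv;
                        float g2 = lambda * w2[d] + kappa * w1[d] * vv;
                        w1[d] -= eta * g1;  // both gradients are computed
                        w2[d] -= eta * g2;  // before either vector changes
                    }
                }
        }
    }

    int main() {
        std::mt19937 rng(0);
        Model md{4, 2, 4, {}};  // 4 features, 2 fields, k = 4
        md.W.resize((size_t)md.n * md.m * md.k);
        std::uniform_real_distribution<float> u(0, 1 / std::sqrt((float)md.k));
        for (auto &w : md.W) w = u(rng);
        std::vector<Instance> data = {
            {+1, {{0, 0, 1.0f}, {1, 2, 1.0f}}},
            {-1, {{0, 1, 1.0f}, {1, 3, 1.0f}}},
        };
        for (int it = 0; it < 15; ++it)
            sgd_epoch(md, data, 0.1f, 0.0f, rng);
    }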


ffm-train

Command Line Usage
==================
-   `ffm-train'
    usage: ffm-train [options] training_set_file [model_file]

    options:
    -l <lambda>: set regularization parameter (default 0)
    -k <factor>: set number of latent factors (default 4)
    -t <iteration>: set number of iterations (default 15)
    -r <eta>: set learning rate (default 0.1)
    -s <nr_threads>: set number of threads (default 1)
    -p <path>: set path to the validation set
    -v <fold>: set the number of folds for cross-validation
    --quiet: quiet mode (no output)
    --no-norm: disable instance-wise normalization
    --no-rand: disable random update
    --on-disk: perform on-disk training (a temporary file <training_set_file>.bin will be generated)

    By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use `--no-norm' to disable this function.

    By default, our algorithm randomly selects an instance to update in each inner iteration. On some datasets you may want to do the updates in the original order. You can do that by using `--no-rand' together with `-s 1.'

    If you do not have enough memory, then you can use `--on-disk' to do
    disk-level training. Two restrictions when you use this mode:
        1. So far we do not allow random update in the mode, so please use
           `--no-rand' if you want to do on-disk training.

        2. Cross-validation in this mode is not yet supported.

    A binary file `training_set_file.bin' will be generated to store the data
    in binary format.

ffm-predict

usage: ffm-predict test_file model_file output_file

Example


> ffm-train bigdata.tr.txt model

train a model using the default parameters

> ffm-train -l 0.001 -k 16 -t 30 -r 0.05 -s 4 bigdata.tr.txt model

train a model using the following parameters:

    regularization cost = 0.001
    latent factors = 16
    iterations = 30
    learning rate = 0.05
    threads = 4

> ffm-train -p bigdata.te.txt bigdata.tr.txt model

use bigdata.te.txt as validation set

> ffm-train -v 5 bigdata.tr.txt

do five fold cross validation

> ffm-train --quiet bigdata.tr.txt

do not print message to screen

> ffm-predict bigdata.te.txt model output

do prediction

> ffm-train --no-rand --on-disk bigdata.tr.txt

perform on-disk training

Library Usage

These structures and functions are declared in the header file `ffm.h'. You
need to `#include "ffm.h"' in your C/C++ source files and link your program
with `ffm.cpp'. You can see `ffm-train.cpp' and `ffm-predict.cpp' for
examples showing how to use them.

There are four public data structures in LIBFFM.


-   struct ffm_node
    {
        ffm_int f;    // field index
        ffm_int j;    // column index
        ffm_float v;  // value
    };

    Each `ffm_node' represents a non-zero element in a sparse matrix.

-   struct ffm_problem
    {
        ffm_int n;      // number of features
        ffm_int l;      // number of instances
        ffm_int m;      // number of fields
        ffm_node *X;    // non-zero elements
        ffm_long *P;    // row pointers
        ffm_float *Y;   // labels
    };
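
    `ffm_problem' is a CSR-style sparse matrix: `P' holds l + 1 row
    pointers, so instance i owns the nodes X[P[i]] .. X[P[i+1]-1]. As a
    hypothetical sketch (the function name is invented, and building a
    problem by hand like this is an assumption based purely on the struct
    layout above; normally you would get one from `ffm_read_problem'):

        #include <vector>
        #include "ffm.h"  // assumed to declare ffm_node, ffm_problem, typedefs

        ffm_problem make_toy_problem(std::vector<ffm_node> &X,
                                     std::vector<ffm_long> &P,
                                     std::vector<ffm_float> &Y) {
            X = {
                {0, 0, 0.5f}, {1, 3, 1.0f},  // instance 0: two non-zeros
                {0, 1, 1.0f},                // instance 1: one non-zero
            };
            P = {0, 2, 3};                   // l + 1 = 3 row pointers
            Y = {1.0f, -1.0f};               // one label per instance

            ffm_problem prob;
            prob.l = 2;  // number of instances
            prob.n = 4;  // number of features (max j + 1)
            prob.m = 2;  // number of fields (max f + 1)
            prob.X = X.data();
            prob.P = P.data();
            prob.Y = Y.data();
            return prob;
        }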

-   struct ffm_parameter
    {
        ffm_float eta;
        ffm_float lambda;
        ffm_int nr_iters;
        ffm_int k;
        ffm_int nr_threads;
        bool quiet;
        bool normalization;
        bool random;
    };

    `ffm_parameter' represents the parameters used for training. The meaning
    of each variable is:

    variable         meaning                             default
    ============================================================
    eta              learning rate                           0.1
    lambda           regularization cost                       0
    nr_iters         number of iterations                     15
    k                number of latent factors                  4
    nr_threads       number of threads used                    1
    quiet            no outputs to stdout                  false
    normalization    instance-wise normalization            true
    random           randomly select instance in SG         true

    To obtain a parameter object with default values, use the function
    `ffm_get_default_param.'
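
    For example, to start from the defaults and override a few values
    (the specific numbers here are arbitrary):

        ffm_parameter param = ffm_get_default_param();
        param.eta = 0.05f;  // learning rate
        param.k = 16;       // number of latent factors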


-   struct ffm_model
    {
        ffm_int n;              // number of features
        ffm_int m;              // number of fields
        ffm_int k;              // number of latent factors
        ffm_float *W;           // store model values
        bool normalization;     // do instance-wise normalization
    };

Functions


-   ffm_parameter ffm_get_default_param();

    Get default parameters.

-   ffm_int ffm_save_model(struct ffm_model const *model, char const *path);

    Save a model. It returns 0 on success and 1 on failure.

-   struct ffm_model* ffm_load_model(char const *path);

    Load a model. If the model could not be loaded, a nullptr is returned.

-   void ffm_destroy_model(struct ffm_model **model);

    Destroy a model.

-   struct ffm_model* ffm_train(
        struct ffm_problem const *prob,
        ffm_parameter param);

    Train a model.

-   struct ffm_model* ffm_train_with_validation(
        struct ffm_problem const *Tr,
        struct ffm_problem const *Va,
        ffm_parameter param);

    Train a model with training set 'Tr' and validation set 'Va.' The logloss of the validation set is printed at each iteration.

-   ffm_float ffm_cross_validation(
        struct ffm_problem const *prob,
        ffm_int nr_folds,
        ffm_parameter param);

    Do cross validation with 'nr_folds' folds.

-   ffm_float ffm_predict(ffm_node *begin, ffm_node *end, ffm_model *model);

    Do prediction. 'begin' and 'end' are pointers to specify the beginning and ending position of the instance to be predicted.
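
Putting the pieces together: a minimal end-to-end sketch assembled from the
functions above plus `ffm_read_problem' / `ffm_destroy_problem' (seen in the
ffm-train.cpp excerpt earlier). File names are the ones used in the examples;
treat this as an illustration, not the library's own sample code:

    #include <iostream>
    #include "ffm.h"

    int main() {
        // Read training and validation sets in the LIBFFM text format.
        ffm_problem *tr = ffm_read_problem("bigdata.tr.txt");
        ffm_problem *va = ffm_read_problem("bigdata.te.txt");
        if (tr == nullptr) return 1;

        // Start from the defaults, then override a few parameters.
        ffm_parameter param = ffm_get_default_param();
        param.k = 16;
        param.eta = 0.05f;
        param.lambda = 0.001f;

        // Train (validation logloss is printed each iteration) and save.
        ffm_model *model = ffm_train_with_validation(tr, va, param);
        if (ffm_save_model(model, "model") != 0)
            std::cerr << "failed to save model\n";

        // Predict the first instance of the validation set: P[0] and P[1]
        // delimit its nodes inside X.
        if (va != nullptr && va->l > 0) {
            ffm_node *begin = va->X + va->P[0];
            ffm_node *end   = va->X + va->P[1];
            std::cout << "p = " << ffm_predict(begin, end, model) << "\n";
        }

        ffm_destroy_model(&model);
        ffm_destroy_problem(&tr);
        ffm_destroy_problem(&va);
    }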