

Abstract—In this paper, we introduce Factorization Machines (FMs), a new model class that combines the advantages of Support Vector Machines (SVMs) with those of factorization models. Like SVMs, FMs are general predictors that work with any real-valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. They are therefore able to estimate interactions even in problems with huge sparsity (such as recommender systems), where SVMs fail. We show that the model equation of FMs can be calculated in linear time, so FMs can be optimized directly. Unlike with nonlinear SVMs, a transformation into the dual form is not necessary, and the model parameters can be estimated directly without the need for any support vectors in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings.

On the other hand, there are many different factorization models, such as matrix factorization, parallel factor analysis, and specialized models like SVD++, PITF, or FPMC. The drawback of these models is that they are not applicable to general prediction tasks but work only with special input data. Furthermore, their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e., the feature vectors). This makes FMs easily applicable even for users without expert knowledge of factorization models.
Index Terms—factorization machine; sparse data; tensor factorization; support vector machine


Support Vector Machines are among the most popular predictors in machine learning and data mining. Nevertheless, in settings like collaborative filtering, SVMs play no important role, and the best models are either direct applications of standard matrix/tensor factorization models like PARAFAC [1] or specialized models using factorized parameters [2], [3], [4]. In this paper, we show that the only reason why standard SVM predictors are not successful in these tasks is that they cannot learn reliable parameters ('hyperplanes') in complex (non-linear) kernel spaces under very sparse data. On the other hand, the drawback of tensor factorization models, and even more so of specialized factorization models, is that (1) they are not applicable to standard prediction data (e.g., a real-valued feature vector in R^n) and (2) specialized models are usually derived individually for a specific task, requiring effort in modelling and in the design of a learning algorithm.
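The sparsity argument can be made concrete with a small sketch. It assumes a collaborative-filtering setting in which each training case is a one-hot encoded (user, item) pair; the toy data and the helper names `encode` and `support` are hypothetical and not taken from the paper. Under this assumption, a model that keeps one independent weight per feature pair (for instance a degree-2 polynomial model) only receives a training signal for the weight of pair (u, i) from rows in which both features are non-zero. The sketch counts how many such rows exist for each pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (user, item) pairs with an observed rating.
observed = [
    ("alice", "titanic"),
    ("alice", "notting_hill"),
    ("bob", "star_wars"),
    ("charlie", "titanic"),
]

users = sorted({u for u, _ in observed})
items = sorted({i for _, i in observed})
# One feature index per user and per item (assumes the names do not collide).
index = {name: k for k, name in enumerate(users + items)}


def encode(user, item):
    """One-hot encode a (user, item) pair; returns the non-zero feature indices."""
    return {index[user], index[item]}


# Count how often each feature pair is jointly non-zero in the training data.
pair_support = Counter()
for user, item in observed:
    for pair in combinations(sorted(encode(user, item)), 2):
        pair_support[pair] += 1


def support(user, item):
    """Number of training rows in which x_user * x_item is non-zero."""
    pair = tuple(sorted((index[user], index[item])))
    return pair_support[pair]


# Pairs with zero support give an independent pairwise weight nothing to learn from.
unobserved = sum(1 for u in users for i in items if support(u, i) == 0)

print(support("alice", "titanic"))  # observed pair
print(support("bob", "titanic"))    # unobserved pair
print(unobserved, "of", len(users) * len(items), "user-item pairs have no support")
```

In this toy setting, most user-item pairs never co-occur, so an independent interaction weight for them cannot be estimated from the data. This is the situation that the factorized parametrization introduced next is meant to address.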

In this paper, we introduce a new predictor, the Factorization Machine (FM), which is a general predictor like SVMs but is also able to estimate reliable parameters under very high sparsity. The factorization machine models all nested variable interactions (comparable to a polynomial kernel in SVMs), but uses a factorized parametrization instead of a dense parametrization as in SVMs. We show that the model equation of FMs can be computed in linear time and that it depends only on a linear number of parameters. This allows direct optimization and storage of model parameters without the need to store any training data (e.g., support vectors) for prediction. In contrast, non-linear SVMs are usually optimized in the dual form, and computing a prediction (the model equation) depends on parts of the training data (the support vectors). We also show that FMs subsume many of the most suc





