接下来又是一个trait ALSParams,参数包括
rank:矩阵因子模型的阶,是大于等于1的整数,默认10
numUserBlocks:用户矩阵分块数,是大于等于1的整数,默认是10
numItemBlocks:项目矩阵分块数,是大于等于1的整数,默认是10
implicitPrefs:是否使用隐式偏好,默认否
alpha:隐式偏好公式的alpha参数,大于等于0的小数,默认是1
ratingCol:评分列的列名,默认rating
nonnegative:在最小平方法中是否使用非负限制,默认否
maxIter:最大迭代次数10次
regParam:正则化参数0.1
checkpointInterval:checkpoint间隔10次迭代
参数里还有这么一段检查输入有效性并转换输入的代码,这里还是想先说一下StructType和StructField的关系,StructType是StructField的数组,StructField里面分别有列名,类型,列值是否可以为空的布尔值,这样StructType就构成了sql数据的schema
val struct =
StructType(
StructField("a", IntegerType, true) ::
StructField("b", LongType, false) ::
StructField("c", BooleanType, false) :: Nil)
protected def validateAndTransformSchema(schema: StructType): StructType = {
SchemaUtils.checkColumnType(schema, $(userCol), IntegerType)//检查user列是不是整数类型
SchemaUtils.checkColumnType(schema, $(itemCol), IntegerType)//检查item列是不是整数类型
val ratingType = schema($(ratingCol)).dataType//StructType(StructType)可以访问StructField,获取rating的类型
require(ratingType == FloatType || ratingType == DoubleType)//rating类型为小数
SchemaUtils.appendColumn(schema, $(predictionCol), FloatType)//在原有的schema后面增加类型为小数的预测列
}
下面是ALSModel 类,
class ALSModel private[ml] (
override val uid: String,
val rank: Int,//矩阵因子模型的阶
@transient val userFactors: DataFrame,//有两列,一列是用户id,一列用户属性
@transient val itemFactors: DataFrame)//有两列,一列是项目id,一列项目属性
extends Model[ALSModel] with ALSModelParams with MLWritable {
//以后我会忽略中间不重要的代码,直贴关键代码
override def transform(dataset: DataFrame): DataFrame = {
// Register a UDF for DataFrame, and then
// create a new column named map(predictionCol) by running the predict UDF.
val predict = udf { (userFeatures: Seq[Float], itemFeatures: Seq[Float]) =>//搞了一个udf,输入是用户属性数组,项目属性数组
if (userFeatures != null && itemFeatures != null) {//如果都不为空
blas.sdot(rank, userFeatures.toArray, 1, itemFeatures.toArray, 1)//点积,第一个参数是矩阵因子模型的阶,第二个参数是用户属性,第三个参数是1,第四个参数是项目参数,第五个参数是1,predict返回的结果是取用户和项目前rank组对应元素乘积的和就,这不就是相似度么?
这里解释下怎么算,假设数组是这样的
val syn0 = Array[Float](1,1,1,1,1,1,1,1,1,1)
val syn1 = Array[Float](1,2,3,4,5,6,7,8,9,10)
blas.sdot(2, syn0, 3, syn1, 3)
第一个2代表取两对元素的点积,第二个3代表步长,从索引0开始,3,6..这样,第三个3同理,这样计算的结果是1*1+1*4=5
} else {Float.NaN
}
dataset//后面干的事就是把用户属性项目属性leftjoin到数据源,然后对新增的两列用刚才的udf处理,预测出结果
.join(userFactors, dataset($(userCol)) === userFactors("id"), "left")
.join(itemFactors, dataset($(itemCol)) === itemFactors("id"), "left")
.select(dataset("*"),
predict(userFactors("features"), itemFactors("features")).as($(predictionCol)))
}
我们还是先简单了解下注释说了什么,然后再看下代码怎么实现的
/**
* :: Experimental ::
* Alternating Least Squares (ALS) matrix factorization.//交替最小平方法矩阵因子分解
*
* ALS attempts to estimate the ratings matrix `R` as the product of two lower-rank matrices,//ALS尝试评估两个低阶矩阵乘机的评级矩阵,例如给定X,Y矩阵,R=X*Yt
* `X` and `Y`, i.e. `X * Yt = R`. Typically these approximations are called 'factor' matrices.//这些估计值被称为因子矩阵
* The general approach is iterative. During each iteration, one of the factor matrices is held//一般通过迭代实现,每次迭代一个因子矩阵保持不变,其他的因子矩阵使用最小平方求解
* constant, while the other is solved for using least squares. The newly-solved factor matrix is//然后最新求解的因子矩阵保持不变再求解其他因子矩阵
* then held constant while solving for the other factor matrix.
*
* This is a blocked implementation of the ALS factorization algorithm that groups the two sets//ALS因子分解算法把用户和产品两个矩阵分成若干块,每次迭代只传递每个用户向量的一份副本给每个产品分块矩阵来减少通信,然后分块执行
* of factors (referred to as "users" and "products") into blocks and reduces communication by only
* sending one copy of each user vector to each product block on each iteration, and only for the//仅当产品分块需要用户特征向量时
* product blocks that need that user's feature vector. This is achieved by pre-computing some//这需要通过预计算一些评级矩阵的信息去决定每个用户外链(也就是用户对哪些产品分块矩阵有贡献)和每个产品(产品分块矩阵从有依赖关系的每个用户分块矩阵收到哪些特征向量)的内链信息
* information about the ratings matrix to determine the "out-links" of each user (which blocks of
* products it will contribute to) and "in-link" information for each product (which of the feature
* vectors it receives from each user block it will depend on). This allows us to send only an//这允许我们只发送一个用户分块和产品分块的特征向量数组,通过产品分块找到用户评级并基于这些信息更新产品矩阵
* array of feature vectors between each user block and product block, and have the product block
* find the users' ratings and update the products based on these messages.
*
* For implicit preference data, the algorithm used is based on//对于隐式偏好数据,算法基于隐式反馈数据的协同过滤
* "Collaborative Filtering for Implicit Feedback Datasets", available at
* [[http://dx.doi.org/10.1109/ICDM.2008.22]], adapted for the blocked approach used here.
*
* Essentially instead of finding the low-rank approximations to the rating matrix `R`,//本质上,这个方法寻找偏好矩阵的近似值而不是寻找评级矩阵的低阶估计,如果评级矩阵元素r>0那么偏好矩阵的元素p>0,如果评级矩阵元素r<=0那么偏好矩阵的元素p=0
* this finds the approximations for a preference matrix `P` where the elements of `P` are 1 if
* r > 0 and 0 if r <= 0. The ratings then act as 'confidence' values related to strength of//相对指定用户的强度,评级代表可信值
* indicated user
* preferences rather than explicit ratings given to items.//得出项目的偏好而不是精确的评级
*/
首先是ALS类核心方法如下
override def fit(dataset: DataFrame): ALSModel = {
import dataset.sqlContext.implicits._
val r = if ($(ratingCol) != "") col($(ratingCol)).cast(FloatType) else lit(1.0f)//如果ratingCol不为空,把ratingCol列转成floattype类型,比doubletype节省空间,否则生成一个值为1.0f的列
val ratings = dataset
.select(col($(userCol)).cast(IntegerType), col($(itemCol)).cast(IntegerType), r)//从数据源拿出userid,,itemid,rating三列组成一个Rating的类
.map { row =>
Rating(row.getInt(0), row.getInt(1), row.getFloat(2))
}
val (userFactors, itemFactors) = ALS.train(ratings, rank = $(rank),//训练刚才的Rating数据并传入参数,返回用户因子矩阵和项目因子矩阵
numUserBlocks = $(numUserBlocks), numItemBlocks = $(numItemBlocks),
maxIter = $(maxIter), regParam = $(regParam), implicitPrefs = $(implicitPrefs),
alpha = $(alpha), nonnegative = $(nonnegative),
checkpointInterval = $(checkpointInterval), seed = $(seed))
val userDF = userFactors.toDF("id", "features")//把用户因子矩阵和项目因子矩阵转成dataframe,第一列是id,第二列是特征
val itemDF = itemFactors.toDF("id", "features")
val model = new ALSModel(uid, $(rank), userDF, itemDF).setParent(this)//构造ALSModel
copyValues(model)//把参数也放入模型中,返回模型
}
接下来是ALS伴生对象,里面有接近千行的的方法,我们以train为入口逐个过滤
train方法里的参数前面已经讲过了,伴生对象的train方法多了两个参数intermediateRDDStorageLevel,finalRDDStorageLevel分别是中间结果存储级别和最终结果存放级别都是MEMORY_AND_DISK,返回的是用户因子矩阵和项目因子矩阵,格式是RDD[(ID, Array[Float])],id是int或long类型
val userPart = new ALSPartitioner(numUserBlocks)//其中ALSPartitioner是org.apache.spark.HashPartitioner,可以通过主键key找到对应Partitioner的索引,传入的numUserBlocks是Partitioner的总数,这里Partitioner索引为0-9,这两句声明了了两个大小为10的HashPartitioner
val itemPart = new ALSPartitioner(numItemBlocks)
val userLocalIndexEncoder = new LocalIndexEncoder(userPart.numPartitions)
val itemLocalIndexEncoder = new LocalIndexEncoder(itemPart.numPartitions)
这两句又声明了两个本地索引编码器我们看下LocalIndexEncoder这个类
/**
* Encoder for storing (blockId, localIndex) into a single integer.//把blockid,localindex编码成一个整数,便于存储
*
* We use the leading bits (including the sign bit) to store the block id and the rest to store//我们使用包括符号位的首尾存储blockid,剩余位存储本地索引
* the local index. This is based on the assumption that users/items are approximately evenly//这是基于用户项目近似均匀分区的假设
* partitioned. With this assumption, we should be able to encode two billion distinct values.//我们能够解码二十亿个唯一值
*
* @param numBlocks number of blocks//numBlocks 是分块数
*/
private[recommendation] class LocalIndexEncoder(numBlocks: Int) extends Serializable {
require(numBlocks > 0, s"numBlocks must be positive but found $numBlocks.")
private[this] final val numLocalIndexBits =
math.min(java.lang.Integer.numberOfLeadingZeros(numBlocks - 1), 31)//numberOfLeadingZeros这个方法返回二进制从左开始连续不为0的位数,这里传入9,返回28,就是前28位都为0,28和31取最小numLocalIndexBits 是28
private[this] final val localIndexMask = (1 << numLocalIndexBits) - 1//把1左移28位-1构成二进制28个1,左4位为0
/** Encodes a (blockId, localIndex) into a single integer. */
def encode(blockId: Int, localIndex: Int): Int = {//传入blockid,localindex
require(blockId < numBlocks)//blockId 要小于10
require((localIndex & ~localIndexMask) == 0)//~localIndexMask按位取反是左4位为1,其余28位为0,localIndex & ~localIndexMask保证localIndex只在后28位,这里~localIndexMask相当于l确定了ocalIndex 的位范围
(blockId << numLocalIndexBits) | localIndex//block左移28位,与localindex按位或,这样保证了blockid在前28位,localIndex在后4位,够成了一个编码
}
/** Gets the block id from an encoded index. */
@inline
def blockId(encoded: Int): Int = {//传入编码,编码右移28位获取blockid
encoded >>> numLocalIndexBits
}
/** Gets the local index from an encoded index. */
@inline
def localIndex(encoded: Int): Int = {//传入编码,与localIndexMask求并得出localIndex
encoded & localIndexMask
}
}
回到train
val solver = if (nonnegative) new NNLSSolver else new CholeskySolver//如果最小平方法限制使用非负限制,则调用NNLSSolver ,如果不使用非负限制,调用CholeskySolver,这两个类都继承自LeastSquaresNESolver ,我们分别看下LeastSquaresNESolver, NNLSSolver 和CholeskySolver
/** Trait for least squares solvers applied to the normal equation. *///解决正规方程的最小平方法
private[recommendation] trait LeastSquaresNESolver extends Serializable {
/** Solves a least squares problem with regularization (possibly with other constraints). */通过正则化解决最小平方问题,可能有其他限制
def solve(ne: NormalEquation, lambda: Double): Array[Float]//solve方法,有两个参数,一个是正规方程,一个是正则化参数lambda,返回一个小数数组,注意正规方程是一个类,我们先看下这个类
}
/**
* Representing a normal equation to solve the following weighted least squares problem://解决下列带权重正规方程的最小平方问题,这只是正规方程类,具体解决最小平方的方法还在后面
*
* minimize \sum,,i,, c,,i,, (a,,i,,^T^ x - b,,i,,)^2^ + lambda * x^T^ x.//这种记法我也不太懂,直接看代码
*
* Its normal equation is given by
*
* \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - b,,i,, a,,i,,) + lambda * x = 0.
*/
private[recommendation] class NormalEquation(val k: Int) extends Serializable {//正规方程类要传入方阵的阶数,注意这里的ata是上三角矩阵对应的向量!
/** Number of entries in the upper triangular part of a k-by-k matrix. *///k阶矩阵上三角部分的元素数
val triK = k * (k + 1) / 2
/** A^T^ * A */
val ata = new Array[Double](triK)//把上三角矩阵元素弄成一个数组
/** A^T^ * b */
val atb = new Array[Double](k)//生成和阶数相同的数组
private val da = new Array[Double](k)//也是一个和阶数相同的数组
private val upper = "U"
private def copyToDouble(a: Array[Float]): Unit = {//把传进来的a拷贝到da
var i = 0
while (i < k) {
da(i) = a(i)
i += 1
}
}
/** Adds an observation. */
def add(a: Array[Float], b: Double, c: Double = 1.0): this.type = {//传入三个参数
require(c >= 0.0)//c要非负
require(a.length == k)//a的长度与矩阵阶数相同
copyToDouble(a)//拷贝a到da
blas.dspr(upper, k, c, da, 1, ata)//upper代表ata提供上三角矩阵,k矩阵的阶,c即alpha=1,由于下一个参数步长为1,da是长度为k的向量,ata是按列存储的上三角矩阵向量,ata( 1 )代表a( 1, 1 ), ata( 2 )代表a( 1, 2 ),ata( 3 ) 代表a( 2, 2 )。这个方法实现了A := alpha*x*x' + A,A 就是ata,alpha就是c=1,x就是da,计算结果A被更新,由于A是对称阵,ata只记录上三角元素就相当于记录了整个矩阵
if (b != 0.0) {
blas.daxpy(k, c * b, da, 1, atb, 1)//如果b不为0,计算atb:=da*c*b+atb,k为参与运算的元素个数,c*b是与da相乘的系数,两个1代表步长增量,结果是更新了atb
}
this//返回当前对象
}
/** Merges another normal equation object. *///合并两个正规方程对象
def merge(other: NormalEquation): this.type = {
require(other.k == k)
blas.daxpy(ata.length, 1.0, other.ata, 1, ata, 1)//把ata合并
blas.daxpy(atb.length, 1.0, other.atb, 1, atb, 1)//把atb合并
this
}
/** Resets everything to zero, which should be called after each solve. *///每次求解后把所有置为0
def reset(): Unit = {
ju.Arrays.fill(ata, 0.0)//把ata置为0
ju.Arrays.fill(atb, 0.0)//把atb置为0
}
}
再看下继承LeastSquaresNESolver的CholeskySolver
private[recommendation] class CholeskySolver extends LeastSquaresNESolver {//Cholesky 分解是把一个对称正定的矩阵表示成一个上三角矩阵L和其转置的乘积的分解。它要求矩阵的所有特征值必须大于零,故分解的上三角的对角元也是大于零的。Cholesky分解法又称平方根法,注意这里的ata用的是NormalEquation类成员ata,是上三角矩阵对应的向量!
/**
* Solves a least squares problem with L2 regularization:
*
* min norm(A x - b)^2^ + lambda * norm(x)^2^//目标函数是最小化A x - b的2范数与正则参数乘以x的2范数之和
*
* @param ne a [[NormalEquation]] instance that contains AtA, Atb, and n (number of instances)
* @param lambda regularization constant
* @return the solution x
*/
override def solve(ne: NormalEquation, lambda: Double): Array[Float] = {//输入正规方程实例,正则化参数,返回x的解,x是数组
val k = ne.k//k是矩阵的阶
// Add scaled lambda to the diagonals of AtA.//在AtA的对角线上加正则化参数
var i = 0
var j = 2
while (i < ne.triK) {//i,j这样累加完全是因为存储结构,即ata是按列存储的上三角矩阵向量,ata( 1 )代表a( 1, 1 ), ata( 2 )代表a( 1, 2 ),ata( 3 ) 代表a( 2, 2 )
ne.ata(i) += lambda
i += j
j += 1
}
CholeskyDecomposition.solve(ne.ata, ne.atb)//进行Cholesky分解,分解后ne.atb更新为求解值,也就是说ne.atb乘以ne.atb的转置就是矩阵A
val x = new Array[Float](k)
i = 0
while (i < k) {
x(i) = ne.atb(i).toFloat//把ne.atb拷贝到x
i += 1
}
ne.reset()//清0
x//返回x
}
}
再看下继承LeastSquaresNESolver的另外一种形式NNLSSolver,注意这里的ata是NNLSSolver类自己的成员ata,是一个完整的矩阵的向量表示,而不是之前的上三角矩阵的向量表示!
private[recommendation] class NNLSSolver extends LeastSquaresNESolver {
private var rank: Int = -1//初始化矩阵因子的阶
private var workspace: NNLS.Workspace = _//这里会用到NNLS对象,我们先看下这个对象
/**
* Object used to solve nonnegative least squares problems using a modified//用修改过的梯度映射法解决非负最小平方问题
* projected gradient method.
*/
private[spark] object NNLS {
class Workspace(val n: Int) {//搞一个工作空间的类
val scratch = new Array[Double](n)//英文翻译是抓,存放ata*dir,由于dir是梯度,所以有一点一点抓,爬的意思
val grad = new Array[Double](n)//梯度数组
val x = new Array[Double](n)//自变量数组
val dir = new Array[Double](n)//方向数组
val lastDir = new Array[Double](n)//上次迭代方向数组
val res = new Array[Double](n)//中间结果数组
def wipe(): Unit = {//各种清0
ju.Arrays.fill(scratch, 0.0)
ju.Arrays.fill(grad, 0.0)
ju.Arrays.fill(x, 0.0)
ju.Arrays.fill(dir, 0.0)
ju.Arrays.fill(lastDir, 0.0)
ju.Arrays.fill(res, 0.0)
}
}
def createWorkspace(n: Int): Workspace = {//生成工作空间
new Workspace(n)
}
/**
* Solve a least squares problem, possibly with nonnegativity constraints, by a modified//用修正的映射梯度方法解决可能有非负限制最小平方问题
* projected gradient method. That is, find x minimising ||Ax - b||_2 given A^T A and A^T b.//给定 A^T A and A^T b求出使得||Ax - b||2范数最小的x
*
* We solve the problem
* min_x 1/2 x^T ata x^T - x^T atb//最小平方问题等价于二次规划 1/2 x^T ata x^T - x^T atb,且x>=0
* subject to x >= 0
*
* The method used is similar to one described by Polyak (B. T. Polyak, The conjugate gradient
* method in extremal problems, Zh. Vychisl. Mat. Mat. Fiz. 9(4)(1969), pp. 94-112) for bound-
* constrained nonlinear programming. Polyak unconditionally uses a conjugate gradient//这里是背景介绍,Polyak 无条件的使用成对梯度方向直到迭代对结果没有明显改变
* direction, however, while this method only uses a conjugate gradient direction if the last
* iteration did not cause a previously-inactive constraint to become active.
*/
def solve(ata: Array[Double], atb: Array[Double], ws: Workspace): Array[Double] = {//复写solve方法
ws.wipe()//工作空间清0
val n = atb.length//求矩阵阶
val scratch = ws.scratch//拷贝scratch成员
// find the optimal unconstrained step//找到最优无约束的步长,这个步长的意思是在当前的梯度下,残差ata*x-atb和敏感度ata*dir的比值,如果比值很大,步子也大,如果比值很小,步子也小,这样初始阶段逼近很快,越往后逼近越精确
def steplen(dir: Array[Double], res: Array[Double]): Double = {//步长函数,传入方向参数,中间结果参数
val top = blas.ddot(n, dir, 1, res, 1)//计算方向和结果的点积
blas.dgemv("N", n, n, 1.0, ata, n, dir, 1, 0.0, scratch, 1)//实现 y := alpha*A*x + beta*y,其中A是n*n的矩阵,alpha是1.0,ata代表矩阵A,n代表A的行数,dir就是x,1是x的步长,beta是0.0,y是scratch,步长为1,最终就是scratch=ata*dir,无返回但是更新了scratch,擦,真累!
// Push the denominator upward very slightly to avoid infinities and silliness//给分母加一个小数避免错误
top / (blas.ddot(n, scratch, 1, dir, 1) + 1e-20)//这里相当于dir点积res/ata*dir点积dir
}
// stopping condition//判停条件
def stop(step: Double, ndir: Double, nx: Double): Boolean = {//判停函数,如果step为空或者step很小甚至为负或者step过大或者梯度相对较小或者绝对较小
((step.isNaN) // NaN
|| (step < 1e-7) // too small or negative
|| (step > 1e40) // too small; almost certainly numerical problems
|| (ndir < 1e-12 * nx) // gradient relatively too small
|| (ndir < 1e-32) // gradient absolutely too small; numerical issues may lurk
)
}
val grad = ws.grad//各种赋值,好累
val x = ws.x
val dir = ws.dir
val lastDir = ws.lastDir
val res = ws.res
val iterMax = math.max(400, 20 * n)//取400和20*n的最大值作为迭代最大次数
var lastNorm = 0.0//上次迭代的梯度点积
var iterno = 0//当前迭代次数
var lastWall = 0 // Last iteration when we hit a bound constraint.//撞到边界前的迭代次数
var i = 0
while (iterno < iterMax) {//迭代
// find the residual//计算残差res=ata*x-atb,把res作为梯度
blas.dgemv("N", n, n, 1.0, ata, n, x, 1, 0.0, res, 1)//res=ata*x
blas.daxpy(n, -1.0, atb, 1, res, 1)//res:=-atb+res,把-atb更新到res
blas.dcopy(n, res, 1, grad, 1)//grad = res//res作为梯度
// project the gradient//梯度映射,这里的意思是x=0时,也就是初始状态和撞墙以后,我们希望x是从负梯度开始逼近,保持初始状态和撞墙以后梯度的优化方向是一致的
i = 0
while (i < n) {
if (grad(i) > 0.0 && x(i) == 0.0) {//如果梯度>0且当前值x=0,梯度要置为0,也就是说x=0时负梯度可以起作用,因为初始的时候x都是0,只有atb里面的正值起作用
grad(i) = 0.0
}
i = i + 1
}
val ngrad = blas.ddot(n, grad, 1, grad, 1)//梯度grad求点积
blas.dcopy(n, grad, 1, dir, 1)//把梯度赋值给dir
// use a CG direction under certain conditions//在某些条件下使用CGdirection
var step = steplen(grad, res)//根据步长函数计算步长,这里grad和res的唯一区别就是grad>0且x=0时,修正grad为0
var ndir = 0.0
val nx = blas.ddot(n, x, 1, x, 1)//x点积x
if (iterno > lastWall + 1) {//如果x某个分量已经撞到边界,用dir:=lastDir*(ngrad / lastNorm)+dir更新当前梯度,用更新的梯度求步长,如果步长被拒绝了,还保留之前的dir和ndir,这里iterno > lastWall + 1的原理是如果x某个分量已经撞到边界,那么下一次更新的时候使用正常梯度即iterno = lastWall + 1,再下一次由于梯度乘以了alpha这个点积变化率,梯度会缩小,使得逼近更慢更精细
val alpha = ngrad / lastNorm//梯度点积比上次梯度点积,也就是变化率
blas.daxpy(n, alpha, lastDir, 1, dir, 1)//把上次方向乘以变化率赋值给当前方向
val dstep = steplen(dir, res)//dir和res再求步长
ndir = blas.ddot(n, dir, 1, dir, 1)dir点积dir
if (stop(dstep, ndir, nx)) {//通过步长,dir自身的点积ndir,x的点积nx判断是否停止,这里相当于先在dir基础上更新,如果dir判停就说明步子大了,回到dir重新调整
// reject the CG step if it could lead to premature termination//如果参数越过边界拒绝CG步长
blas.dcopy(n, grad, 1, dir, 1)//把梯度拷贝给dir
ndir = blas.ddot(n, dir, 1, dir, 1)//dir即梯度点积
} else {//如果参数没越界把赋值新的步长
step = dstep
}
} else {//如果迭代还没有撞到边界
ndir = blas.ddot(n, dir, 1, dir, 1)//dir自身进行点积
}
// terminate?
if (stop(step, ndir, nx)) {//如果判停返回x
return x.clone
}
// don't run through the walls//不越界
i = 0
while (i < n) {//遍历x,如果步长与dir的乘积大于x,把步长调小,可见这是梯度和步长同时调整的,两者共同作用更新x
if (step * dir(i) > x(i)) {
step = x(i) / dir(i)
}
i = i + 1
}
// take the step//迈步,一定要记住x有非负限制,dir为负的时候x是正数,dir为正时x会减小,小到一定程度会被置0,也就是撞墙,这样做保证x各分量能同步增长,而不是某个分量过大,变相实现了正则化
i = 0
while (i < n) {//遍历x,如果步长乘以dir与x差异很小,把x赋0,把迭代次数赋给撞到边界前的迭代次数,否则x减去步长和dir的乘积,所以撞边界的定义是与x非常接近
if (step * dir(i) > x(i) * (1 - 1e-14)) {
x(i) = 0
lastWall = iterno
} else {
x(i) -= step * dir(i)
}
i = i + 1
}
iterno = iterno + 1//迭代次数+1
blas.dcopy(n, dir, 1, lastDir, 1)把dir赋给上次dir
lastNorm = ngrad//梯度的点积赋值给上次点积
}
x.clone//返回求解的x
}
}
回到NNLSSolver
private var ata: Array[Double] = _//声明ata
private var initialized: Boolean = false//默认不初始化
private def initialize(rank: Int): Unit = {//初始化函数,参数是矩阵因子的阶
if (!initialized) {//默认执行
this.rank = rank
workspace = NNLS.createWorkspace(rank)//初始化工作空间
ata = new Array[Double](rank * rank)//这里明显不是原来的ata上三角矩阵了,而是一个rank*rank的矩阵
initialized = true
} else {
require(this.rank == rank)
}
}
/**
* Solves a nonnegative least squares problem with L2 regularizatin://通过l2正则化解决非负最小平方问题
*
* min_x_ norm(A x - b)^2^ + lambda * n * norm(x)^2^//最小化Ax-b的二范数+lambda*n*x的二范数,lambda是正则化常亮,这个比乔氏分解多了一个n
* subject to x >= 0//约束x>=0
*/
override def solve(ne: NormalEquation, lambda: Double): Array[Float] = {//传入正规方程和lambda求解
val rank = ne.k//把阶传给rank
initialize(rank)//初始化
fillAtA(ne.ata, lambda)//调用下面的函数,根据ata计算完整的对称矩阵并存储
val x = NNLS.solve(ata, ne.atb, workspace)//传入ata,atb,工作空间求x最优解
ne.reset()//把ata,atb都清0,就是每次求完最优解都清0
x.map(x => x.toFloat)//把x转为单精度返回
}
/**
* Given a triangular matrix in the order of fillXtX above, compute the full symmetric square//根据ata计算完整的对称矩阵并存储,square这里是方阵的意思,不是平方,这又一次提醒了我们这里ata是完整矩阵不是上三角矩阵
* matrix that it represents, storing it into destMatrix.
*/
private def fillAtA(triAtA: Array[Double], lambda: Double) {//传入ata,lamdba
var i = 0//i代表行
var pos = 0//代表向量的索引,后面只有pos += 1,递增读取传入的triAtA(pos)并赋值给a
var a = 0.0
while (i < rank) {//循环阶rank
var j = 0//j代表列
while (j <= i) {//由于上三角矩阵,列总是小于等于行
a = triAtA(pos)
ata(i * rank + j) = a//由于是对称矩阵,从向量读一个值可以赋值给对称位置的两个值
ata(j * rank + i) = a
pos += 1//向量索引递增
j += 1//列递增
}
ata(i * rank + i) += lambda//对角线再加一个lambda
i += 1//行递增
}
}
}
这篇后面主要介绍了LeastSquaresNESolver的两种实现,一种是CholeskySolver ,另一种是NNLSSolver,这两种实现的求解方法solve都会用到正规方程NormalEquation这个类作为参数,下一篇我们继续