[Machine Learning] Softmax Regression: Algorithm Principles and a Java Implementation
Logistic Regression is a linear binary classification algorithm; Softmax Regression extends it to multi-class problems in which the samples of any two classes are linearly separable (see Reference 1).
1. Softmax Regression: Algorithm Principles
1.1 Sample Probabilities
Suppose there are $n$ samples $\left\{ x_1, x_2, \cdots, x_n \right\}$, each described by $m$ features, and the labels fall into $j$ classes. To map a sample onto one of the $j$ classes, the weight matrix $W = \left[ w_1, w_2, \cdots, w_j \right]$ has dimension $m \times j$.
The probability that sample $x_i$ belongs to class $k$ $\left( k \in \left\{ 0, 1, \cdots, j-1 \right\} \right)$ is:
$$p\left( y_i = k \mid x_i; W \right) = \frac{e^{w_k^T x_i}}{\sum_{l=0}^{j-1} e^{w_l^T x_i}}$$
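To make the formula concrete, here is a minimal Java sketch of the class probability for a single sample (the method name softmaxProb is hypothetical and not part of the repository code; the code in Section 2 computes the same quantities over the whole sample matrix):
// Minimal sketch of p(y = k | x; W) for one sample x of dimension m and a
// weight matrix W of dimension m x j (hypothetical helper, not repository code).
public static double softmaxProb(double[][] W, double[] x, int k) {
    double numer = 0.0;
    double denom = 0.0;
    for (int c = 0; c < W[0].length; c++) {
        double dot = 0.0;
        for (int d = 0; d < x.length; d++) {
            dot += W[d][c] * x[d];      // w_c^T x
        }
        double e = Math.exp(dot);       // e^{w_c^T x}
        if (c == k) {
            numer = e;
        }
        denom += e;
    }
    return numer / denom;               // p(y = k | x; W)
}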
Combining the probabilities over all classes, the probability of sample $x_i$ is:
$$p\left( y_i \mid x_i; W \right) = \prod_{l=0}^{j-1} \left( \frac{e^{w_l^T x_i}}{\sum_{t=0}^{j-1} e^{w_t^T x_i}} \right)^{I\left\{ y_i = l \right\}}$$
where $I\left\{ y_i = l \right\} = 1$ when sample $x_i$ belongs to class $l$, and $I\left\{ y_i = l \right\} = 0$ otherwise.
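For example, with $j = 3$ classes and a sample whose label is $y_i = 2$, only the factor for $l = 2$ survives and the expression reduces to the single class probability:
$$p\left( y_i \mid x_i; W \right) = \left( \frac{e^{w_2^T x_i}}{\sum_{t=0}^{2} e^{w_t^T x_i}} \right)^{1} = p\left( y_i = 2 \mid x_i; W \right)$$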
1.2 Loss Function
As described in my earlier posts, the loss function of a probability-based machine learning algorithm is the negative log-likelihood.
The likelihood function is:
$$L_W = \prod_{i=1}^{n} p\left( y_i \mid x_i; W \right) = \prod_{i=1}^{n} \prod_{l=0}^{j-1} \left( \frac{e^{w_l^T x_i}}{\sum_{t=0}^{j-1} e^{w_t^T x_i}} \right)^{I\left\{ y_i = l \right\}}$$
Taking the negative logarithm and averaging over the $n$ samples gives the loss function:
$$l_W = -\frac{1}{n} \left[ \sum_{i=1}^{n} \sum_{l=0}^{j-1} I\left\{ y_i = l \right\} \log \frac{e^{w_l^T x_i}}{\sum_{t=0}^{j-1} e^{w_t^T x_i}} \right]$$
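For labels encoded as $0, 1, \cdots, j-1$, the double sum collapses to the log-probability of the true class of each sample, which is what the cost method in Section 2 computes. A minimal sketch (the helper crossEntropyLoss and the matrix prob are hypothetical, not repository code):
// prob[i][k] = p(y_i = k | x_i; W); label[i] is in {0, ..., j-1}.
// Hypothetical helper, not part of the repository code.
public static double crossEntropyLoss(double[][] prob, double[] label) {
    int n = prob.length;
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        int k = (int) label[i];          // only the true-class term survives the indicator
        sum -= Math.log(prob[i][k]);
    }
    return sum / n;                      // l_W = -(1/n) * sum_i log p(y_i | x_i; W)
}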
1.3 Training the Model with Gradient Descent
The gradient direction for the $k$-th column $w_k$ of the weight matrix is $\frac{\partial l_W}{\partial w_k}$.
When $y_i = k$:
$$\frac{\partial l_W}{\partial w_k} = -\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\sum_{l=0}^{j-1} e^{w_l^T x_i} - e^{w_k^T x_i}}{\sum_{l=0}^{j-1} e^{w_l^T x_i}} \cdot x_i \right]$$
When $y_i \ne k$:
$$\frac{\partial l_W}{\partial w_k} = -\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{-e^{w_k^T x_i}}{\sum_{l=0}^{j-1} e^{w_l^T x_i}} \cdot x_i \right]$$
Combining the two cases:
$$\frac{\partial l_W}{\partial w_k} = -\frac{1}{n} \sum_{i=1}^{n} \left[ x_i \cdot \left( I\left\{ y_i = k \right\} - p\left( y_i = k \mid x_i; W \right) \right) \right]$$
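Gradient descent then moves each $w_k$ along the negative gradient with learning rate $\alpha$; this is the update rule implemented in the Java code below, where the variable delt_weights already stores the negative gradient and is therefore added to the weights:
$$w_k \leftarrow w_k - \alpha \cdot \frac{\partial l_W}{\partial w_k} = w_k + \frac{\alpha}{n} \sum_{i=1}^{n} x_i \cdot \left( I\left\{ y_i = k \right\} - p\left( y_i = k \mid x_i; W \right) \right)$$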
2. Java Implementation
The complete Java code and sample data are available at: https://github.com/shiluqiang/Softmax-Regression-java
First, load the sample features and labels.
import java.util.*;
import java.io.*;
public class LoadData {
//Load the sample features (the label column is replaced with a constant 1, which serves as the bias term)
public static double[][] Loadfeature(String filename) throws IOException{
File f = new File(filename);
FileInputStream fip = new FileInputStream(f);
// Build the FileInputStream object
InputStreamReader reader = new InputStreamReader(fip,"UTF-8");
// Build the InputStreamReader object
StringBuffer sb = new StringBuffer();
while(reader.ready()) {
sb.append((char) reader.read());
}
reader.close();
fip.close();
//Convert the data read from the stream into a string
String sb1 = sb.toString();
//Split the string by line to get the number of rows of the 2D array
String [] a = sb1.split("\n");
int n = a.length;
System.out.println("二维数组行数为:" + n);
//Compute the number of columns of the 2D array
String [] a0 = a[0].split("\t");
int m = a0.length;
System.out.println("二维数组列数为:" + m);
double [][] feature = new double[n][m];
for (int i = 0; i < n; i ++) {
String [] tmp = a[i].split("\t");
for(int j = 0; j < m; j ++) {
if (j == m-1) {
feature[i][j] = (double) 1;
}
else {
feature[i][j] = Double.parseDouble(tmp[j]);
}
}
}
return feature;
}
//Load the sample labels (the last column of each row)
public static double[] LoadLabel(String filename) throws IOException{
File f = new File(filename);
FileInputStream fip = new FileInputStream(f);
// Build the FileInputStream object
InputStreamReader reader = new InputStreamReader(fip,"UTF-8");
// Build the InputStreamReader object with the same encoding the file was written in
StringBuffer sb = new StringBuffer();
while(reader.ready()) {
sb.append((char) reader.read());
}
reader.close();
fip.close();
//Convert the data read from the stream into a string
String sb1 = sb.toString();
//Split the string by line to get the number of rows of the 2D array
String [] a = sb1.split("\n");
int n = a.length;
System.out.println("二维数组行数为:" + n);
//Compute the number of columns of the 2D array
String [] a0 = a[0].split("\t");
int m = a0.length;
System.out.println("二维数组列数为:" + m);
double [] Label = new double[n];
for (int i = 0; i < n; i ++) {
String [] tmp = a[i].split("\t");
Label[i] = Double.parseDouble(tmp[m-1]);
}
return Label;
}
//Count the number of distinct label values, i.e. the number of classes
public static int LabelNum(double [] Label) {
int n = Label.length;
double [] LabelTmp = new double [n];
System.arraycopy(Label, 0, LabelTmp, 0, n);
int labelNum = 1;
Arrays.sort(LabelTmp);
for(int i = 1; i < n; i ++) {
if (LabelTmp[i] != LabelTmp[i-1]) {
labelNum ++;
}
}
return labelNum;
}
}
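Loadfeature and LoadLabel expect a tab-separated text file in which each row holds the feature values followed by the class label in the last column; note that Loadfeature overwrites the label column with a constant 1, which acts as the bias term. As a quick way to try the loader, here is a minimal sketch that writes such a file (the file name DemoInput.txt and the feature values are made up; the repository itself ships SoftInput.txt):
import java.io.*;
public class WriteDemoInput {
    public static void main(String[] args) throws IOException {
        // Two features per row, class label (0 or 1) in the last column; values are made up.
        String demo = "1.2\t0.7\t0\n0.3\t2.5\t1\n1.8\t0.1\t0\n";
        try (OutputStreamWriter writer =
                new OutputStreamWriter(new FileOutputStream("DemoInput.txt"), "UTF-8")) {
            writer.write(demo);
        }
    }
}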
Then, optimize the model with gradient descent.
public class SRtrainGradientDescent {
int paraNum; //number of weight parameters per class (the feature dimension, including the bias column)
double rate; //learning rate
int samNum; //number of samples
double [][] feature; //sample feature matrix
double [] Label; //sample labels
int maxCycle; //maximum number of iterations
int labelNum; //number of label classes
//Constructor
public SRtrainGradientDescent(double [][] feature, double [] Label, int paraNum,double rate, int samNum,int maxCycle,int labelNum) {
this.feature = feature;
this.Label = Label;
this.maxCycle = maxCycle;
this.paraNum = paraNum;
this.rate = rate;
this.samNum = samNum;
this.labelNum = labelNum;
}
// Initialize the weight matrix
public double [][] ParaInitialize(int paraNum,int labelNum) {
double [][] W = new double[paraNum][labelNum];
for (int i = 0; i < paraNum; i ++) {
for (int j = 0; j < labelNum; j ++) {
W[i][j] = 1.0;
}
}
return W;
}
//Compute the numerator of the hypothesis: exp(w_k^T x_i) for every sample/class pair
public double [][] err(double[][] W, double [][] feature){
double [][] errMatrix = new double[feature.length][W[0].length];
for (int i = 0; i < feature.length; i ++) {
for (int j = 0; j < W[0].length; j ++) {
double tmp = 0;
for (int n = 0; n < W.length; n ++) {
tmp = tmp + feature[i][n] * W[n][j];
}
errMatrix[i][j] = Math.exp(tmp);
}
}
return errMatrix;
}
//Compute the negated denominator of the hypothesis: -sum over classes of exp(w_l^T x_i) for every sample
public double [] errSum(double [][] errMatrix) {
double [] errsum = new double[errMatrix.length];
for (int i = 0; i < errMatrix.length; i ++) {
double tmp = 0;
for (int j = 0; j < errMatrix[0].length; j ++) {
tmp = tmp - errMatrix[i][j];
}
errsum[i] = tmp;
}
return errsum;
}
//Compute the negated hypothesis matrix: errResult[i][k] = -p(y_i = k | x_i; W)
public double [][] errFunction(double [][] errMatrix, double [] errsum){
double [][] errResult = new double [errMatrix.length][errMatrix[0].length];
for (int i = 0; i < errMatrix.length; i ++) {
for (int j = 0; j < errMatrix[0].length; j ++) {
errResult[i][j] = errMatrix[i][j] / errsum[i];
}
}
return errResult;
}
//Compute the value of the loss function
public double cost(double [] Label,double [][] errMatrix, double [] errsum,int samNum) {
double sum_cost = 0;
for(int i = 0; i < samNum; i ++) {
int m = (int) Label[i];
if ((errMatrix[i][m] / (- errsum[i])) > 0) {
sum_cost -= Math.log(errMatrix[i][m] / (- errsum[i]));
}
else {
sum_cost -= 0;
}
}
return sum_cost / samNum;
}
public double [][] Update(double [][] feature, double[] Label, int maxCycle, double rate,int paraNum,int labelNum, int samNum){
//Initialize the weight matrix
double [][] weights = ParaInitialize(paraNum,labelNum);
// Iteratively optimize the weight matrix
for(int i = 0; i < maxCycle; i ++) {
//Numerator of the hypothesis
double [][] errMatrix = err(weights,feature);
//Negated denominator of the hypothesis
double [] errsum = errSum(errMatrix);
if (i % 10 == 0) {
double cost = cost(Label,errMatrix,errsum,samNum);
System.out.println("第" + i + "次迭代的损失函数值为:" + cost);
}
//Negated hypothesis matrix; adding 1 at the true class in the next loop gives I{y_i = k} - p(y_i = k | x_i; W)
double [][] errResult = errFunction(errMatrix,errsum);
for (int j = 0; j < samNum; j ++) {
int m = (int) Label[j];
errResult[j][m] += 1;
}
// Compute the negative gradient for every weight parameter
double [][] delt_weights = new double[paraNum][labelNum];
for (int iter1 = 0; iter1 < paraNum; iter1 ++) {
for (int iter2 = 0; iter2 < labelNum; iter2 ++) {
double tmp = 0;
for (int iter3 = 0; iter3 < samNum; iter3 ++) {
tmp = tmp + feature[iter3][iter1] * errResult[iter3][iter2];
}
delt_weights[iter1][iter2] = tmp / samNum;
}
}
// Gradient-descent update: delt_weights already holds the negative gradient, so it is added
for (int iter1 = 0; iter1 < paraNum; iter1 ++) {
for (int iter2 = 0; iter2 < labelNum; iter2 ++) {
weights[iter1][iter2] = weights[iter1][iter2] + rate * delt_weights[iter1][iter2];
}
}
}
return weights;
}
}
Next, test the model.
public class SRTest {
//Find the index of the largest element in a row of the score matrix
public static int MaxSearch(double [] array) {
int pointer = 0;
double tmp = array[0]; // start from the first score so rows with all-negative scores are handled correctly
for (int j = 1; j < array.length; j ++) {
if (array[j] > tmp) {
tmp = array[j];
pointer = j;
}
}
return pointer;
}
//Compute the predicted class for every sample (argmax of w^T x over the classes)
public static double [] SRtest(int labelNum,int samNum,int paraNum,double [][] feature,double [][] weights) {
double [][] pre_results = new double [samNum][labelNum];
for (int i = 0; i < samNum; i ++) {
for (int j = 0; j < labelNum; j ++) {
double tmp = 0;
for (int n = 0; n < paraNum; n ++) {
tmp += feature[i][n] * weights[n][j];
}
pre_results[i][j] = tmp;
}
}
double [] results = new double [samNum];
for (int m = 0; m < samNum; m ++) {
results[m] = MaxSearch(pre_results[m]);
}
return results;
}
}
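The article only writes the predicted classes to a file; to get a quick training-accuracy number, the predictions can be compared with the loaded labels. A minimal sketch (the helper accuracy is hypothetical, not part of the repository code):
// Fraction of samples whose predicted class equals the true label.
// Hypothetical helper, not part of the repository code.
public static double accuracy(double[] results, double[] Label) {
    int correct = 0;
    for (int i = 0; i < results.length; i++) {
        if ((int) results[i] == (int) Label[i]) {
            correct++;
        }
    }
    return (double) correct / results.length;
}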
After that, save the model weight matrix and the prediction results.
import java.io.*;
public class SaveModelResults {
public static void savemodel(String filename, double [][] W) throws IOException{
File f = new File(filename);
// Build the FileOutputStream object
FileOutputStream fip = new FileOutputStream(f);
// Build the OutputStreamWriter object
OutputStreamWriter writer = new OutputStreamWriter(fip,"UTF-8");
//Dimensions of the weight matrix
int n = W.length;
int m = W[0].length;
StringBuffer sb = new StringBuffer();
for (int j = 0; j < n-1; j ++) {
for (int i = 0; i < m-1; i ++) {
sb.append(String.valueOf(W[j][i]));
sb.append("\t");
}
sb.append(String.valueOf(W[j][m-1]));
sb.append("\n");
}
for (int i = 0; i < m-1; i ++) {
sb.append(String.valueOf(W[n-1][i]));
sb.append("\t");
}
sb.append(String.valueOf(W[n-1][m-1]));
String sb1 = sb.toString();
writer.write(sb1);
writer.close();
fip.close();
}
public static void saveresults(String filename, double [] results) throws IOException{
File f = new File(filename);
// Build the FileOutputStream object
FileOutputStream fip = new FileOutputStream(f);
// Build the OutputStreamWriter object
OutputStreamWriter writer = new OutputStreamWriter(fip,"UTF-8");
//Number of elements in the prediction results
int n = results.length;
StringBuffer sb = new StringBuffer();
for (int i = 0; i < n; i ++) {
sb.append(results[i]);
sb.append("\n");
}
String sb1 = sb.toString();
writer.write(sb1);
writer.close();
fip.close();
}
}
Finally, the main class:
import java.io.IOException;
public class SRMain {
public static void main(String[] args) throws IOException{
// filename
String filename = "SoftInput.txt";
// Load the sample features and labels
double [][] feature = LoadData.Loadfeature(filename);
double [] Label = LoadData.LoadLabel(filename);
int labelNum = LoadData.LabelNum(Label);
// Parameter settings
int samNum = feature.length;
int paraNum = feature[0].length;
double rate = 0.04;
int maxCycle = 10000;
// Train the Softmax Regression model
SRtrainGradientDescent SR = new SRtrainGradientDescent(feature,Label,paraNum,rate,samNum,maxCycle,labelNum);
double [][] weights = SR.Update(feature, Label, maxCycle, rate, paraNum, labelNum, samNum);
//Save the model
String model_path = "wrights.txt";
SaveModelResults.savemodel(model_path, weights);
//Test the model
double [] results = SRTest.SRtest(labelNum, samNum, paraNum, feature, weights);
String results_path = "results.txt";
SaveModelResults.saveresults(results_path, results);
}
}
References
1. 《Python机器学习实战》 (Python Machine Learning in Action)