[Machine Learning] Softmax Regression: Theory and Java Implementation

Logistic Regression is a linear binary classifier; Softmax Regression generalizes it to the multi-class setting, under the assumption that any two classes are linearly separable (see Reference 1).

1. Softmax Regression Theory

1.1 Sample Probabilities

Suppose there are $n$ samples $\{x_1, x_2, \cdots, x_n\}$, each with $m$ features, and $j$ label classes. To map a sample onto the $j$ classes, the weight matrix $W = [w_1, w_2, \cdots, w_j]$ must have dimension $m \times j$.
The probability that sample $x_i$ belongs to class $k$ ($k \in \{1, 2, \cdots, j\}$) is:

$$p(y_i = k \mid x_i; W) = \frac{e^{w_k^T x_i}}{\sum_{l=1}^{j} e^{w_l^T x_i}}$$

Combining the probabilities over all classes, the probability of a single sample is:

$$p(y_i \mid x_i; W) = \prod_{l=1}^{j} \left( \frac{e^{w_l^T x_i}}{\sum_{l'=1}^{j} e^{w_{l'}^T x_i}} \right)^{I\{y_i = l\}}$$

where $I\{y_i = l\} = 1$ if sample $x_i$ belongs to class $l$, and $I\{y_i = l\} = 0$ otherwise.
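The class-probability formula above can be sketched directly in Java. This is a minimal illustration (the class and method names are mine, not part of the repository); subtracting the maximum score before exponentiating is an extra numerical safeguard that the bare formula omits:

```java
import java.util.Arrays;

public class SoftmaxSketch {
    // Turn raw class scores w_k^T x into probabilities p(y = k | x; W).
    // Subtracting the max score before exponentiating avoids overflow
    // without changing the result, since the shift cancels in the ratio.
    public static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double sum = 0.0;
        double[] p = new double[scores.length];
        for (int k = 0; k < scores.length; k++) {
            p[k] = Math.exp(scores[k] - max);
            sum += p[k];
        }
        for (int k = 0; k < scores.length; k++) {
            p[k] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        // Probabilities sum to 1 and preserve the ordering of the scores
        System.out.println(Arrays.toString(softmax(new double[]{1.0, 2.0, 3.0})));
    }
}
```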

1.2 Loss Function

As described in my earlier posts, the loss function of a probability-based machine learning algorithm is the negative log-likelihood.
The likelihood function is:

$$L_W = \prod_{i=1}^{n} p(y_i \mid x_i; W) = \prod_{i=1}^{n} \prod_{l=1}^{j} \left( \frac{e^{w_l^T x_i}}{\sum_{l'=1}^{j} e^{w_{l'}^T x_i}} \right)^{I\{y_i = l\}}$$

The loss function is therefore:

$$l_W = -\frac{1}{n} \left[ \sum_{i=1}^{n} \sum_{l=1}^{j} I\{y_i = l\} \log \left( \frac{e^{w_l^T x_i}}{\sum_{l'=1}^{j} e^{w_{l'}^T x_i}} \right) \right]$$

1.3 Training the Model with Gradient Descent

The gradient direction for the $k$-th column $w_k$ of the weight matrix is $\frac{\partial l_W}{\partial w_k}$.
For the terms with $y_i = k$:

$$\frac{\partial l_W}{\partial w_k} = -\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\sum_{l=1}^{j} e^{w_l^T x_i} - e^{w_k^T x_i}}{\sum_{l=1}^{j} e^{w_l^T x_i}} \cdot x_i \right]$$

For the terms with $y_i \ne k$:

$$\frac{\partial l_W}{\partial w_k} = -\frac{1}{n} \sum_{i=1}^{n} \left[ \frac{-e^{w_k^T x_i}}{\sum_{l=1}^{j} e^{w_l^T x_i}} \cdot x_i \right]$$

Combining the two cases:

$$\frac{\partial l_W}{\partial w_k} = -\frac{1}{n} \sum_{i=1}^{n} \left[ x_i \cdot \left( I\{y_i = k\} - p(y_i = k \mid x_i; W) \right) \right]$$
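The combined gradient expression can be sanity-checked numerically: perturb a single weight, recompute the loss, and compare the finite-difference slope with the analytic formula. A minimal single-sample sketch (all class names and values here are illustrative, not from the repository):

```java
public class GradCheck {
    // Single-sample loss l_W = -log p(y | x; W); W is stored as [feature][class]
    public static double loss(double[][] W, double[] x, int y) {
        int j = W[0].length;
        double[] s = new double[j];
        double sum = 0.0;
        for (int k = 0; k < j; k++) {
            for (int n = 0; n < x.length; n++) s[k] += W[n][k] * x[n];
            sum += Math.exp(s[k]);
        }
        return -(s[y] - Math.log(sum));
    }

    // Analytic gradient of the single-sample loss with respect to W[n][k]:
    // -x[n] * (I{y = k} - p(y = k | x; W))
    public static double grad(double[][] W, double[] x, int y, int n, int k) {
        int j = W[0].length;
        double[] s = new double[j];
        double sum = 0.0;
        for (int c = 0; c < j; c++) {
            for (int d = 0; d < x.length; d++) s[c] += W[d][c] * x[d];
            sum += Math.exp(s[c]);
        }
        double pk = Math.exp(s[k]) / sum;
        return -x[n] * ((y == k ? 1 : 0) - pk);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 1.0};   // one sample; the last entry plays the bias role
        int y = 1;                      // its class label
        double[][] W = {{0.1, 0.2}, {0.3, -0.1}, {0.0, 0.05}};
        double analytic = grad(W, x, y, 0, 0);
        // Finite-difference estimate for the same weight entry
        double eps = 1e-6;
        double l0 = loss(W, x, y);
        W[0][0] += eps;
        double numeric = (loss(W, x, y) - l0) / eps;
        System.out.println(analytic + " vs " + numeric);
    }
}
```

The two values should agree to several decimal places, which confirms the sign convention used in the combined formula.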

2. Java Implementation

The complete Java code and sample data are available at: https://github.com/shiluqiang/Softmax-Regression-java
First, load the sample features and labels.

import java.util.*;
import java.io.*;

public class LoadData {
	// Load the sample features
	public static double[][] Loadfeature(String filename) throws IOException {
		File f = new File(filename);
		// Build the FileInputStream object
		FileInputStream fip = new FileInputStream(f);
		// Build the InputStreamReader object
		InputStreamReader reader = new InputStreamReader(fip, "UTF-8");
		StringBuffer sb = new StringBuffer();
		while (reader.ready()) {
			sb.append((char) reader.read());
		}
		reader.close();
		fip.close();
		// Convert the stream contents to a string
		String sb1 = sb.toString();
		// Split the string by line to get the number of rows
		String[] a = sb1.split("\n");
		int n = a.length;
		System.out.println("Number of rows: " + n);
		// Number of columns
		String[] a0 = a[0].split("\t");
		int m = a0.length;
		System.out.println("Number of columns: " + m);

		double[][] feature = new double[n][m];
		for (int i = 0; i < n; i++) {
			String[] tmp = a[i].split("\t");
			for (int j = 0; j < m; j++) {
				if (j == m - 1) {
					// The last column holds the label; overwrite it with the bias term 1.0
					feature[i][j] = 1.0;
				} else {
					feature[i][j] = Double.parseDouble(tmp[j]);
				}
			}
		}
		return feature;
	}

	// Load the sample labels
	public static double[] LoadLabel(String filename) throws IOException {
		File f = new File(filename);
		// Build the FileInputStream object
		FileInputStream fip = new FileInputStream(f);
		// Build the InputStreamReader object with the same encoding used for writing
		InputStreamReader reader = new InputStreamReader(fip, "UTF-8");
		StringBuffer sb = new StringBuffer();
		while (reader.ready()) {
			sb.append((char) reader.read());
		}
		reader.close();
		fip.close();
		// Convert the stream contents to a string
		String sb1 = sb.toString();
		// Split the string by line to get the number of rows
		String[] a = sb1.split("\n");
		int n = a.length;
		System.out.println("Number of rows: " + n);
		// Number of columns
		String[] a0 = a[0].split("\t");
		int m = a0.length;
		System.out.println("Number of columns: " + m);

		double[] Label = new double[n];
		for (int i = 0; i < n; i++) {
			String[] tmp = a[i].split("\t");
			// The label is the last column of each row
			Label[i] = Double.parseDouble(tmp[m - 1]);
		}
		return Label;
	}

	// Count the number of distinct label values
	public static int LabelNum(double[] Label) {
		int n = Label.length;
		double[] LabelTmp = new double[n];
		System.arraycopy(Label, 0, LabelTmp, 0, n);
		int labelNum = 1;
		Arrays.sort(LabelTmp);
		for (int i = 1; i < n; i++) {
			if (LabelTmp[i] != LabelTmp[i - 1]) {
				labelNum++;
			}
		}
		return labelNum;
	}
}
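LoadData expects a tab-separated text file in which the last column of every row is the integer class label; Loadfeature then overwrites that column with the bias term 1.0, so the label column doubles as the bias slot. A tiny sketch that writes a file in this format (the feature values here are made up purely for illustration):

```java
import java.io.*;

public class MakeSample {
    public static void main(String[] args) throws IOException {
        // Two feature columns, then the integer class label as the last column
        String[] rows = {
            "1.0\t2.0\t0",
            "2.0\t1.0\t1",
            "0.5\t0.5\t0"
        };
        try (PrintWriter out = new PrintWriter(new FileWriter("SoftInput.txt"))) {
            // Join with newlines and omit a trailing one,
            // so split("\n") in LoadData yields exactly rows.length lines
            out.print(String.join("\n", rows));
        }
    }
}
```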

Next, optimize the model with gradient descent.

public class SRtrainGradientDescent {
	int paraNum;        // number of weight parameters per class
	double rate;        // learning rate
	int samNum;         // number of samples
	double[][] feature; // feature matrix
	double[] Label;     // labels
	int maxCycle;       // maximum number of iterations
	int labelNum;       // number of classes

	// Constructor
	public SRtrainGradientDescent(double[][] feature, double[] Label, int paraNum, double rate, int samNum, int maxCycle, int labelNum) {
		this.feature = feature;
		this.Label = Label;
		this.maxCycle = maxCycle;
		this.paraNum = paraNum;
		this.rate = rate;
		this.samNum = samNum;
		this.labelNum = labelNum;
	}

	// Initialize the weight matrix
	public double[][] ParaInitialize(int paraNum, int labelNum) {
		double[][] W = new double[paraNum][labelNum];
		for (int i = 0; i < paraNum; i++) {
			for (int j = 0; j < labelNum; j++) {
				W[i][j] = 1.0;
			}
		}
		return W;
	}

	// Numerator of the hypothesis: e^{w_k^T x_i} for every sample and class
	public double[][] err(double[][] W, double[][] feature) {
		double[][] errMatrix = new double[feature.length][W[0].length];
		for (int i = 0; i < feature.length; i++) {
			for (int j = 0; j < W[0].length; j++) {
				double tmp = 0;
				for (int n = 0; n < W.length; n++) {
					tmp = tmp + feature[i][n] * W[n][j];
				}
				errMatrix[i][j] = Math.exp(tmp);
			}
		}
		return errMatrix;
	}

	// Negative of the hypothesis denominator: -sum_l e^{w_l^T x_i}
	public double[] errSum(double[][] errMatrix) {
		double[] errsum = new double[errMatrix.length];
		for (int i = 0; i < errMatrix.length; i++) {
			double tmp = 0;
			for (int j = 0; j < errMatrix[0].length; j++) {
				tmp = tmp - errMatrix[i][j];
			}
			errsum[i] = tmp;
		}
		return errsum;
	}

	// Negative of the hypothesis matrix: -p(y_i = k | x_i; W)
	public double[][] errFunction(double[][] errMatrix, double[] errsum) {
		double[][] errResult = new double[errMatrix.length][errMatrix[0].length];
		for (int i = 0; i < errMatrix.length; i++) {
			for (int j = 0; j < errMatrix[0].length; j++) {
				errResult[i][j] = errMatrix[i][j] / errsum[i];
			}
		}
		return errResult;
	}

	// Loss function value on the training set
	public double cost(double[] Label, double[][] errMatrix, double[] errsum, int samNum) {
		double sum_cost = 0;
		for (int i = 0; i < samNum; i++) {
			int m = (int) Label[i];
			if ((errMatrix[i][m] / (-errsum[i])) > 0) {
				sum_cost -= Math.log(errMatrix[i][m] / (-errsum[i]));
			} else {
				sum_cost -= 0;
			}
		}
		return sum_cost / samNum;
	}

	public double[][] Update(double[][] feature, double[] Label, int maxCycle, double rate, int paraNum, int labelNum, int samNum) {
		// Initialize the weight matrix
		double[][] weights = ParaInitialize(paraNum, labelNum);
		// Iteratively optimize the weight matrix
		for (int i = 0; i < maxCycle; i++) {
			// Numerator of the hypothesis
			double[][] errMatrix = err(weights, feature);
			// Negative of the hypothesis denominator
			double[] errsum = errSum(errMatrix);
			if (i % 10 == 0) {
				double cost = cost(Label, errMatrix, errsum, samNum);
				System.out.println("Loss at iteration " + i + ": " + cost);
			}
			// Negative of the hypothesis matrix, i.e. -p
			double[][] errResult = errFunction(errMatrix, errsum);
			// Add the indicator, so errResult becomes I{y_i = k} - p(y_i = k | x_i; W)
			for (int j = 0; j < samNum; j++) {
				int m = (int) Label[j];
				errResult[j][m] += 1;
			}
			// Gradient direction for every weight parameter
			double[][] delt_weights = new double[paraNum][labelNum];
			for (int iter1 = 0; iter1 < paraNum; iter1++) {
				for (int iter2 = 0; iter2 < labelNum; iter2++) {
					double tmp = 0;
					for (int iter3 = 0; iter3 < samNum; iter3++) {
						tmp = tmp + feature[iter3][iter1] * errResult[iter3][iter2];
					}
					delt_weights[iter1][iter2] = tmp / samNum;
				}
			}
			// Gradient descent step (the minus signs cancel, hence the plus)
			for (int iter1 = 0; iter1 < paraNum; iter1++) {
				for (int iter2 = 0; iter2 < labelNum; iter2++) {
					weights[iter1][iter2] = weights[iter1][iter2] + rate * delt_weights[iter1][iter2];
				}
			}
		}
		return weights;
	}
}

Next, test the model.

public class SRTest {
	// Index of the largest element in a row of the score matrix
	public static int MaxSearch(double[] array) {
		int pointer = 0;
		// Start from the first element rather than 0, so all-negative scores are handled
		double tmp = array[0];
		for (int j = 1; j < array.length; j++) {
			if (array[j] > tmp) {
				tmp = array[j];
				pointer = j;
			}
		}
		return pointer;
	}

	// Compute the predictions
	public static double[] SRtest(int labelNum, int samNum, int paraNum, double[][] feature, double[][] weights) {
		double[][] pre_results = new double[samNum][labelNum];
		for (int i = 0; i < samNum; i++) {
			for (int j = 0; j < labelNum; j++) {
				double tmp = 0;
				for (int n = 0; n < paraNum; n++) {
					tmp += feature[i][n] * weights[n][j];
				}
				pre_results[i][j] = tmp;
			}
		}
		double[] results = new double[samNum];
		for (int m = 0; m < samNum; m++) {
			// The predicted class is the one with the largest score
			results[m] = MaxSearch(pre_results[m]);
		}
		return results;
	}
}
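After SRtest returns, comparing the predictions with the true labels gives the training accuracy. A small helper sketch (Accuracy is a hypothetical class of mine, not part of the repository):

```java
public class Accuracy {
    // Fraction of predictions that match the true labels
    public static double accuracy(double[] predicted, double[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if ((int) predicted[i] == (int) actual[i]) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        double[] predicted = {0, 1, 2, 1};
        double[] actual = {0, 1, 1, 1};
        System.out.println(accuracy(predicted, actual)); // 3 of 4 match
    }
}
```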

Then, save the model weight matrix and the predictions.

import java.io.*;

public class SaveModelResults {
	public static void savemodel(String filename, double[][] W) throws IOException {
		File f = new File(filename);
		// Build the FileOutputStream object
		FileOutputStream fip = new FileOutputStream(f);
		// Build the OutputStreamWriter object
		OutputStreamWriter writer = new OutputStreamWriter(fip, "UTF-8");
		// Dimensions of the weight matrix
		int n = W.length;
		int m = W[0].length;
		StringBuffer sb = new StringBuffer();
		// Tab-separated columns, one matrix row per line
		for (int j = 0; j < n - 1; j++) {
			for (int i = 0; i < m - 1; i++) {
				sb.append(String.valueOf(W[j][i]));
				sb.append("\t");
			}
			sb.append(String.valueOf(W[j][m - 1]));
			sb.append("\n");
		}
		// Last row without a trailing newline
		for (int i = 0; i < m - 1; i++) {
			sb.append(String.valueOf(W[n - 1][i]));
			sb.append("\t");
		}
		sb.append(String.valueOf(W[n - 1][m - 1]));
		String sb1 = sb.toString();
		writer.write(sb1);
		writer.close();
		fip.close();
	}

	public static void saveresults(String filename, double[] results) throws IOException {
		File f = new File(filename);
		// Build the FileOutputStream object
		FileOutputStream fip = new FileOutputStream(f);
		// Build the OutputStreamWriter object
		OutputStreamWriter writer = new OutputStreamWriter(fip, "UTF-8");
		// Number of predictions
		int n = results.length;
		StringBuffer sb = new StringBuffer();
		for (int i = 0; i < n; i++) {
			sb.append(results[i]);
			sb.append("\n");
		}
		String sb1 = sb.toString();
		writer.write(sb1);
		writer.close();
		fip.close();
	}
}

Finally, the main class:

import java.io.IOException;

public class SRMain {
	public static void main(String[] args) throws IOException {
		// Input file
		String filename = "SoftInput.txt";
		// Load the sample features and labels
		double[][] feature = LoadData.Loadfeature(filename);
		double[] Label = LoadData.LoadLabel(filename);
		int labelNum = LoadData.LabelNum(Label);
		// Parameter settings
		int samNum = feature.length;
		int paraNum = feature[0].length;
		double rate = 0.04;
		int maxCycle = 10000;
		// Train the Softmax Regression model
		SRtrainGradientDescent SR = new SRtrainGradientDescent(feature, Label, paraNum, rate, samNum, maxCycle, labelNum);
		double[][] weights = SR.Update(feature, Label, maxCycle, rate, paraNum, labelNum, samNum);
		// Save the model
		String model_path = "weights.txt";
		SaveModelResults.savemodel(model_path, weights);
		// Test the model
		double[] results = SRTest.SRtest(labelNum, samNum, paraNum, feature, weights);
		String results_path = "results.txt";
		SaveModelResults.saveresults(results_path, results);
	}
}

References

1. 《Python机器学习实战》 (Python Machine Learning in Action)
