Re-implementing Logistic Regression (C++): Interface

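The previous LR implementation imposed one restriction: every input feature had to be a 0-1 (binary) feature. In practice, when the features are mostly enumerated (categorical) types, converting them to 0-1 features under this restriction is very convenient. But if the vast majority of input features are real-valued, they first have to be mapped to a fixed interval, then converted to enumerated features, and finally mapped to 0-1 features, and this process loses information. The LR implementation here therefore takes purely real-valued features as input and performs 0-1 (binary) classification on that basis.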
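
To make the information loss concrete, here is a minimal sketch of the three-step conversion described above (fixed interval, enumerated bucket, 0-1 features). The helper names `BucketIndex` and `OneHot`, the equal-width bucketing, and the bucket count are assumptions made for illustration; the earlier implementation may have discretized differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: clamp a real value to [lo, hi], then map it to one of
// numBins equal-width buckets (the "enumerated" form of the feature).
int BucketIndex(double value, double lo, double hi, int numBins)
{
	assert(numBins > 0 && hi > lo);
	double clamped = std::min(std::max(value, lo), hi);
	double scaled = (clamped - lo) / (hi - lo);      // now in [0, 1]
	int bucket = static_cast<int>(scaled * numBins);
	return std::min(bucket, numBins - 1);            // value == hi goes to the last bucket
}

// Hypothetical helper: turn a bucket index into numBins 0-1 features.
std::vector<int> OneHot(int bucket, int numBins)
{
	assert(bucket >= 0 && bucket < numBins);
	std::vector<int> bits(numBins, 0);
	bits[bucket] = 1;
	return bits;
}
```

With 4 buckets over [0, 1], the values 0.12 and 0.24 both land in bucket 0 and so produce identical 0-1 features; the model can no longer tell them apart, which is the loss that real-valued input avoids.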


The interface is as follows:

#pragma once

#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <algorithm>
#include <cmath>
#include <string>

using namespace std;

// The representation of a feature and its value
class FeaValNode
{
public:
	int iFeatureId;
	double dValue;

	FeaValNode (void);
	~FeaValNode (void);
};

// The representation of a sample
class Sample
{
public:
	// the class index for a sample: 0-1 value, init with '-1'
	int iClass;
	vector<FeaValNode> FeaValNodeVec;

	Sample (void);
	~Sample (void);
};

// The logistic regression 
class LogisticRegression
{
public:
	LogisticRegression(void);
	~LogisticRegression(void);

	// train by SGD on the sample file
	bool TrainSGDOnSampleFile (
				const char * sFileName, int iMaxFeatureNum,		// about the samples
				double dLearningRate,							// about the learning 
				int iMaxLoop, double dMinImproveRatio			// about the stop criteria
				);
	// save the model to txt file: the theta vector with its size
	bool SaveLRModelTxt (const char * sFileName);
	// load the model from txt file: the theta vector with its size
	bool LoadLRModelTxt (const char * sFileName);
	// load the samples from file, predict by the LR model
	bool PredictOnSampleFile (const char * sFileIn, const char * sFileOut, const char * sFileLog);

	// just for test
	void Test (void);

private:
	// the theta vector for each dimension of feature
	vector<double> ThetaVec;

	// read a sample from a line, return false if fail
	bool ReadSampleFrmLine (string & sLine, Sample & theSample);
	// the Sigmoid function: f(x) = 1 / (1 + exp(-x)) = exp (x) / ( 1 + exp(x) )
	double Sigmoid (double x);
	// calculate the model function output by feature vector
	double CalcFuncOutByFeaVec (vector<FeaValNode> & FeaValNodeVec);
	// calculate the gradient and update the theta values; requires the sample's features to be sorted by feature index in ascending order
	void UpdateThetaVec (Sample & theSample, double dY, double dLearningRate);
	// predict the class for one single sample
	int PredictOneSample (Sample & theSample);
};
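
The header only declares `Sigmoid`, `CalcFuncOutByFeaVec` and `UpdateThetaVec`. The following is a rough sketch of what they might compute for sparse real-valued features, not the author's actual code. It is written as free functions with hypothetical names (`CalcFuncOut`, `UpdateTheta`) and a simplified signature, and it assumes that each feature id indexes directly into the theta vector and that there is no separate bias term.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirrors the fields of FeaValNode from the header above.
struct FeaValNode
{
	int iFeatureId;
	double dValue;
};

// Sigmoid, using whichever of the two equivalent forms keeps exp() from overflowing.
double Sigmoid(double x)
{
	if (x >= 0.0)
		return 1.0 / (1.0 + std::exp(-x));
	double e = std::exp(x);
	return e / (1.0 + e);
}

// Model output h = Sigmoid(sum_j theta[id_j] * value_j) over the listed features.
// Assumption: feature ids are valid indices into theta.
double CalcFuncOut(const std::vector<double> & theta, const std::vector<FeaValNode> & feas)
{
	double dot = 0.0;
	for (const FeaValNode & node : feas)
	{
		assert(node.iFeatureId >= 0 &&
		       static_cast<std::size_t>(node.iFeatureId) < theta.size());
		dot += theta[node.iFeatureId] * node.dValue;
	}
	return Sigmoid(dot);
}

// One SGD step on a single sample: theta_j += rate * (label - h) * x_j.
// Only the features present in the sample are touched.
void UpdateTheta(std::vector<double> & theta, const std::vector<FeaValNode> & feas,
                 int label, double learningRate)
{
	double err = label - CalcFuncOut(theta, feas);
	for (const FeaValNode & node : feas)
		theta[node.iFeatureId] += learningRate * err * node.dValue;
}
```

The only difference from the 0-1 feature case is the factor `node.dValue` in the dot product and in the update; with binary features that factor is always 1.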


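The interface of class LogisticRegression is essentially unchanged. The class Sample, which represents a sample's features, changed slightly: in addition to the feature index, it now also stores the corresponding feature value.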
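
As an illustration of how a `Sample` carrying (index, value) pairs might be filled from one line of text, here is a small sketch. The post does not specify the sample file format, so the line layout `<class> <id>:<value> <id>:<value> ...` is purely an assumption, as is the function name `ParseSampleLine`; the real `ReadSampleFrmLine` may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Mirror the fields of the classes in the header above.
struct FeaValNode
{
	int iFeatureId;
	double dValue;
};

struct Sample
{
	int iClass = -1;                        // -1 means "not set", as in the header
	std::vector<FeaValNode> FeaValNodeVec;
};

// ASSUMED line format (not given in the post): "<class> <id>:<value> <id>:<value> ..."
// Returns false and leaves 'out' untouched if the line does not match.
bool ParseSampleLine(const std::string & line, Sample & out)
{
	std::istringstream iss(line);
	Sample parsed;
	if (!(iss >> parsed.iClass) || (parsed.iClass != 0 && parsed.iClass != 1))
		return false;

	std::string token;
	while (iss >> token)
	{
		std::size_t colon = token.find(':');
		if (colon == std::string::npos || colon == 0 || colon + 1 == token.size())
			return false;
		FeaValNode node;
		try
		{
			node.iFeatureId = std::stoi(token.substr(0, colon));
			node.dValue = std::stod(token.substr(colon + 1));
		}
		catch (const std::exception &)
		{
			return false;                   // non-numeric id or value
		}
		if (node.iFeatureId < 0)
			return false;
		parsed.FeaValNodeVec.push_back(node);
	}

	// The header notes that the update step expects ascending feature indices.
	std::sort(parsed.FeaValNodeVec.begin(), parsed.FeaValNodeVec.end(),
	          [](const FeaValNode & a, const FeaValNode & b)
	          { return a.iFeatureId < b.iFeatureId; });
	out = parsed;
	return true;
}
```

Under this assumed format, the line `1 7:0.5 2:-1.25` yields a sample of class 1 whose features are stored as (2, -1.25) followed by (7, 0.5).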


Please credit the source when reposting: http://blog.csdn.net/xceman1997/article/details/18136117
