Some questions about logistic regression: imbalanced data and time series (looking for answers)

Logistic Regression

1. How do you apply LR to feature data that contains time series?

This question applies not only to LR but to other models as well.

Many basic models have variants that turn them into time-series models. Personally, though, I find most of these variants unreliable.

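I think it is better to start from the business problem, explore and analyze the data, arrive at reasonable hypotheses, and then extract features. These features can carry time information without being sequential themselves, for example statistics of other features aggregated over the previous N days.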
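
As a rough illustration of "statistics over the previous N days", here is a minimal sketch in plain Python. The function name `trailing_window_features`, the feature names, and the `daily_clicks` data are all made up for illustration; the window deliberately excludes the current day so that no information from the day being predicted leaks into its features.

```python
from statistics import mean

def trailing_window_features(values, n):
    """For each day t >= n, summarize the previous n days (day t itself excluded)."""
    rows = []
    for t in range(n, len(values)):
        window = values[t - n:t]  # days t-n .. t-1
        rows.append({
            "day": t,
            "prev_mean": mean(window),
            "prev_max": max(window),
            "prev_sum": sum(window),
        })
    return rows

# Hypothetical daily counts of some raw feature.
daily_clicks = [3, 5, 2, 8, 6, 1, 4]
features = trailing_window_features(daily_clicks, n=3)
```

Each resulting row can be fed to LR as an ordinary, non-sequential feature vector, joined with the label for that day.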

 

For reference: Logistic regression for time series

Q:  I would like to use a binary logistic regression model in the context of streaming data (multidimensional time series) in order to predict the value of the dependent variable of the data (i.e. row) that just arrived, given the past observations. As far as I know, logistic regression is traditionally used for postmortem analysis, where each dependent variable has already been set (either by inspection, or by the nature of the study).

A:  There are two methods to consider:

  • Only use the last N input samples. Assuming your input signal is of dimension D, you then have N*D input values per ground truth label. This way you can train using any classifier you like, including logistic regression, and each output is considered independent from all other outputs.

  • Use the last N input samples and the last N outputs you have generated. The problem is then similar to Viterbi decoding. You could generate a non-binary score based on the input samples, and combine the scores of multiple samples using a Viterbi decoder. This is better than method 1 if you know something about the temporal relation between the outputs.
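
The first method above can be sketched as follows. This is only an illustration under some assumptions: the helper name `make_windows` is hypothetical, the window is taken to include the row that just arrived, and the data are toy values. Each training example becomes the last N rows flattened into a single vector of length N*D, paired with the label of the most recent row.

```python
def make_windows(X, y, n):
    """Flatten the last n rows of X into one vector per time step.

    X is a list of T rows with D values each; y is a list of T labels.
    The first n-1 time steps are skipped because they lack a full window.
    """
    feats, labels = [], []
    for t in range(n - 1, len(X)):
        window = X[t - n + 1:t + 1]  # rows t-n+1 .. t, current row included
        feats.append([v for row in window for v in row])
        labels.append(y[t])
    return feats, labels

# Toy stream: T=4 time steps, D=2 dimensions.
X = [[1, 10], [2, 20], [3, 30], [4, 40]]
y = [0, 0, 1, 1]
feats, labels = make_windows(X, y, n=2)
```

The resulting `feats` and `labels` can be passed to any classifier, including logistic regression. The second method would additionally need a decoding step over the per-sample scores, which is not shown here.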

 

2. How do you handle imbalanced data?

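For example, when the positive-to-negative ratio is 1:100 and the positives (the 1) are what you want to study, LR performs very poorly.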

There are generally two approaches:

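1) Adjust the weights, for example weighting positive examples by 10. P.S. In my own experiments the results were still not satisfactory.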

2) Sampling. I have not tried this yet.
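
To make the two approaches concrete, here is a minimal sketch in plain Python. It is not a full training loop: `weighted_log_loss` only shows how a positive-class weight of 10 enters the LR loss, and `undersample_negatives` shows one simple sampling scheme (keep every positive, keep a random subset of negatives). Both function names, the `neg_per_pos` parameter, and the toy data are assumptions for illustration. In practice a library option would normally be used instead, for example the `class_weight` parameter of scikit-learn's `LogisticRegression`.

```python
import math
import random

def weighted_log_loss(y_true, p_pred, pos_weight=10.0):
    """Weighted average log loss where each positive counts pos_weight times."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

def undersample_negatives(rows, labels, neg_per_pos=1, seed=0):
    """Keep all positives and a random subset of the negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), neg_per_pos * len(pos)))
    keep = sorted(pos + keep_neg)  # preserve the original order
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Toy data with a 1:100 positive-to-negative ratio.
labels = [1] * 2 + [0] * 200
rows = [[float(i)] for i in range(len(labels))]
sub_rows, sub_labels = undersample_negatives(rows, labels, neg_per_pos=5)
```

Note that a model trained on undersampled data sees a higher positive rate than the real one, so its predicted probabilities are shifted upward and may need recalibration if the absolute values matter.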

 

Reference: http://www.alidata.org/archives/205 (sampling for datasets with extremely imbalanced positive and negative examples)

 
