Regression of a Proportion in Python
I frequently predict proportions (e.g., the proportion of the year during which a customer is active). This is a regression task because the dependent variable is a float, but the dependent variable is bounded between 0 and 1. Googling around, I had a hard time finding a good way to model this situation, so I’ve written up here what I think is the most straightforward solution.
I am guessing there’s a better way to do this with MCMC, so please comment below if you know one.
Let’s get started by importing some libraries for making random data.
Create random regression data.
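One way to do this is with scikit-learn’s `make_regression`. The sample size, feature counts, noise, and bias below are illustrative assumptions, not the settings used for the plots discussed in this post.

```python
from sklearn.datasets import make_regression

# All sizes and noise settings here are illustrative assumptions.
X, y = make_regression(
    n_samples=1000,     # number of rows
    n_features=20,      # number of columns in X
    n_informative=10,   # features that actually drive y
    noise=4.0,          # standard deviation of the Gaussian noise added to y
    bias=100.0,         # intercept of the underlying linear model
    random_state=0,     # fixed seed so the data is reproducible
)

print(X.shape, y.shape)  # (1000, 20) (1000,)
```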
Rescale the dependent variable so it’s bounded between 0 and 1.
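A minimal sketch of this step using min-max scaling, so the smallest value maps to 0 and the largest to 1. The helper name `rescale_to_unit_interval` and the stand-in values are hypothetical.

```python
import numpy as np

def rescale_to_unit_interval(y):
    """Min-max scale y so its smallest value maps to 0 and its largest to 1."""
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())

# Stand-in values; any unbounded regression target is handled the same way.
y_raw = np.array([80.0, 95.0, 100.0, 120.0])
y = rescale_to_unit_interval(y_raw)
print(y.tolist())  # [0.0, 0.375, 0.5, 1.0]
```

Note that this divides by zero if every value of `y` is identical, so it only makes sense for a target that actually varies.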
Make a quick plot to confirm that the data is bounded between 0 and 1.
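A histogram is probably the quickest check. The sketch below uses stand-in data drawn from a normal distribution (an assumption made to keep the example short) and rescales it before plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in target: normally distributed values, then min-max scaled to [0, 1].
rng = np.random.default_rng(0)
y_raw = rng.normal(loc=100.0, scale=20.0, size=1000)
y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min())

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(y, bins=30)
ax.set_xlabel("scaled dependent variable")
ax.set_ylabel("count")
# In a notebook the figure renders inline; in a script, call plt.show().
```

The first and last bin edges give the same confirmation numerically: they should be 0 and 1.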
All the data here is fake, which worries me, but beggars can’t be choosers and this is just a quick example.
Below, I apply a plain GLM to the data (statsmodels’ default Gaussian family, which amounts to ordinary linear regression). This is what you would do if you treated this as a plain regression problem.
Here are the actual values (x-axis) plotted against the predicted values (y-axis). The model does a decent job, but check out the values on the y-axis: the linear model predicts negative values!
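A plot like this can be sketched with a small helper. The function name `plot_actual_vs_predicted` is hypothetical, and the numbers passed to it are made up purely to exercise the helper; they are not model output.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_actual_vs_predicted(actual, predicted, ax=None):
    """Scatter actual (x-axis) against predicted (y-axis) values."""
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(actual, predicted, "o", alpha=0.2)
    # Dashed lines mark the valid range for a proportion.
    ax.axhline(0.0, linestyle="--", color="gray")
    ax.axhline(1.0, linestyle="--", color="gray")
    ax.set_xlabel("actual")
    ax.set_ylabel("predicted")
    return ax

# Made-up numbers, only to show the call; two of them fall outside [0, 1].
actual = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
predicted = np.array([-0.05, 0.25, 0.5, 0.75, 1.1])
ax = plot_actual_vs_predicted(actual, predicted)
```

The dashed reference lines make out-of-range predictions easy to spot without reading the axis ticks.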
Obviously the linear model above isn’t correctly modeling this data, since it’s predicting values that are impossible.
I followed this tutorial, which recommends using a GLM with a logit link and the binomial family. Checking the statsmodels module reference, we can see that the default link for the binomial family is logit.
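For reference, the logit link ties the expected proportion to the linear predictor as shown below. The symbols are my own notation rather than the tutorial’s: $\mu$ is the expected proportion, $x$ the feature vector, and $\beta$ the coefficients.

$$
\operatorname{logit}(\mu) = \log\frac{\mu}{1-\mu} = x^\top\beta
\quad\Longleftrightarrow\quad
\mu = \frac{1}{1 + e^{-x^\top\beta}}
$$

Because the right-hand expression always lies strictly between 0 and 1, a model with this link cannot predict an impossible proportion.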
Below, I apply a GLM with a logit link and the binomial family to the data.
Here’s the actual data (x-axis) plotted against the predicted data (y-axis). You can see the fit is much better!
CPython 3.6.3
IPython 6.1.0
numpy 1.13.3
matplotlib 2.0.2
sklearn 0.19.1
seaborn 0.8.0
statsmodels 0.8.0
compiler: GCC 7.2.0
system: Linux
release: 4.13.0-38-generic
machine: x86_64
processor: x86_64
CPU cores: 4
interpreter: 64bit
Translated from: https://www.pybloggers.com/2018/05/regression-of-a-proportion-in-python/