Box-Cox Transform: Python Implementation

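Estimating the lambda parameter of the boxcox1p transform:

Maximum likelihood estimation or Bayesian estimation (theory omitted here)

  • Maximum likelihood estimation:
    Suppose the population depends on an unknown parameter theta that can take many possible values. Given the observed sample, the value of theta that maximizes the probability of observing that sample is taken as the estimate of theta. This is called the maximum likelihood estimate.
    Reference: "The simplest explanation of the idea behind maximum likelihood estimation"

    In other words, maximum likelihood estimation treats the parameter value that makes the observed data most probable as the true one, and disregards values under which the data would be unlikely.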

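  • Python code:

# lam_range: candidate lambda values; llf: array of the same shape; y: strictly positive data
for i, lam in enumerate(lam_range):
    llf[i] = stats.boxcox_llf(lam, y)

# find the index of the max log-likelihood (llf) and take that lambda
lam_best = lam_range[llf.argmax()]  # maximum likelihood estimate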
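
The snippet above assumes that `lam_range`, `llf`, and `y` already exist. A minimal self-contained sketch of the same grid search follows. The synthetic log-normal data, the seed, and the grid are assumptions for illustration, not the author's dataset. For comparison it also calls `stats.boxcox_normmax(..., method='mle')`, which optimizes the same log-likelihood with a continuous search.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive data (an assumption for illustration only).
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.5, size=500)

# Evaluate the Box-Cox log-likelihood on a grid of candidate lambdas.
lam_range = np.linspace(-2, 2, 401)
llf = np.array([stats.boxcox_llf(lam, y) for lam in lam_range])

# The maximum likelihood estimate is the grid point with the largest llf.
lam_best = lam_range[llf.argmax()]

# Continuous optimizer over the same objective, for comparison.
lam_mle = stats.boxcox_normmax(y, method='mle')

print('grid search lambda:', round(float(lam_best), 3))
print('boxcox_normmax mle:', round(float(lam_mle), 3))
```

The two values should agree up to the grid spacing. A finer grid narrows the gap at the cost of more `boxcox_llf` evaluations.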

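boxcox1p transform formula:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{(y + c)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln(y + c), & \lambda = 0
\end{cases}
$$

For boxcox1p, c = 1.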

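  • note: the +c in y+c of the boxcox1p transform ensures that (y+c) > 0, because the Box-Cox transform requires a strictly positive input
  • Python code:
  • y_boxcox = special.boxcox1p(y, lam_best), using the lambda optimized via llf
    or:
  • boxcox_normmax(x) returns the optimized lambda
    for i in highskew.index:
        # boxcox1p for x with high skew; lambda is estimated on x + 1
        # to match the shift that boxcox1p applies (and to keep the input > 0)
        x[i] = boxcox1p(x[i], boxcox_normmax(x[i] + 1))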
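
For context, here is a hedged sketch of how a loop like this might be set up end to end. The DataFrame `x`, its column names, and the skewness threshold of 0.5 are assumptions for illustration and do not come from the original post. Since `boxcox1p(v, lam)` equals `boxcox(v + 1, lam)`, lambda is estimated on `v + 1`, which also keeps the input to `boxcox_normmax` strictly positive when a column contains zeros.

```python
import numpy as np
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax, skew

# Hypothetical non-negative features (synthetic, for illustration only).
rng = np.random.default_rng(1)
x = pd.DataFrame({
    'area': rng.lognormal(mean=3.0, sigma=1.0, size=300),             # right-skewed
    'rooms': rng.normal(loc=6.0, scale=1.0, size=300).clip(min=1.0),  # roughly symmetric
})

# Select columns whose sample skewness exceeds an assumed threshold of 0.5.
skews = x.apply(lambda col: skew(col))
highskew = skews[skews > 0.5]

for i in highskew.index:
    # Estimate lambda on x + 1 so it matches the shift applied by boxcox1p.
    lam = boxcox_normmax(x[i].to_numpy() + 1)
    x[i] = boxcox1p(x[i].to_numpy(), lam)

# Skewness per column after the transform.
print(x.apply(lambda col: skew(col)))
```

Columns below the threshold are left untouched, so only the strongly skewed features are transformed.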
    

Detailed syntax:

scipy.stats.boxcox_normmax(x, brack=(-2.0, 2.0), method='pearsonr')
Compute optimal Box-Cox transform parameter for input data.

Parameters:
x : array_like
	Input array.
brack : 2-tuple, optional
	The starting interval for a downhill bracket search with optimize.brent. Note that this is in most cases not critical; the final result is allowed to be outside this bracket.
method : str, optional
	The method to determine the optimal transform parameter (boxcox lmbda parameter). Options are:
		'pearsonr' (default)
		Maximizes the Pearson correlation coefficient between y = boxcox(x) and the expected values for y if x were normally distributed.
		'mle'
		Maximizes the log-likelihood boxcox_llf. This is the method used in boxcox.
		'all'
		Use all optimization methods available, and return all results. Useful to compare different methods.

Returns:
maxlog : float or ndarray
	The optimal transform parameter found. An array instead of a scalar for method='all'.

Example:

# Generate some data and determine the optimal lmbda in various ways:
>>> from scipy import stats
>>> x = stats.loggamma.rvs(5, size=30) + 5
>>> y, lmax_mle = stats.boxcox(x)
>>> lmax_pearsonr = stats.boxcox_normmax(x)
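
To see how the `method` options relate to each other, the following sketch compares the lambda returned by `stats.boxcox` with the values from `boxcox_normmax` under 'mle', 'pearsonr', and 'all'. The fixed seed (`random_state=0`) is an assumption added for reproducibility.

```python
import numpy as np
from scipy import stats

# Same distribution as the example above, with a fixed seed (an assumption).
x = stats.loggamma.rvs(5, size=30, random_state=0) + 5

# stats.boxcox with no lmbda argument estimates lambda by maximum likelihood.
y, lmax_mle = stats.boxcox(x)

lmax_mle_direct = stats.boxcox_normmax(x, method='mle')
lmax_pearsonr = stats.boxcox_normmax(x, method='pearsonr')
lmax_all = stats.boxcox_normmax(x, method='all')  # array with one entry per method

print('boxcox (mle):            ', lmax_mle)
print('boxcox_normmax mle:      ', lmax_mle_direct)
print('boxcox_normmax pearsonr: ', lmax_pearsonr)
print('boxcox_normmax all:      ', lmax_all)
```

The 'mle' value should match the lambda from `stats.boxcox`, while 'pearsonr' optimizes a different criterion and can give a somewhat different lambda on the same data.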

----------------------------------------

# -*- coding: utf-8 -*-
"""
Demonstrates the Box-Cox method: the Box-Cox transform,
lambda estimation via the log-likelihood function (llf), and the inverse transform.
"""

import pandas as pd
import numpy as np
from scipy import stats,special
import matplotlib.pyplot as plt

data = pd.read_csv('y_boxcox.csv',header=None)
y = data.iloc[:,1]
print(y.shape)

lam_range = np.linspace(-2, 5, 100)  # np.linspace defaults to num=50
llf = np.zeros(lam_range.shape, dtype=float)

# lambda estimate:
# note: boxcox_llf scores boxcox(y); the exact match for boxcox1p would be boxcox_llf(lam, y + 1)
for i, lam in enumerate(lam_range):
    llf[i] = stats.boxcox_llf(lam, y)  # y must be > 0

# find the index of the max log-likelihood (llf) and take that lambda
lam_best = lam_range[llf.argmax()]
print('Suitable lam is: ',round(lam_best,2))
print('Max llf is: ', round(llf.max(),2))

plt.figure()
plt.plot(lam_range, llf)
plt.savefig('boxcox.jpg')  # save before show(); saving afterwards can write an empty figure
plt.show()

# Box-Cox transform:
print('before convert: ', '\n', y.head())
# y_boxcox = stats.boxcox(y, lam_best)
# use the underlying NumPy array so that .reshape is available below
y_boxcox = special.boxcox1p(y.to_numpy(), lam_best)
print('after convert: ', '\n', pd.DataFrame(y_boxcox.reshape(-1, 1)).head())

# inverse Box-Cox transform:
y_invboxcox = special.inv_boxcox1p(y_boxcox, lam_best)
print('after inverse: ', '\n', pd.DataFrame(y_invboxcox.reshape(-1, 1)).head())

'''
output:
(1456,)
Suitable lam is:  -0.02
Max llf is:  -16154.7
before convert:  
 0   208500.00000
1   181500.00000
2   223500.00000
3   140000.00000
4   250000.00000
Name: 1, dtype: float64
after convert:  
          0
0 10.85009
1 10.74166
2 10.90430
3 10.53785
4 10.99156
after inverse:  
              0
0 208500.00000
1 181500.00000
2 223500.00000
3 140000.00000
4 250000.00000
'''
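
The script depends on `y_boxcox.csv`, which is not included here. As a rough self-contained variant, the sketch below substitutes synthetic positive data (an assumption, not the author's data). It estimates lambda by maximizing `boxcox_llf` on `y + 1` with `scipy.optimize.minimize_scalar` rather than a grid, which is a swapped-in technique and not what the script does. It then checks that `inv_boxcox1p` recovers the original values.

```python
import numpy as np
from scipy import optimize, special, stats

# Synthetic stand-in for the price-like column (illustrative assumption).
rng = np.random.default_rng(42)
y = rng.lognormal(mean=12.0, sigma=0.4, size=1000)

# boxcox1p(y, lam) equals boxcox(y + 1, lam), so score lambda on y + 1.
def neg_llf(lam):
    return -stats.boxcox_llf(lam, y + 1)

# Bounded scalar search over the same range the script scans with a grid.
res = optimize.minimize_scalar(neg_llf, bounds=(-2, 5), method='bounded')
lam_best = res.x

# Forward transform and its inverse.
y_boxcox = special.boxcox1p(y, lam_best)
y_invboxcox = special.inv_boxcox1p(y_boxcox, lam_best)

print('Suitable lam is:', round(float(lam_best), 2))
print('Round trip OK:', np.allclose(y, y_invboxcox))
```

A continuous search avoids picking a grid resolution by hand. The grid version in the script has the advantage that the whole llf curve is available for plotting.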
 