Fitting a lognormal distribution with Scipy vs. Matlab

I am trying to fit a lognormal distribution using Scipy. I have already done this in Matlab, but because I need to extend the application beyond statistical analysis, I am now trying to reproduce the fitted values in Scipy.

Below is the Matlab code I used to fit my data:

% Read input data (one value per line)
x = [];
fid = fopen(file_path, 'r'); % reading is the default action for fopen
disp('Reading network degree data...');
if fid == -1
    disp('[ERROR] Unable to open data file.')
else
    while ~feof(fid)
        x = [x fscanf(fid, '%f', 1)];
    end
    c = fclose(fid);
    if c == 0
        disp('File closed successfully.');
    else
        disp('[ERROR] There was a problem with closing the file.');
    end
end

[f,xx] = ecdf(x);
y = 1 - f;

parmhat = lognfit(x); % MLE estimate
mu = parmhat(1);
sigma = parmhat(2);

And here's the fitted plot:

Now here's my Python code with the aim of achieving the same:

import math
from scipy import stats
from statsmodels.distributions.empirical_distribution import ECDF

# The same input is read as a list in Python
ecdf_func = ECDF(degrees)
x = ecdf_func.x
ccdf = 1 - ecdf_func.y

# Fit data
shape, loc, scale = stats.lognorm.fit(degrees, floc=0)

# Parameters
sigma = shape          # sdlog: standard deviation of log(x)
mu = math.log(scale)   # meanlog: mean of log(x)

fit_ccdf = stats.lognorm.sf(x, sigma, loc=0, scale=scale)

Here's the fit using the Python code.

As you can see, both sets of code are capable of producing good fits, at least visually speaking.
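For completeness, the comparison plot on the Python side can be reproduced with something like the following (matplotlib assumed; x, ccdf and fit_ccdf are as defined above, and the leading -inf that statsmodels' ECDF prepends is skipped):

import matplotlib.pyplot as plt

# Empirical CCDF vs. fitted lognormal survival function on log-log axes
plt.loglog(x[1:], ccdf[1:], 'o', label='empirical CCDF')
plt.loglog(x[1:], fit_ccdf[1:], '-', label='lognormal fit')
plt.xlabel('degree')
plt.ylabel('CCDF')
plt.legend()
plt.show()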

The problem is that there is a huge difference in the estimated parameters mu and sigma.

From Matlab: mu = 1.62, sigma = 1.29.

From Python: mu = 2.78, sigma = 1.74.

Why is there such a difference?

Note: I have double checked that both sets of data fitted are exactly the same. Same number of points, same distribution.
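For reference, this is the parameter mapping I am assuming between scipy's lognorm and the (mu, sigma) reported by lognfit, checked here on synthetic data rather than on my degree data (the values 1.5 and 0.75 are arbitrary):

import numpy as np
from scipy import stats

# If X ~ lognorm(shape=sigma, loc=0, scale=exp(mu)),
# then log(X) should be normal with mean mu and std sigma.
mu_true, sigma_true = 1.5, 0.75
samples = stats.lognorm.rvs(sigma_true, loc=0, scale=np.exp(mu_true), size=100000)

logs = np.log(samples)
print("mean of log(X): %.3f (expected %.3f)" % (logs.mean(), mu_true))
print("std of log(X): %.3f (expected %.3f)" % (logs.std(), sigma_true))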

Your help is much appreciated! Thanks in advance.

Other info:

import scipy
import numpy
import statsmodels

scipy.__version__        # '0.9.0'
numpy.__version__        # '1.6.1'
statsmodels.__version__  # '0.5.0.dev-1bbd4ca'

Version of Matlab is R2011b.

Edit:

As demonstrated in the answer below, the fault lies with Scipy 0.9. I am able to reproduce the mu and sigma results from Matlab using Scipy 0.11.0.

An easy way to update your Scipy is:

pip install --upgrade scipy

If you don't have pip (you should!):

sudo apt-get install python-pip
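After upgrading, re-checking the version string confirms which Scipy the interpreter actually picks up:

import scipy
print(scipy.__version__)  # should now report at least 0.11.0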

Solution

There is a bug in the fit method in scipy 0.9.0 that has been fixed in later versions of scipy.

The output of the script below should be:

Explicit formula: mu = 4.99203450, sig = 0.81691086
Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
Fit x to lognorm: mu = 4.99203468, sig = 0.81691081

but with scipy 0.9.0, it is:

Explicit formula: mu = 4.99203450, sig = 0.81691086
Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
Fit x to lognorm: mu = 4.23197270, sig = 1.11581240

The following test script shows three ways to get the same results:

import numpy as np
from scipy import stats

def lognfit(x, ddof=0):
    x = np.asarray(x)
    logx = np.log(x)
    mu = logx.mean()
    sig = logx.std(ddof=ddof)
    return mu, sig

# A simple data set for easy reproducibility
x = np.array([50., 50, 100, 200, 200, 300, 500])

# Explicit formula
my_mu, my_sig = lognfit(x)

# Fit a normal distribution to log(x)
norm_mu, norm_sig = stats.norm.fit(np.log(x))

# Fit the lognormal distribution
lognorm_sig, _, lognorm_expmu = stats.lognorm.fit(x, floc=0)

print("Explicit formula: mu = %10.8f, sig = %10.8f" % (my_mu, my_sig))
print("Fit log(x) to norm: mu = %10.8f, sig = %10.8f" % (norm_mu, norm_sig))
print("Fit x to lognorm: mu = %10.8f, sig = %10.8f" % (np.log(lognorm_expmu), lognorm_sig))

With ddof=1 in the standard deviation calculation, so that the unbiased variance estimator is used:

In [104]: x
Out[104]: array([ 50., 50., 100., 200., 200., 300., 500.])

In [105]: lognfit(x, ddof=1)
Out[105]: (4.9920345004312647, 0.88236457185021866)

There is a note in Matlab's lognfit documentation saying that when censoring is not used, lognfit computes sigma using the square root of the unbiased estimator of the variance. This corresponds to using ddof=1 in the code above.
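To reproduce Matlab's sigma from scipy's MLE fit directly, the sqrt(n/(n-1)) correction can be applied by hand; a minimal sketch using the same toy data as above (the fit itself needs a scipy with the bug fixed, i.e. 0.11 or later):

import numpy as np
from scipy import stats

x = np.array([50., 50, 100, 200, 200, 300, 500])
n = len(x)

# MLE fit with the location pinned at zero (needs scipy >= 0.11 for correct results)
sig_mle, _, scale = stats.lognorm.fit(x, floc=0)
mu = np.log(scale)

# Matlab's lognfit reports the square root of the unbiased variance estimator,
# which differs from the MLE estimate by a factor of sqrt(n / (n - 1))
sig_unbiased = sig_mle * np.sqrt(n / (n - 1.0))

print("mu = %10.8f" % mu)                        # ~4.99203468
print("sigma (MLE) = %10.8f" % sig_mle)          # ~0.81691081
print("sigma (ddof=1) = %10.8f" % sig_unbiased)  # ~0.88236, matches lognfit(x, ddof=1) above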
