python 趋势线计算式_Python-计算有错误的趋势线-CSDN博客

So I've got some data stored as two lists, and plotted them using

plot(datasetx, datasety)

Then I set a trendline

trend = polyfit(datasetx, datasety)

trendx = []

trendy = []

for a in range(datasetx[0], (datasetx[-1]+1)):

trendx.append(a)

trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])

plot(trendx, trendy)

But I have a third list of data, which is the error in the original datasety. I'm fine with plotting the errorbars, but what I don't know is using this, how to find the error in the coefficients of the polynomial trendline.

So say my trendline came out to be 5x^2 + 3x + 4 = y, there needs to be some sort of error on the 5, 3 and 4 values.

Is there a tool using NumPy that will calculate this for me?

解决方案

I think you can use the function curve_fit of scipy.optimize (documentation). A basic example of the usage:

import numpy as np

from scipy.optimize import curve_fit

def func(x, a, b, c):

return a*x**2 + b*x + c

x = np.linspace(0,4,50)

y = func(x, 5, 3, 4)

yn = y + 0.2*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

Following the documentation, pcov gives:

The estimated covariance of popt. The diagonals provide the variance

of the parameter estimate.

So in this way you can calculate an error estimate on the coefficients. To have the standard deviation you can take the square root of the variance.

Now you have an error on the coefficients, but it is only based on the deviation between the ydata and the fit. In case you also want to account for an error on the ydata itself, the curve_fit function provides the sigma argument:

sigma : None or N-length sequence

If not None, it represents the standard-deviation of ydata. This

vector, if given, will be used as weights in the least-squares

problem.

A complete example:

import numpy as np

from scipy.optimize import curve_fit

def func(x, a, b, c):

return a*x**2 + b*x + c

x = np.linspace(0,4,20)

y = func(x, 5, 3, 4)

# generate noisy ydata

yn = y + 0.2 * y * np.random.normal(size=len(x))

# generate error on ydata

y_sigma = 0.2 * y * np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn, sigma = y_sigma)

# plot

import matplotlib.pyplot as plt

fig = plt.figure()

ax = fig.add_subplot(111)

ax.errorbar(x, yn, yerr = y_sigma, fmt = 'o')

ax.plot(x, np.polyval(popt, x), '-')

ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5))

ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5))

ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5))

ax.grid()

plt.show()

Then something else, about using numpy arrays. One of the main advantages of using numpy is that you can avoid for loops because operations on arrays apply elementwise. So the for-loop in your example can also be done as following:

trendx = arange(datasetx[0], (datasetx[-1]+1))

trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]

Where I use arange instead of range as it returns a numpy array instead of a list.

In this case you can also use the numpy function polyval:

trendy = polyval(trend, trendx)