Non-Linear Regression Analysis

If the data shows a curved trend, linear regression will produce less accurate results than non-linear regression because, as the name implies, linear regression presumes that the data is linear. Let's learn about non-linear regression and work through an example in Python. In this notebook, we fit a non-linear model to the data points corresponding to China's GDP from 1960 to 2014.

Importing required libraries

 

 
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Although linear regression solves many problems well, it cannot be used for all datasets. First, recall how linear regression models a dataset: it models a linear relation between a dependent variable y and an independent variable x, using a simple equation of degree 1, for example y = 2*(x) + 3.

 

x = np.arange(-5.0,5.0,0.1)
## You can adjust the slope and intercept to verify the changes in the graph
y = 2*(x) + 3
y_noise = 2 * np.random.normal(size=x.size)
ydata = y + y_noise
#plt.figure(figsize=(8,6))
plt.plot(x, ydata, 'bo')
plt.plot(x,y,'r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
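As a rough sketch of what "modeling a linear relation" means in practice, the snippet below recovers the slope and intercept from the same kind of noisy data. It assumes `np.polyfit` as the fitting routine and a fixed random seed; neither appears in the original notebook.

```python
import numpy as np

# Fixed seed (an assumption, added only so the run is repeatable)
rng = np.random.default_rng(0)

x = np.arange(-5.0, 5.0, 0.1)
y_true = 2 * x + 3
ydata = y_true + 2 * rng.normal(size=x.size)

# Fit a degree-1 polynomial; polyfit returns coefficients from the highest power down
slope, intercept = np.polyfit(x, ydata, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The estimates should land close to the true values of 2 and 3, with the gap shrinking as the noise level drops.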

 

Non-linear regression models a relationship between independent variables $x$ and a dependent variable $y$ that results in a non-linear function. Essentially, any relationship that is not linear can be termed non-linear, and it is often represented by a polynomial of degree $k$ (the maximum power of $x$). For example, a cubic:

$y = ax^3 + bx^2 + cx + d$

 

Non-linear functions can have elements like exponentials, logarithms, fractions, and others. For example:

$y = \log(x)$

Or something even more complicated, such as:

$y = \log(ax^3 + bx^2 + cx + d)$

 

Let's take a look at a cubic function's graph.

 

 
x = np.arange(-5.0,5.0,0.1)
## You can adjust the coefficients to see how the graph changes
y=1*(x**3) + 1*(x**2) + 1*x + 3
y_noise = 20 * np.random.normal(size=x.size)
ydata = y + y_noise
plt.plot(x, ydata, 'bo')
plt.plot(x,y, 'r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
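To make the point about accuracy concrete, here is a minimal sketch that fits both a straight line and a cubic to noisy cubic data and compares their root-mean-square errors. The `rmse` helper, the seed, and the use of `np.polyfit` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, assumed for repeatability

x = np.arange(-5.0, 5.0, 0.1)
ydata = x**3 + x**2 + x + 3 + 20 * rng.normal(size=x.size)

def rmse(degree):
    """Fit a polynomial of the given degree and return its in-sample RMSE."""
    coeffs = np.polyfit(x, ydata, degree)
    residuals = ydata - np.polyval(coeffs, x)
    return np.sqrt(np.mean(residuals**2))

rmse_linear = rmse(1)
rmse_cubic = rmse(3)
print(f"linear RMSE = {rmse_linear:.1f}, cubic RMSE = {rmse_cubic:.1f}")
```

The cubic fit can never do worse in-sample than the line because the line is a special case of the cubic; on data with real curvature the gap is usually clear.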

 

As you can see, this function contains the terms $x^3$ and $x^2$. Its graph is not a straight line in the 2D plane, so this is a non-linear function.

Some other types of non-linear functions are:

Quadratic

 

$Y = X^2$

 

 

 
x = np.arange(-5.0, 5.0, 0.1)
## You can change the exponent to see how the graph changes
y = np.power(x,2)
y_noise = 2 * np.random.normal(size=x.size)
ydata = y + y_noise
plt.plot(x, ydata, 'bo')
plt.plot(x,y,'r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()

Exponential

An exponential function with base $c$ is defined by

$Y = a + b c^X$

where $b \neq 0$, $c > 0$, $c \neq 1$, and $X$ is any real number. The base, $c$, is constant and the exponent, $X$, is the variable.

 

 
X = np.arange(-5.0, 5.0, 0.1)
## This plots the special case a = 0, b = 1, c = e
Y = np.exp(X)
plt.plot(X,Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
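One common way to estimate the parameters of an exponential is to take logarithms, which turns the curve into a straight line. The sketch below assumes the offset a is zero and the data is noise-free; the values of `b_true` and `c_true` are illustrative only.

```python
import numpy as np

X = np.arange(-5.0, 5.0, 0.1)
b_true, c_true = 1.5, 2.0          # illustrative values, not from the notebook
Y = b_true * np.power(c_true, X)   # Y = b * c**X, with a = 0 assumed

# log(Y) = log(b) + X * log(c) is linear in X, so a degree-1 fit recovers both
log_c, log_b = np.polyfit(X, np.log(Y), 1)
b_est, c_est = np.exp(log_b), np.exp(log_c)
print(f"b = {b_est:.3f}, c = {c_est:.3f}")
```

This trick only works when Y is strictly positive and the offset is known; otherwise a direct non-linear fit is the safer route.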

Logarithmic

The response $y$ is the result of applying a logarithmic map to the input $x$. The simplest form is

$y = \log(x)$

Instead of $x$ we can use $X$, which can be a polynomial representation of the $x$'s. In general form this is written as

$y = \log(X)$

 

 

 
X = np.arange(0.1, 5.0, 0.1)
Y = np.log(X)
plt.plot(X,Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()

 

 

Sigmoidal/Logistic

 

$Y = a + \frac{b}{1 + c^{(X-d)}}$

 

 

 
X = np.arange(-5.0, 5.0, 0.1)
Y = 1 - 4/(1+np.power(3, X-2))
plt.plot(X,Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
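To see what each parameter does, the following sketch evaluates the curve far to the left, at X = d, and far to the right, using the same values as the plot above (a = 1, b = -4, c = 3, d = 2). The function name `general_sigmoid` is hypothetical.

```python
import numpy as np

def general_sigmoid(X, a, b, c, d):
    """Y = a + b / (1 + c**(X - d)); the name is illustrative only."""
    return a + b / (1 + np.power(c, X - d))

a, b, c, d = 1.0, -4.0, 3.0, 2.0

left = general_sigmoid(-50.0, a, b, c, d)   # c**(X-d) -> 0, so Y -> a + b
mid = general_sigmoid(d, a, b, c, d)        # c**0 = 1, so Y = a + b/2
right = general_sigmoid(50.0, a, b, c, d)   # c**(X-d) -> infinity, so Y -> a
print(left, mid, right)
```

For c > 1 the curve therefore runs from a + b on the left to a on the right, with d marking the halfway point.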

Non-Linear Regression example

 

As an example, we will fit a non-linear model to the data points corresponding to China's GDP from 1960 to 2014. We download a dataset with two columns: the first is a year between 1960 and 2014, and the second is China's gross domestic product (GDP) in US dollars for that year.

 

 
import numpy as np
import pandas as pd

 

df = pd.read_csv("C:/Users/psmax/18MElephant/Cognitive AI/Machine Learning with Python/china_gdp.csv")
df.head(10)
   Year         Value
0  1960  5.918412e+10
1  1961  4.955705e+10
2  1962  4.668518e+10
3  1963  5.009730e+10
4  1964  5.906225e+10
5  1965  6.970915e+10
6  1966  7.587943e+10
7  1967  7.205703e+10
8  1968  6.999350e+10
9  1969  7.871882e+10

Plotting the Dataset

This is what the data points look like. The curve resembles either a logistic or an exponential function: growth starts off slow, becomes very rapid from 2005 onward, and then decelerates slightly in the 2010s.

 

 
plt.figure(figsize=(8,5))
x_data,y_data = (df["Year"].values,df["Value"].values)
plt.plot(x_data, y_data, 'ro') 
# 'ro' is a matplotlib format string meaning red circles
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()

Choosing a model

From an initial look at the plot, we determine that the logistic function could be a good approximation, since it starts with slow growth, accelerates in the middle, and then slows down again at the end, as illustrated below:

 

 
X = np.arange(-5.0, 5.0, 0.1)
Y = 1.0 / (1.0 + np.exp(-X))
plt.plot(X,Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()

The formula for the logistic function is the following:

 

$\hat{Y} = \frac{1}{1 + e^{-\beta_1 (X - \beta_2)}}$

$\beta_1$: Controls the curve's steepness.

$\beta_2$: Slides the curve along the x-axis.
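A quick numerical check can make the two roles tangible. The sketch below assumes the logistic form 1 / (1 + exp(-beta_1 * (x - beta_2))), matching the `sigmoid` function defined in this notebook, and uses a finite-difference slope; the function name `logistic` and the step `h` are choices made for illustration.

```python
import numpy as np

def logistic(x, beta_1, beta_2):
    """Logistic curve; assumed form 1 / (1 + exp(-beta_1 * (x - beta_2)))."""
    return 1.0 / (1.0 + np.exp(-beta_1 * (x - beta_2)))

beta_2 = 1990.0
h = 1e-4  # step for the central finite difference

slopes = {}
for beta_1 in (0.1, 0.5):
    mid_value = logistic(beta_2, beta_1, beta_2)  # always 0.5 at x = beta_2
    slope = (logistic(beta_2 + h, beta_1, beta_2)
             - logistic(beta_2 - h, beta_1, beta_2)) / (2 * h)
    slopes[beta_1] = slope
    print(f"beta_1={beta_1}: value at midpoint={mid_value}, slope={slope:.4f}")
```

The curve always passes through 0.5 at x = beta_2, and its slope there is beta_1 / 4, so a larger beta_1 means a sharper transition.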

Building The Model

Now, let's build our regression model and initialize its parameters.

 

 
def sigmoid(x,Beta_1,Beta_2):
    y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
    return y

Let's look at a sample sigmoid curve that might fit the data:

 

 
beta_1 = 0.10
beta_2 = 1990.0
# logistic function
Y_pred = sigmoid(x_data,beta_1,beta_2)
# plot the initial prediction against the data points
plt.plot(x_data,Y_pred*15000000000000.)
plt.plot(x_data,y_data,'ro')
plt.show()

Our task here is to find the best parameters for our model. Let's first normalize our x and y:

 

 
# Let's normalize our data
xdata = x_data/max(x_data)
ydata = y_data/max(y_data)
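Normalizing x changes the scale on which the sigmoid's parameters live. The sketch below shows how parameters on the normalized scale map back to the raw-year scale. The year range follows the 1960 to 2014 span described in the text, and `B1_norm` and `B2_norm` are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(x, Beta_1, Beta_2):
    return 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))

years = np.arange(1960, 2015)   # 1960..2014, assumed from the text
x_max = years.max()
x_norm = years / x_max

# Hypothetical parameters on the normalized scale (illustrative only)
B1_norm, B2_norm = 700.0, 0.997

# Since B1*(x/x_max - B2) == (B1/x_max)*(x - B2*x_max):
B1_raw = B1_norm / x_max   # steepness per year
B2_raw = B2_norm * x_max   # midpoint expressed as a year

same = np.allclose(sigmoid(x_norm, B1_norm, B2_norm),
                   sigmoid(years, B1_raw, B2_raw))
print(same, B1_raw, B2_raw)
```

Both parameterizations describe the same curve, so a midpoint found on the normalized scale can be read as a calendar year by multiplying it by the largest year in the data.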

How do we find the best parameters for our fit line?

We can use curve_fit, which uses non-linear least squares to fit our sigmoid function to the data. It returns the parameter values that minimize the sum of the squared residuals of sigmoid(xdata, *popt) - ydata.

popt holds our optimized parameters.

 

from scipy.optimize import curve_fit
popt,pcov = curve_fit(sigmoid,xdata,ydata)
# print the final parameters
print("beta_1 = %f, beta_2 = %f" %(popt[0],popt[1]))
beta_1 = 690.451711, beta_2 = 0.997207
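The second return value, pcov, is the estimated covariance matrix of the parameters, and the square roots of its diagonal give rough standard errors. Here is a minimal sketch on synthetic data; the true parameters, noise level, seed, and initial guess `p0` are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, Beta_1, Beta_2):
    return 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))

rng = np.random.default_rng(42)  # fixed seed, assumed for repeatability

# Synthetic data from a known sigmoid plus a little Gaussian noise
x_syn = np.linspace(0.0, 1.0, 60)
y_syn = sigmoid(x_syn, 12.0, 0.6) + 0.02 * rng.normal(size=x_syn.size)

# p0 is an explicit starting guess; without it curve_fit starts every parameter at 1
popt_syn, pcov_syn = curve_fit(sigmoid, x_syn, y_syn, p0=[5.0, 0.5])

# One-standard-deviation uncertainty estimates for each parameter
perr_syn = np.sqrt(np.diag(pcov_syn))
print("estimates:", popt_syn)
print("std errors:", perr_syn)
```

If a fit fails to converge or lands on implausible values, supplying a sensible `p0` is usually the first thing to try.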

Now we plot our resulting regression model.

 

x = np.linspace(1960, 2015, 55)
x = x/max(x)
plt.figure(figsize=(8,5))
y = sigmoid(x,*popt)
plt.plot(xdata, ydata, 'ro', label='data')
plt.plot(x,y, linewidth=3.0, label='fit')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()
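A plot gives a visual impression of the fit, but a held-out test set gives a number. The sketch below is one possible way to measure accuracy, shown on synthetic data rather than the GDP file; the every-fifth-point split, the true parameters, and the metric names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, Beta_1, Beta_2):
    return 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))

rng = np.random.default_rng(7)  # fixed seed, assumed for repeatability

# Synthetic stand-in for normalized data
x_all = np.linspace(0.0, 1.0, 55)
y_all = sigmoid(x_all, 10.0, 0.6) + 0.02 * rng.normal(size=x_all.size)

# Deterministic split: every fifth point is held out for testing
is_test = np.arange(x_all.size) % 5 == 0
train_x, train_y = x_all[~is_test], y_all[~is_test]
test_x, test_y = x_all[is_test], y_all[is_test]

# Fit on the training points only
popt_eval, _ = curve_fit(sigmoid, train_x, train_y, p0=[5.0, 0.5])

# Score on the held-out points
y_hat = sigmoid(test_x, *popt_eval)
mae = np.mean(np.abs(y_hat - test_y))
mse = np.mean((y_hat - test_y) ** 2)
r2 = 1 - np.sum((test_y - y_hat) ** 2) / np.sum((test_y - np.mean(test_y)) ** 2)
print(f"MAE = {mae:.4f}, MSE = {mse:.5f}, R2 = {r2:.3f}")
```

For time-ordered data such as yearly GDP, holding out the most recent years instead of interleaved points would be a stricter test of how well the curve extrapolates.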

 

 

 
