Statsmodels OLS function with dummy variables in Python

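I am trying to build a regression with a categorical variable.

I start by creating all the dummy variables, then drop everything I don't need from the x values.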

import numpy as np

import pandas as pd

import statsmodels.api as sm

d1 = pd.get_dummies(df2015["CBSA Office"])

df2015_new = pd.concat([df2015, d1], axis=1)

d2 = pd.get_dummies(df2016["CBSA Office"])

df2016_new = pd.concat([df2016, d2], axis=1)

trainset = pd.concat([df2015_new,df2016_new],axis=0)

trainset = trainset.dropna()

x_train = trainset.drop(['CBSA Office','Location','Updated','Commercial Flow','Travellers Flow'],axis="columns")

y_train = trainset["Travellers Flow"]
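One thing worth checking at this step is whether both years contain exactly the same set of offices. If they do not, encoding each year separately produces different dummy columns, `pd.concat` fills the gaps with NaN, and `dropna()` can then silently discard rows. The sketch below uses tiny made-up frames (the office names "A", "B", "C" and the flow values are placeholders, not the real data) to compare encoding before and after stacking.

```python
import pandas as pd

# Made-up stand-ins for df2015 / df2016; office "B" appears only in 2015, "C" only in 2016
df2015 = pd.DataFrame({"CBSA Office": ["A", "B"], "Travellers Flow": [1.0, 2.0]})
df2016 = pd.DataFrame({"CBSA Office": ["A", "C"], "Travellers Flow": [3.0, 4.0]})

# Encoding each year separately: the dummy columns differ, so concat introduces NaN
separate = pd.concat(
    [pd.concat([d, pd.get_dummies(d["CBSA Office"])], axis=1) for d in (df2015, df2016)],
    axis=0,
)
rows_kept_separate = len(separate.dropna())

# Encoding once, after stacking the years: every row gets every dummy column
combined = pd.concat([df2015, df2016], axis=0, ignore_index=True)
together = pd.concat([combined, pd.get_dummies(combined["CBSA Office"])], axis=1)
rows_kept_together = len(together.dropna())

print(rows_kept_separate)  # 0
print(rows_kept_together)  # 4
```

If the real 2015 and 2016 data list identical offices, both approaches give the same result, and this is not the cause of any problem.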

Now I run the regression with the OLS function.

x_train = x_train.iloc[:100].values.reshape(-1,1)

y_train = y_train.iloc[:100].values.reshape(-1,1)

modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()

modelx.summary()

Then I get an error message saying

endog and exog matrices are different sizes

But I thought I had already set them to the same size.

If I don't reshape them, I get a result like this:
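A likely reading of this error is that `reshape(-1, 1)` does not keep the number of rows when the input has more than one column: it flattens every cell into a single column. The sketch below uses zero-filled arrays as stand-ins, assuming 100 rows and 26 dummy columns (the number of offices listed in the summary output), to show the resulting shapes.

```python
import numpy as np

# Hypothetical stand-ins: 100 observations, 26 dummy columns, one target value per row
x = np.zeros((100, 26))
y = np.zeros(100)

x_flat = x.reshape(-1, 1)  # all 100 * 26 cells stacked into one column
y_col = y.reshape(-1, 1)   # one column with 100 rows

print(x_flat.shape)  # (2600, 1)
print(y_col.shape)   # (100, 1)
```

The two arrays no longer have the same number of rows, which matches the complaint that endog and exog are different sizes. A 2-D design matrix can be passed to `sm.OLS` as it is, without reshaping.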

C:\Users\CiCi\Anaconda3-1\lib\site-packages\statsmodels\regression\linear_model.py:1554: RuntimeWarning: invalid value encountered in double_scalars

return self.ess/self.df_model

C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater

return (self.a < x) & (x < self.b)

C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less

return (self.a < x) & (x < self.b)

C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal

cond2 = cond0 & (x <= self.a)

C:\Users\CiCi\Anaconda3-1\lib\site-packages\statsmodels\base\model.py:1100: RuntimeWarning: invalid value encountered in true_divide

return self.params / self.bse

OLS Regression Results

Dep. Variable: Travellers Flow R-squared: 0.000

Model: OLS Adj. R-squared: 0.000

Method: Least Squares F-statistic: nan

Date: Sun, 09 Dec 2018 Prob (F-statistic): nan

Time: 00:34:01 Log-Likelihood: -429.08

No. Observations: 100 AIC: 860.2

Df Residuals: 99 BIC: 862.8

Df Model: 0

Covariance Type: nonrobust

coef std err t P>|t| [0.025 0.975]

Abbotsford-Huntingdon 8.5000 1.776 4.786 0.000 4.976 12.024

Aldergrove 0 0 nan nan 0 0

Ambassador Bridge 0 0 nan nan 0 0

Blue Water Bridge 0 0 nan nan 0 0

Boundary Bay 0 0 nan nan 0 0

Cornwall 0 0 nan nan 0 0

Coutts 0 0 nan nan 0 0

Douglas (Peace Arch) 0 0 nan nan 0 0

Edmundston 0 0 nan nan 0 0

Emerson 0 0 nan nan 0 0

Fort Frances Bridge 0 0 nan nan 0 0

North Portal 0 0 nan nan 0 0

Pacific Highway 0 0 nan nan 0 0

Peace Bridge 0 0 nan nan 0 0

Prescott 0 0 nan nan 0 0

Queenston-Lewiston Bridge 0 0 nan nan 0 0

Rainbow Bridge 0 0 nan nan 0 0

Sault Ste. Marie 0 0 nan nan 0 0

St-Armand/Philipsburg 0 0 nan nan 0 0

St-Bernard-de-Lacolle 0 0 nan nan 0 0

St. Stephen 0 0 nan nan 0 0

St. Stephen 3rd Bridge 0 0 nan nan 0 0

Stanstead 0 0 nan nan 0 0

Thousand Islands Bridge 0 0 nan nan 0 0

Windsor and Detroit Tunnel 0 0 nan nan 0 0

Woodstock Road 0 0 nan nan 0 0

Omnibus: 81.245 Durbin-Watson: 0.324

Prob(Omnibus): 0.000 Jarque-Bera (JB): 453.220

Skew: 2.832 Prob(JB): 3.84e-99

Kurtosis: 11.757 Cond. No. 1.00e+16

Warnings:

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[2] The smallest eigenvalue is 9.98e-31. This might indicate that there are

strong multicollinearity problems or that the design matrix is singular.

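This is the format I want, with all the dummy variables included, but it comes with many warnings, R^2 is 0, and I certainly can't make any predictions from it.

What I want is a summary that includes every dummy variable.

I tried this: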

x_train = np.array(x_train).reshape(1,-1)

y_train = np.array(y_train).reshape(1,-1)

modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()

modelx.summary()

and I get

MemoryError Traceback (most recent call last)

in ()

1 x_train = np.array(x_train).reshape(1,-1)

2 y_train = np.array(y_train).reshape(1,-1)

----> 3 modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()

4 modelx.summary()

~\Anaconda3-1\lib\site-packages\statsmodels\regression\linear_model.py in fit(self, method, cov_type, cov_kwds, use_t, **kwargs)

273 self.pinv_wexog, singular_values = pinv_extended(self.wexog)

274 self.normalized_cov_params = np.dot(

--> 275 self.pinv_wexog, np.transpose(self.pinv_wexog))

276

277 # Cache these singular values for use later.

MemoryError:
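A plausible explanation for the MemoryError is that `reshape(1, -1)` turns the whole design matrix into a single observation with one regressor per cell. The traceback line multiplies the pseudo-inverse by its own transpose, which gives a square matrix with one row and one column per regressor. The sketch below reproduces the shapes with a ones-filled stand-in, assuming 100 rows and 26 columns; the real training set size is not given in the post.

```python
import numpy as np

# Hypothetical stand-in for x_train: 100 rows, 26 dummy columns
x = np.ones((100, 26))

x_row = x.reshape(1, -1)          # 1 observation, 2600 "regressors"
pinv = np.linalg.pinv(x_row)      # pseudo-inverse, shape (2600, 1)
cov = np.dot(pinv, pinv.T)        # same product as in the traceback, shape (2600, 2600)

print(x_row.shape)  # (1, 2600)
print(cov.shape)    # (2600, 2600)

# Memory grows with the square of the cell count: 8 bytes per float64 entry
cells = x_row.shape[1]
print(cells * cells * 8)  # 54080000
```

With only 100 rows this is about 54 MB, but the same product on a training set with many thousands of rows would need a matrix far larger than typical RAM.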

I am new to Python and need a lot of help. Thank you!
