pandas
statsmodels
https://www.statsmodels.org/stable/regression.html
Besides OLS (ordinary least squares), which the code below uses, the following are also available:
OLS(endog[, exog, missing, hasconst]) A simple ordinary least squares model.
GLS(endog, exog[, sigma, missing, hasconst]) Generalized least squares model with a general covariance structure.
WLS(endog, exog[, weights, missing, hasconst]) A regression model with diagonal but non-identity covariance structure.
GLSAR(endog[, exog, rho, missing]) A regression model with an AR(p) covariance structure.
yule_walker(X[, order, method, df, inv, demean]) Estimate AR(p) parameters from a sequence X using Yule-Walker equation.
QuantReg(endog, exog, **kwargs) Quantile Regression
RecursiveLS(endog, exog, **kwargs) Recursive least squares
import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn import preprocessing
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import sys
df = pd.read_csv("t_traindata000")
df.columns = ['mesh500mid', 'season', 'period', 'dayflag', 'pop']
print(df.head())
# drop_first=True drops one dummy level per variable to avoid multicollinearity
season_dummy = pd.get_dummies(df['season'], prefix='season',drop_first=True)
period_dummy = pd.get_dummies(df['period'], prefix='period',drop_first=True)
cols_to_keep = ['pop', 'mesh500mid', 'dayflag']
# .ix was removed in pandas 1.0; .loc does the same label-based slicing
data = df[cols_to_keep].join(season_dummy.loc[:, 'season_1':]).join(period_dummy.loc[:, 'period_1':])
print(data.head())
#data['intercept'] = 1.0
train_cols = data.columns[2:]
#logit = sm.Logit(data['pop'], data[train_cols])
#----------------
#-----------------
data_cur = data[data['mesh500mid'] == 533941604]  # 533944882
X = data_cur