Python Tutorial: Preparation & Basic Regression

5.1 Pre-process a data set using principal component analysis.

# Notice we are using a new data set that needs to be read into the
# environment
iris = pd.read_csv('/Users/iris.csv')
features = iris.drop(["Target"], axis = 1)
from sklearn import preprocessing
features_scaled = preprocessing.scale(features.as_matrix())
from sklearn.decomposition import PCA
pca = PCA(n_components = 4)
pca = pca.fit(features_scaled)
print(np.transpose(pca.components_))

5.2 Split data into training and testing data and export as a .csv file.

from sklearn.model_selection import train_test_split
target = iris["Target"]
# The following code splits the iris data set into 70% train and 30% test
X_train, X_test, Y_train, Y_test = train_test_split(features, target,
test_size = 0.3,
random_state = 29)
train_x = pd.DataFrame(X_train)
train_y = pd.DataFrame(Y_train)
test_x = pd.DataFrame(X_test)
test_y = pd.DataFrame(Y_test)
train = pd.concat([train_x, train_y], axis = 1)
test = pd.concat([test_x, test_y], axis = 1)
train.to_csv('/Users/iris_train_Python.csv', index = False)
test.to_csv('/Users/iris_test_Python.csv', index = False)

5.3 Fit a logistic regression model.

# Notice we are using a new data set that needs to be read into the
# environment
tips = pd.read_csv('/Users/tips.csv')
# The following code is used to determine if the individual left more
# than a 15% tip
tips["fifteen"] = 0.15 * tips["total_bill"]
tips["greater15"] = np.where(tips["tip"] > tips["fifteen"], 1, 0)
import statsmodels.api as sm
# Notice the syntax of greater15 as a function of total_bill
logreg = sm.formula.glm("greater15 ~ total_bill",
family=sm.families.Binomial(),
data=tips).fit()
print(logreg.summary())

#A logistic regression model can be implemented using sklearn, howeverstatsmodels.api provides a helpful summary about the model, so it is preferable for this example.

5.4 Fit a linear regression model.

# Fit a linear regression model of tip by total_bill
from sklearn.linear_model import LinearRegression
# If your data has one feature, you need to reshape the 1D array
linreg = LinearRegression()
linreg.fit(tips["total_bill"].values.reshape(-1,1), tips["tip"])
print(linreg.coef_)
print(linreg.intercept_)

 

转载于:https://www.cnblogs.com/nuswgg95528736/p/8028091.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值