Machine Learning Models
Logistic Regression Model
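As a baseline, we will build a logistic regression model with LogisticRegression from the scikit-learn library. We use all of the features, fill in the missing values by imputation, and scale each feature to a common range.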
from sklearn.preprocessing import MinMaxScaler
# SimpleImputer replaces the older sklearn.preprocessing.Imputer (removed in scikit-learn 0.22)
from sklearn.impute import SimpleImputer
# Drop the target from the training data
if 'TARGET' in app_train:
    train = app_train.drop(columns = ['TARGET'])
else:
    train = app_train.copy()
# Feature names
features = list(train.columns)
# Copy of the testing data
test = app_test.copy()
# Median imputation of missing values
imputer = SimpleImputer(strategy = 'median')
# Scale each feature to 0-1
scaler = MinMaxScaler(feature_range = (0, 1))
# Fit on the training data only
imputer.fit(train)
# Transform both training and testing data
train = imputer.transform(train)
test = imputer.transform(test)
# Repeat with the scaler
scaler.fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
print('Training data shape: ', train.shape)
print('Testing data shape: ', test.shape)
Training data shape: (307511, 240)
Testing data shape: (48744, 240)
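To make the two preprocessing steps concrete, here is a minimal sketch on a tiny made-up array. The values and the names toy_train and toy_test are hypothetical and have nothing to do with the real application data. It assumes a scikit-learn version that provides SimpleImputer. Both transformers are fitted on the training rows only, so the medians and the min/max used on the test rows come from the training data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two features, with one missing value in each set
toy_train = np.array([[1.0, 10.0],
                      [3.0, np.nan],
                      [5.0, 30.0]])
toy_test = np.array([[np.nan, 20.0],
                     [7.0, 40.0]])

imputer = SimpleImputer(strategy='median')
scaler = MinMaxScaler(feature_range=(0, 1))

# Learn the per-column medians from the training data, then fill both sets
toy_train_imp = imputer.fit_transform(toy_train)
toy_test_imp = imputer.transform(toy_test)

# Learn the per-column min and max from the training data, then scale both sets
toy_train_scaled = scaler.fit_transform(toy_train_imp)
toy_test_scaled = scaler.transform(toy_test_imp)

print(imputer.statistics_)   # medians learned from the training columns
print(toy_train_scaled)
print(toy_test_scaled)
```

Because the scaler only knows the training minimum and maximum, a test value larger than anything seen in training (such as 7.0 in the first column here) is mapped above 1. That is expected behaviour and not an error.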
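We change only one of the default settings: the regularization parameter C, which controls the amount of overfitting. C is the inverse of the regularization strength, so lowering it strengthens the regularization and reduces overfitting. Here we follow the usual scikit-learn modeling syntax: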
- Create the model