for dataset in combine:
    dataset['IsAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1

train_df[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean()
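The IsAlone flag above is derived from a FamilySize column computed earlier in the kernel (SibSp + Parch + 1). A minimal self-contained sketch on a toy DataFrame with hypothetical values, showing the same derivation end to end:

```python
import pandas as pd

# Toy stand-in for the Titanic frames; values are hypothetical.
train_df = pd.DataFrame({
    'SibSp':    [1, 0, 0, 1],
    'Parch':    [0, 0, 2, 5],
    'Survived': [1, 1, 0, 0],
})
combine = [train_df]

for dataset in combine:
    # Family size counts the passenger plus siblings/spouses and parents/children.
    dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1
    dataset['IsAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1

# Survival rate for passengers travelling alone vs. with family.
print(train_df[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean())
```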
# machine learning
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
# Logistic Regression
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
y_pred = logreg.predict(X_train)
acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
print(acc_log)
print('Treating all the data as the training set:')
print('Logistic regression accuracy: {}'.format(logreg.score(X_train, Y_train)))
print('Logistic regression precision: {}'.format(precision_score(Y_train, y_pred)))
print('Logistic regression recall: {}'.format(recall_score(Y_train, y_pred)))
print('Logistic regression F1-score: {}'.format(f1_score(Y_train, y_pred)))

fpr, tpr, _ = roc_curve(Y_train, logreg.predict_proba(X_train)[:, 1])
roc_auc = auc(fpr, tpr)

# Plot of a ROC curve for a specific class
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)'% roc_auc)
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()
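The precision, recall, and F1 calls above can be checked from first principles: precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. A sketch with hypothetical labels:

```python
# Hypothetical true labels and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count true positives, false positives, and false negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

These match what `precision_score`, `recall_score`, and `f1_score` return for the same inputs.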
xgboost results: [0.77441077 0.81144781 0.8047138 ]
Treating all the data as the training set, the prediction results are as follows:
xgboost accuracy: 0.8395061728395061
xgboost precision: 0.8419243986254296
xgboost recall: 0.716374269005848
xgboost F1-score: 0.7740916271721958
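The three scores above look like 3-fold cross-validation output, though the xgboost model code is not shown in this excerpt. A hedged sketch of how such scores are typically produced with `cross_val_score`, using sklearn's GradientBoostingClassifier and a built-in dataset as stand-ins:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data and model; the original kernel presumably used
# xgboost.XGBClassifier on the Titanic features.
X, y = load_breast_cancer(return_X_y=True)
clf = GradientBoostingClassifier(random_state=0)

# cv=3 yields one accuracy score per fold, as in the printout above.
scores = cross_val_score(clf, X, y, cv=3)
print(scores)
```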
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
logistic = LogisticRegression()
logistic.fit(train_linear_model, train_label)
print('Treating all the data as the training set, the prediction results are as follows:')
print('Logistic regression accuracy: {}'.format(logistic.score(train_linear_model, train_label)))
y_pred = logistic.predict(train_linear_model)
print('Logistic regression precision: {}'.format(precision_score(train_label, y_pred)))
print('Logistic regression recall: {}'.format(recall_score(train_label, y_pred)))
print('Logistic regression F1-score: {}'.format(f1_score(train_label, y_pred)))

fpr, tpr, _ = roc_curve(train_label, logistic.predict_proba(train_linear_model)[:, 1])
roc_auc = auc(fpr, tpr)

# Plot of a ROC curve for a specific class
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)'% roc_auc)
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()

print('Splitting into a training set and a validation set, the prediction results on the validation set are as follows:')
X_train, X_test, y_train, y_test = train_test_split(train_linear_model, train_label, test_size=.25)
logistic = LogisticRegression()
logistic.fit(X_train, y_train)
y_pred = logistic.predict(X_test)
print('Logistic regression accuracy: {}'.format(accuracy_score(y_test, y_pred)))
print('Logistic regression precision: {}'.format(precision_score(y_test, y_pred)))
print('Logistic regression recall: {}'.format(recall_score(y_test, y_pred)))
print('Logistic regression F1-score: {}'.format(f1_score(y_test, y_pred)))

# Determine the false positive and true positive rates
fpr, tpr, _ = roc_curve(y_test, logistic.predict_proba(X_test)[:, 1])
# Calculate the AUC
roc_auc = auc(fpr, tpr)

# Plot of a ROC curve for a specific class
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)'% roc_auc)
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()
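The AUC that `auc(fpr, tpr)` reports has a useful interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties counting half). A sketch computing it directly from hypothetical scores, without sklearn:

```python
# Hypothetical true labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.8, 0.4, 0.5, 0.1]

pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]

# Fraction of positive/negative pairs ranked correctly (ties count 0.5).
pairs = [(p, n) for p in pos for n in neg]
auc_hand = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in pairs) / len(pairs)
print(auc_hand)  # 0.888... (8 of 9 pairs ranked correctly)
```

For these inputs the result agrees with the trapezoidal area under the ROC curve that `roc_curve` and `auc` compute.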