E4523 Machine Learning

Contents

1. Regression [S]

1.1 Linear Regression (Classification into Binary Categories)

1.2 Evaluate - Confusion Matrices

2. Classification & Regression Trees (CART) [S]

2.1 Decision Trees

Stopping and Pruning Rules

Regression Tree

2.2 Random Forests

2.3 Bootstrapping

3. Neural Network [S]

4. Rock & Mine Example - Classification

4.1 Encoder

4.2 MLPClassifier

4.3 Logistic Regression

4.4 Random Forest

4.5 NN

5. Wine Quality Example - Regression

5.1 MLPRegressor

5.2 RandomForestRegressor

1. Regression [S]

1.1 Linear Regression (Classification into Binary Categories)

1. Read data

import pandas as pd
from pandas import DataFrame
url="https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data"
df = pd.read_csv(url,header=None)
df.describe()
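
A quick sanity check (a sketch, not in the original notes, using only the df loaded above): column 60 holds the class labels, so value_counts() shows how balanced the rock/mine classes are.

# Inspect the shape and label balance of the sonar data loaded above
print(df.shape)               # the sonar dataset has 208 rows and 61 columns
print(df[60].value_counts())  # counts of 'R' (rock) vs. 'M' (mine) labels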

2. Train & Fit & Predict

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import DataFrame
%matplotlib inline
# Convert the R/M labels to 0/1
df[60] = np.where(df[60] == 'R', 0, 1)
# Split into training and test sets
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.3)  # random 30% of df as test, 70% as train
x_train = train.iloc[0:, 0:60]
y_train = train[60]  # column 60, i.e. the R/M label column
x_test = test.iloc[0:, 0:60]
y_test = test[60]
# Build and fit the model
from sklearn import linear_model
model = linear_model.LinearRegression()
model.fit(x_train, y_train)
# Predict
testing_predictions = model.predict(x_test)
# Apply a classification threshold to the continuous predictions
def get_classification(predictions, threshold):
    classes = np.zeros_like(predictions)
    for i in range(len(classes)):
        if predictions[i] > threshold:
            classes[i] = 1
    return classes
get_classification(testing_predictions, 0.5)
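
A small optional sketch (assuming the y_test and testing_predictions defined above) to see how the choice of threshold changes the predicted classes and the resulting accuracy:

# Sweep a few thresholds and compare predicted-positive counts and accuracy
for t in [0.3, 0.5, 0.7]:
    classes = get_classification(testing_predictions, t)
    acc = np.mean(classes == y_test.values)
    print("threshold=%.1f  positives=%d  accuracy=%.3f" % (t, int(classes.sum()), acc))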

1.2 Evaluate - Confusion Matrices

from sklearn.metrics import confusion_matrix
# Arguments: true test labels, predicted test labels; returns the matrix [[tn, fp], [fn, tp]]
confusion_matrix(y_test,get_classification(testing_predictions,0.5)) 
# Unpack the individual counts
tn, fp, fn, tp = confusion_matrix(y_test,get_classification(testing_predictions,0.5)).ravel()
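
Optionally (a sketch, not part of the original notes), the raw matrix can be wrapped in a labeled DataFrame so the row/column meaning is explicit; sklearn orders rows as the true class and columns as the predicted class:

# Label the confusion matrix for readability (R was mapped to 0, M to 1 above)
cm = confusion_matrix(y_test, get_classification(testing_predictions, 0.5))
print(pd.DataFrame(cm,
                   index=['actual 0 (R)', 'actual 1 (M)'],
                   columns=['predicted 0 (R)', 'predicted 1 (M)']))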
  • True Positive Rate/ Sensitivity/ Recall

How well the model finds positives: the fraction of actual positives that are predicted positive. A value of 1 means every positive was found (a measure of finding the positives).

tpr = tp/(tp+fn)
  • Precision

The fraction of predicted positives that are actually positive. A value of 1 means that whenever we predict positive, it really is positive (a measure of how well the model discriminates positives).

precision = tp/(tp+fp)
  • F-Score

Combines precision and recall into a single score (the harmonic mean of the two).

f = 2 * precision * tpr / (precision + tpr)
  • True Negative Rate/ Specificity

How well the model identifies negatives.

tnr = tn/(tn+fp)
  • False Positive Rate/ Fall out

Sums to 1 with the true negative rate.

fpr = fp/(fp+tn)
  • Accuracy

The fraction of all samples classified correctly; a value of 1 means every prediction is correct.

accuracy = (tp+tn)/(tp+tn+fp+fn)
  • Misclassification Rate

Sums to 1 with accuracy.

misclassification_rate = (fp + fn)/(tp+fp+tn+fn)
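
As a cross-check of the hand-computed rates above (a sketch assuming y_test and testing_predictions are still in scope), sklearn provides the same metrics directly:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
y_pred = get_classification(testing_predictions, 0.5)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1       :", f1_score(y_test, y_pred))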
  • ROC Curve
from sklearn.metrics import roc_curve, auc
testing_predictions = model.predict(x_test)
(fpr, tpr, thresholds) = roc_curve(y_test, testing_predictions)
area = auc(fpr, tpr)
plt.clf() #Clear the current figure
plt.plot(fpr,tpr,label="Out-Sample ROC Curve with area = %1.2f"%area)

plt.plot([0, 1], [0, 1], 'k')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Out sample ROC rocks versus mines')
plt.legend(loc="lower right")
plt.show()
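
The roc_curve output can also be used to pick an operating threshold; one common choice (a sketch, not from the original notes) is Youden's J statistic, J = TPR - FPR:

# Index of the threshold that maximizes TPR - FPR
best = np.argmax(tpr - fpr)
print("best threshold: %.3f" % thresholds[best])
print("tpr: %.3f  fpr: %.3f" % (tpr[best], fpr[best]))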

  •  Precision vs. Recall
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score

precision, recall, thresholds = precision_recall_curve(y_test, testing_predictions)
average_precision = average_precision_score(y_test, testing_predictions)

step_kwargs = ({'step' : 'post'})

plt.step(recall, precision, color='b', alpha=0.2,
         where='post')
plt.fill_between(recall, precision, alpha=0.2, color='b', **step_kwargs)

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.title('Precision-Recall curve: AP={0:0.2f}'.format(average_precision))
plt.show()
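
The thresholds returned by precision_recall_curve can likewise be scanned for a good operating point; a minimal sketch (assumption: maximizing F1, which is not from the original notes):

# precision/recall have one more entry than thresholds; drop the last point to align
f1_scores = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = np.argmax(f1_scores)
print("best threshold: %.3f  F1: %.3f" % (thresholds[best], f1_scores[best]))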