This report introduces 6 classification algorithms, including:
Linear discriminant analysis (LDA),
Quadratic discriminant analysis (QDA),
Logistic regression (LR),
Support vector machines (SVM),
K-nearest neighbour (KNN).
Linear discriminant analysis (LDA)
Description of the method:
The LDA algorithm starts by finding the directions that maximize the separation between classes, then uses these directions to predict the class of each individual. These directions, called linear discriminants, are linear combinations of the predictor variables.
LDA assumes that predictors are normally distributed (Gaussian distribution) and that the different classes have class-specific means and equal variance/covariance.
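To make the idea of linear discriminants concrete, here is a minimal sketch on R's built-in iris data. This is not the data set analysed in this report, and the object names fit and proj are chosen only for this illustration. It assumes the MASS package is installed. The scaling component holds the coefficients of each linear discriminant, and predict() returns both the predicted class and the coordinates of each observation along those discriminants.

```r
library(MASS)  # provides lda()

# Fit LDA on the built-in iris data (3 classes, 4 numeric predictors).
# Illustration only: this is not the trainSet/testSet used in the report.
fit <- lda(Species ~ ., data = iris)

# Each column of 'scaling' is one linear discriminant, i.e. a set of
# coefficients that combines the predictors linearly.
print(fit$scaling)

# With K classes there are at most K - 1 discriminants (here 3 - 1 = 2).
n_discriminants <- ncol(fit$scaling)

# predict() returns the predicted class ($class) and the discriminant
# scores ($x), one column per linear discriminant.
proj <- predict(fit, iris)
head(proj$x)

# Share of training observations classified correctly (illustration only,
# since the model is evaluated on the data it was fitted to).
train_accuracy <- mean(proj$class == iris$Species)
print(train_accuracy)
```

Plotting the columns of proj$x against each other is a common way to see how well the discriminants separate the classes.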
Analysis and results:
Use the function “lda()” from the “MASS” package to build the model on trainSet, then make predictions on testSet. The prediction contains “class”, the predicted class of each observation, which is used to compute the confusion matrix.
We find:
- The model gives an accuracy of 0.71 on testSet, which is barely above the no-information rate of 0.705, so it does hardly better than always predicting the majority class;
- Sensitivity is 0.27 and specificity is 0.89, so sensitivity is low;
- Confusion matrix: of the 59 actual Group0 points, the model predicted 43 as Group1, so most of them were misclassified. This is another way of expressing the sensitivity (1 - 43/59 = 0.27). Of the 141 actual Group1 points, the model predicted 15 as Group0, so only a small fraction were misclassified. This is another way of expressing the specificity (1 - 15/141 = 0.89). Again, specificity is good but sensitivity is too low.
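The figures above can be reproduced by hand from the four cells of the confusion matrix. The following is a minimal base R sketch. It assumes Group0 is treated as the positive class, which is consistent with the sensitivity and specificity reported here. The object name cm is chosen only for this illustration.

```r
# Confusion matrix from the LDA output: rows = predicted, columns = actual.
# matrix() fills column by column, so the first column is actual Group0.
cm <- matrix(c(16, 43, 15, 126), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Accuracy: correctly classified points over all points, 142 / 200.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: actual Group0 points predicted as Group0, 16 / 59.
# Assumption: Group0 is the positive class.
sensitivity <- cm["0", "0"] / sum(cm[, "0"])

# Specificity: actual Group1 points predicted as Group1, 126 / 141.
specificity <- cm["1", "1"] / sum(cm[, "1"])

# No-information rate: share of the largest actual class, 141 / 200.
nir <- max(colSums(cm)) / sum(cm)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, nir = nir), 3))
```

Comparing accuracy with nir shows how little the model gains over always predicting Group1.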
> library(MASS)      # lda()
> library(caret)     # confusionMatrix()
> library(magrittr)  # %>% pipe
> model1 <- lda(Group ~ X1 + X2, data = trainSet)
> prediction1 <- model1 %>% predict(testSet)
> confusionMatrix(as.factor(prediction1$class), as.factor(testSet$Group))
Confusion Matrix and Statistics
          Reference
Prediction   0   1
         0  16  15
         1  43 126
Accuracy : 0.71
95% CI : (0.6418, 0.7718)
No Information Rate : 0.705