Model | Type | Target | Assumption | Linear / Non-Linear | Loss Function | Evaluation Metric | Advantage | Disadvantage |
---|---|---|---|---|---|---|---|---|
Linear Regression | Supervised | Regression | 1. Independent features; 2. the dependence of Y on X1, X2, …, Xp is linear; 3. error terms are uncorrelated (correlated errors, e.g. in time-series problems, distort standard errors and confidence intervals); 4. error terms are independent of X; 5. E[ε] = 0; 6. error terms have constant variance | Linear | RSS (sketch below) | R^2; MSE | 1. Simple approach to supervised learning; 2. No need to scale features | Strong assumptions |
Logistic Regression | Supervised | Classification | Each sample is assigned to one and only one label | Linear | Negative log-likelihood (sketch below) | Confusion matrix, precision, recall, etc. | 1. Not sensitive to observations far from the decision boundary; 2. Good for k = 2, i.e. Bernoulli classification problems | Not stable for “well-separated” data |
Linear Discriminant Analysis | Supervised | Classification | 1. Normal (Gaussian) distribution within each class; 2. the same covariance matrix Σ in each class (sketch below) | Linear | | Confusion matrix, precision, recall, etc. | 1. Stable for “well-separated” data; 2. Good for k > 2: it provides low-dimensional views of the data; 3. Good for n << p | Sensitive to observations that are far from the decision boundary |
Naive Bayes | Supervised | Classification | Features are conditionally independent within each class | Linear | | Confusion matrix, precision, recall, etc. | 1. Computationally convenient: the joint likelihood factorizes into per-feature conditional probabilities (sketch below); 2. Rather robust to isolated noise samples, since we average over large samples; 3. Handles missing values by ignoring them (the record is kept; only the missing feature is skipped); 4. Rather robust to irrelevant attributes; 5. Useful when p is very large | 1. Strong assumption; 2. Not robust to redundant (correlated) attributes, because they break the conditional independence assumption |
KNN | Supervised | Classification | Similar things are in close proximity | Non-linear | | Confusion matrix, precision, recall, etc. | 1. No training needed; prediction just measures distances (sketch below); 2. Simple to implement; 3. Few tuning parameters: just K and the distance metric; 4. Flexible: classes do not need to be linearly separable | 1. KNN cannot tell us which predictors are most important; 2. Computationally expensive: the distance from a new observation to every sample must be computed; 3. Sensitive to imbalanced datasets: may get poor results for infrequent classes; 4. Sensitive to irrelevant inputs, which make distances less meaningful for identifying similar neighbors |
CART | Supervised | Both | doesn’t matter | Non-linear | RSS for regression; misclassification rate / Gini index for classification (sketch below) | | 1. Easy to interpret; 2. Can be displayed graphically; 3. Trees easily handle qualitative predictors without the need to create dummy variables | 1. Poor prediction accuracy; 2. Trees can be very non-robust: a small change in the data can cause a large change in the final estimated tree |
Bagging | Supervised | Both | doesn’t matter | Non-linear | | Out-of-bag error estimate | 1. Ensemble method that improves on single trees via bootstrap aggregation (sketch below); 2. Better prediction accuracy; 3. Reduces variance and helps avoid overfitting | It is no longer clear which variables are most important to the procedure, so it is not easy to interpret |
Random Forest | Supervised | Both | doesn’t matter | Non-linear | | | 1. Improves on bagged trees with a small tweak that de-correlates the trees: only a random subset of predictors is considered at each split (sketch below); 2. This further reduces the variance when we average the trees | Not easy to interpret |
Boosting | Supervised | Both | doesn’t matter | | | | 1. Remarkably resistant to overfitting, and fast and simple; 2. Improves the performance of many kinds of machine learning algorithms, not only decision trees (sketch below) | 1. Hard to interpret; 2. Susceptible to noisy data |
SVM | Supervised | Classification | doesn’t matter | Linear (non-linear with kernels) | Hinge loss / squared hinge loss (sketch below) | Hamming loss | 1. Good for “well-separated” data; 2. SVMs are popular in high-dimensional classification problems with p >> n; 3. For non-linear boundaries, kernel SVMs are popular; 4. More stable and not sensitive to outliers, since the solution depends only on the support vectors | Results are not probabilities |
K-Means Clustering | Unsupervised | Clustering | doesn’t matter | | Within-cluster variation (sketch below) | | Easy to implement | 1. K must be chosen, and a good value is hard to find; 2. Sensitive to outliers; 3. Not very robust to changes in the data |
Hierarchical Clustering | Unsupervised | Clustering | doesn’t matter | | | | 1. No need to choose K in advance; 2. The dendrogram shows the clusterings obtained at every level (sketch below) | 1. Sometimes yields worse (i.e. less accurate) results than K-means clustering for a given number of clusters; 2. Must decide at which height to cut the dendrogram; 3. Sensitive to outliers; 4. Not very robust to changes in the data |
Model Comparison
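The sketches that follow illustrate the losses and mechanisms named in the table; they are minimal NumPy illustrations on made-up data, not reference implementations. First, the linear-regression quantities: RSS as the fitting criterion, with MSE and R^2 as evaluation metrics.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])    # intercept + one feature
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)   # E[eps] = 0, constant variance

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares = minimize RSS
y_hat = X @ beta

rss = np.sum((y - y_hat) ** 2)                 # residual sum of squares (the loss)
mse = rss / len(y)                             # mean squared error
tss = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - rss / tss                           # R^2: fraction of variance explained
```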
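The negative log-likelihood listed for logistic regression, written out directly; `w`, `X`, and `y` are placeholders, with labels assumed to be 0/1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_log_likelihood(w, X, y):
    """Loss minimized by logistic regression; y holds 0/1 labels (one label per sample)."""
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```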
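Under the LDA assumptions (Gaussian classes, shared covariance Σ), each class k receives the linear score δ_k(x) = xᵀΣ⁻¹μ_k − ½ μ_kᵀΣ⁻¹μ_k + log π_k and the largest score wins. A rough NumPy version, with the class means, covariance, and priors assumed to be estimated elsewhere:

```python
import numpy as np

def lda_scores(X, means, cov, priors):
    """Linear discriminant scores under the shared-covariance Gaussian assumption."""
    cov_inv = np.linalg.inv(cov)
    scores = []
    for mu_k, pi_k in zip(means, priors):
        scores.append(X @ cov_inv @ mu_k - 0.5 * mu_k @ cov_inv @ mu_k + np.log(pi_k))
    return np.column_stack(scores)   # predict the class with the largest score per row
```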
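The Naive Bayes factorization: with conditionally independent features, the log posterior is log P(class) plus a sum of per-feature log likelihoods. The Gaussian model for each feature below is an illustrative assumption; the sketch also shows the table's point about missing values (skip the feature, keep the record).

```python
import numpy as np

def gaussian_nb_log_posterior(x, class_means, class_vars, priors):
    """log P(class) + sum_j log P(x_j | class), using conditional independence."""
    keep = ~np.isnan(x)                # missing features are ignored, the record is kept
    log_posts = []
    for mu, var, prior in zip(class_means, class_vars, priors):
        ll = -0.5 * np.sum(np.log(2.0 * np.pi * var[keep])
                           + (x[keep] - mu[keep]) ** 2 / var[keep])
        log_posts.append(np.log(prior) + ll)
    return np.array(log_posts)         # pick the class with the largest value
```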
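KNN in a few lines: no training step, just distances to every stored sample and a majority vote among the K closest, which also makes the per-prediction cost visible. Inputs are assumed to be NumPy arrays.

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to every sample: the costly step
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]
```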
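The two classification splitting criteria named for CART, written as node-impurity functions (RSS plays the same role for regression trees):

```python
import numpy as np

def gini_index(labels):
    """Gini impurity of a node: sum_k p_k * (1 - p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def misclassification_rate(labels):
    """1 minus the proportion of the node's most common class."""
    _, counts = np.unique(labels, return_counts=True)
    return 1.0 - counts.max() / counts.sum()
```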
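A schematic of bagging and the random-forest tweak: fit the same base learner on bootstrap resamples and average, optionally drawing a random subset of features to de-correlate the fits. `fit_predict` is a hypothetical placeholder for any base learner, and a real random forest re-draws the feature subset at every split rather than once per tree.

```python
import numpy as np

def bagged_predictions(X, y, X_test, fit_predict, B=100, m_features=None, seed=0):
    """Bootstrap aggregation; m_features adds random-forest-style feature subsampling."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    preds = []
    for _ in range(B):
        rows = rng.integers(0, n, size=n)   # bootstrap sample (with replacement);
                                            # rows never drawn are the "out-of-bag" set
        cols = (np.arange(p) if m_features is None
                else rng.choice(p, size=m_features, replace=False))
        preds.append(fit_predict(X[rows][:, cols], y[rows], X_test[:, cols]))
    return np.mean(preds, axis=0)           # averaging reduces variance
```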
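Boosting, in contrast, is sequential: each new weak learner is fit to what the current ensemble still gets wrong. A tiny regression-flavoured sketch; `fit_predict_weak` is a hypothetical placeholder and the learning rate is an arbitrary illustrative value.

```python
import numpy as np

def boost_regression(X, y, fit_predict_weak, M=200, learning_rate=0.1):
    """Each round fits a weak learner to the residuals and adds a shrunken copy."""
    pred = np.zeros_like(y, dtype=float)
    for _ in range(M):
        residual = y - pred                                     # what the ensemble still misses
        pred += learning_rate * fit_predict_weak(X, residual)   # placeholder weak learner
    return pred
```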
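The SVM losses from the table, with labels in {−1, +1}; points beyond the margin contribute nothing, which is why the fit depends only on the support vectors. The regularization constant `C` is a placeholder value.

```python
import numpy as np

def hinge_loss(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum(max(0, 1 - y * (Xw + b)))."""
    margins = y * (X @ w + b)
    return 0.5 * np.dot(w, w) + C * np.sum(np.maximum(0.0, 1.0 - margins))

def squared_hinge_loss(w, b, X, y, C=1.0):
    margins = y * (X @ w + b)
    return 0.5 * np.dot(w, w) + C * np.sum(np.maximum(0.0, 1.0 - margins) ** 2)
```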
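The within-cluster variation that K-means minimizes, measured here as squared distances to each cluster centroid:

```python
import numpy as np

def within_cluster_variation(X, labels):
    """Sum over clusters of squared distances to the cluster centroid."""
    total = 0.0
    for k in np.unique(labels):
        points = X[labels == k]
        total += np.sum((points - points.mean(axis=0)) ** 2)
    return total
```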
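Hierarchical clustering builds the whole dendrogram first and only afterwards picks a cut. A sketch assuming SciPy is available; the complete linkage and the cut into 3 clusters are arbitrary illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                      # toy data

Z = linkage(X, method="complete")                 # full dendrogram, no K needed up front
labels = fcluster(Z, t=3, criterion="maxclust")   # K (or the cut height) is chosen afterwards
```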