Model Comparison

| Model | Type | Target | Assumptions | Linear / Non-linear | Loss Function | Evaluation Metric | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|---|
| Linear Regression | Supervised | Regression | 1. Features are independent; 2. the dependence of Y on X1, X2, ..., Xp is linear; 3. error terms are uncorrelated (correlated errors distort the standard-error and confidence-interval calculations, e.g. in time-series problems); 4. error terms are independent of X; 5. E[ε] = 0; 6. error terms have constant variance | Linear | RSS | R², MSE | 1. Simple approach to supervised learning; 2. no feature scaling needed | Strong assumptions |
| Logistic Regression | Supervised | Classification | Each sample is assigned to one and only one label | Linear | Negative log-likelihood | Confusion matrix, precision, recall, etc. | 1. Not sensitive to observations far from the decision boundary; 2. good for K = 2 (Bernoulli classification problems) | Unstable when the classes are well separated |
| Linear Discriminant Analysis | Supervised | Classification | 1. Observations in each class are normally (Gaussian) distributed; 2. all classes share the same covariance matrix Σ | Linear | - | Confusion matrix, precision, recall, etc. | 1. Stable for well-separated classes; 2. good for K > 2, and it provides low-dimensional views of the data; 3. good when n << p | Sensitive to observations far from the decision boundary |
| Naive Bayes | Supervised | Classification | Features are conditionally independent within each class | Linear | - | Confusion matrix, precision, recall, etc. | 1. Computationally cheap (the joint probability factorizes into per-feature conditionals); 2. fairly robust to isolated noise points, since estimates are averaged over many samples; 3. handles missing values by ignoring the missing feature rather than discarding the whole record; 4. fairly robust to irrelevant attributes; 5. useful when p is very large | 1. Strong assumption; 2. not robust to redundant (correlated) attributes, which break the conditional-independence assumption |
| KNN | Supervised | Classification | Similar things are in close proximity | Non-linear | - | Confusion matrix, precision, recall, etc. | 1. No training needed, only distance computation; 2. simple to implement; 3. few tuning parameters (just K and the distance metric); 4. flexible: classes do not need to be linearly separable | 1. Cannot tell which predictors are most important; 2. computationally expensive: the distance from each new observation to every training sample must be computed; 3. sensitive to imbalanced data: infrequent classes may be predicted poorly; 4. sensitive to irrelevant inputs, which make distances less meaningful for identifying similar neighbors |
| CART | Supervised | Both | None | - | RSS for regression; misclassification rate / Gini index for classification | - | 1. Easy to interpret; 2. can be displayed graphically; 3. handles qualitative predictors without creating dummy variables | 1. Poor prediction accuracy; 2. very non-robust: a small change in the data can cause a large change in the final estimated tree |
| Bagging | Supervised | Both | None | - | - | Out-of-bag error estimate | 1. Ensemble method built on bootstrap aggregation; 2. better prediction accuracy; 3. reduces variance and helps avoid overfitting | Harder to interpret: it is no longer clear which variables are most important to the procedure |
| Random Forest | Supervised | Both | None | - | - | - | 1. Improves on bagged trees with a small tweak that de-correlates the trees (a random subset of predictors is considered at each split); 2. this further reduces the variance when the trees are averaged | Not easy to interpret |
| Boosting | Supervised | Both | None | - | - | - | 1. Remarkably resistant to overfitting, and fast and simple; 2. improves the performance of many kinds of learners, not only decision trees | 1. Very hard to interpret; 2. susceptible to noisy data |
| SVM | Supervised | Classification | None | - | Hinge loss / squared hinge loss | Hamming loss | 1. Good for well-separated classes; 2. popular for high-dimensional problems with p >> n; 3. kernel SVMs handle non-linear boundaries; 4. more stable and not sensitive to outliers, since the fit depends only on the support vectors | Outputs are not probabilities |
| K-Means Clustering | Unsupervised | Clustering | None | - | Within-cluster variation | - | Easy to implement | 1. K must be chosen, and a good value is hard to find; 2. sensitive to outliers; 3. not very robust to changes in the data |
| Hierarchical Clustering | Unsupervised | Clustering | None | - | - | - | Does not need K to be chosen in advance | 1. Can yield worse (i.e. less accurate) results than K-means for a given number of clusters; 2. must decide at which height to cut the dendrogram; 3. sensitive to outliers; 4. not very robust to changes in the data |

The loss functions named above, and a few minimal fitting sketches for the models in the table, are written out below.
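
For reference, the loss functions named in the Loss Function column can be written out explicitly. The notation (n observations, K classes, p̂_mk for the class-k proportion in tree node m, C_k for cluster k) follows standard textbook conventions and is not something defined in the original table:

$$
\begin{aligned}
\text{RSS} &= \sum_{i=1}^{n}\bigl(y_i-\hat{y}_i\bigr)^2 &&\text{(linear regression, regression trees)}\\
-\ell(\beta) &= -\sum_{i=1}^{n}\bigl[\,y_i\log\hat{p}_i+(1-y_i)\log(1-\hat{p}_i)\,\bigr] &&\text{(logistic regression: negative log-likelihood)}\\
G_m &= \sum_{k=1}^{K}\hat{p}_{mk}\bigl(1-\hat{p}_{mk}\bigr) &&\text{(Gini index of tree node }m\text{)}\\
L\bigl(y,f(x)\bigr) &= \max\bigl(0,\,1-y\,f(x)\bigr) &&\text{(hinge loss, }y\in\{-1,+1\}\text{)}\\
W(C_k) &= \frac{1}{|C_k|}\sum_{i,i'\in C_k}\sum_{j=1}^{p}\bigl(x_{ij}-x_{i'j}\bigr)^2 &&\text{(within-cluster variation minimized by K-means)}
\end{aligned}
$$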
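
A minimal fitting sketch for the regression rows (linear regression and a regression tree), assuming scikit-learn; the synthetic dataset and model settings are illustrative only, not taken from the table. It reports the R² and MSE metrics listed above:

```python
# Minimal sketch (assumes scikit-learn): linear regression vs. a regression tree,
# scored with the metrics listed in the table (R^2 and MSE).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "CART (regression tree)": DecisionTreeRegressor(max_depth=4, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name}: R^2 = {r2_score(y_test, y_pred):.3f}, "
          f"MSE = {mean_squared_error(y_test, y_pred):.1f}")
```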
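
A similar sketch for the supervised classifiers, again assuming scikit-learn; the hyperparameters are illustrative, not prescribed by the table. Each model is evaluated with the confusion matrix, precision, and recall mentioned in the Evaluation Metric column:

```python
# Minimal sketch (assumes scikit-learn): fit each supervised classifier from the
# table on the same synthetic data and report confusion matrix, precision, recall.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "Naive Bayes": GaussianNB(),
    # KNN is distance-based, so scaling matters here
    # (unlike linear regression, which needs no scaling).
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "CART": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Boosting": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"=== {name} ===")
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, digits=3))
```

For the bagging and random-forest rows, passing `oob_score=True` to `BaggingClassifier` or `RandomForestClassifier` exposes the out-of-bag error estimate listed in the table.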
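
Finally, a sketch for the two unsupervised rows, assuming scikit-learn and SciPy; `inertia_` is scikit-learn's name for the total within-cluster variation that K-means minimizes, and the hierarchical example makes the "which height to cut" decision explicit by cutting the dendrogram into a fixed number of clusters:

```python
# Minimal sketch (assumes scikit-learn + SciPy): K-means vs. hierarchical
# clustering on the same synthetic blobs.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# K-means: K must be chosen up front; inertia_ is the within-cluster variation.
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K = {k}: within-cluster variation (inertia) = {km.inertia_:.1f}")

# Hierarchical clustering: no K is needed to build the dendrogram, but we still
# have to decide where to cut it (here: cut into 4 clusters).
Z = linkage(X, method="complete", metric="euclidean")
labels = fcluster(Z, t=4, criterion="maxclust")
print("hierarchical cluster sizes:", np.bincount(labels)[1:])
```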