Does the curse of dimensionality affect some models more than others?

In general, the curse of dimensionality makes searching through a space much more difficult, and it affects the majority of algorithms that "learn" by partitioning their vector space. The higher the dimensionality of our optimization problem, the more data we need to fill the space we are optimizing over.
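
As a toy illustration of that last point (the numbers here are made up, not from any specific model): if we discretize the unit hypercube into 10 bins per axis, the number of cells we would need data for grows exponentially with the dimension.

```python
# Toy sketch: cells needed to cover [0, 1]^d at a fixed resolution of 10 bins per axis.
# The count, and hence the data needed to "fill" the space, explodes with d.
for d in (1, 2, 5, 10, 20):
    print(f"d={d:2d}: 10**{d} = {10**d:,} cells to fill")
```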

Generalized Linear Models

Linear models suffer immensely from the curse of dimensionality. A linear model fits the space with a single hyperplane. Even if we are not looking to directly compute
$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$
the problem posed is still very sensitive to collinearity, and can be considered "ill-conditioned" without some type of regularization. In very high-dimensional spaces there is more than one hyperplane that fits your data, and without the proper type of regularization the model can behave very poorly. What regularization does, specifically, is try to force a single, unique solution to exist. Both L1 and squared L2 regularization try to minimize the weights, and can be interpreted as selecting the model with the smallest weights as the most "correct" one. This can be thought of as a mathematical formulation of Occam's razor.
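
Here is a minimal sketch of that situation (it assumes NumPy and scikit-learn; the data and the penalty strengths are made up for illustration). With far more features than samples, $X^\top X$ is singular, infinitely many coefficient vectors interpolate the training data, and an L1 (Lasso) or squared-L2 (Ridge) penalty is what pins down one small-weight solution.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
n, p = 50, 500                                   # far more features than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                                   # only 5 features carry signal
y = X @ beta + 0.1 * rng.normal(size=n)

print("cond(X'X) =", np.linalg.cond(X.T @ X))    # enormous: the normal equations are ill-conditioned

ols = LinearRegression().fit(X, y)               # one of many exact interpolators
ridge = Ridge(alpha=10.0).fit(X, y)              # squared-L2 penalty -> a unique small-norm fit
lasso = Lasso(alpha=0.1).fit(X, y)               # L1 penalty -> a unique sparse fit

print("OLS   ||w||_2 =", np.linalg.norm(ols.coef_))
print("Ridge ||w||_2 =", np.linalg.norm(ridge.coef_))
print("Lasso nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))
```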

Decision Trees
Decision trees also suffer from the curse of dimensionality. Decision trees directly partition the sample space at each node. As the dimensionality of the sample space grows, the data become sparser and the distances between data points increase, which makes it much harder to find a "good" split.
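
A minimal sketch of this effect (hypothetical setup, assumes scikit-learn): keep the same five informative features but pad the data with more and more irrelevant noise dimensions, and a single tree typically finds worse splits and generalizes worse.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

for n_noise in (0, 50, 500):
    # 5 informative features plus n_noise pure-noise features
    X, y = make_classification(n_samples=300, n_features=5 + n_noise,
                               n_informative=5, n_redundant=0, random_state=0)
    score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
    print(f"{n_noise:4d} noise features -> CV accuracy {score:.2f}")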

Random Forests
Random Forests use a collection of decision trees to make their predictions. But instead of using all the features of your problem, each individual tree only uses a random subset of the features. This shrinks the space that each tree optimizes over and can help combat the curse of dimensionality.
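
A minimal sketch of that feature subsetting (assumes scikit-learn; the data setup mirrors the noisy example above): with max_features="sqrt", each split in each tree only considers roughly the square root of the total number of features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=505, n_informative=5,
                           n_redundant=0, random_state=0)
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",  # per-split feature subsampling
                                random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```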

Boosted Trees
Boosting algorithms such as AdaBoost suffer from the curse of dimensionality and tend to overfit if regularization is not utilized. I won't go in depth, because the post Is AdaBoost less or more prone to overfitting? explains the reason better than I could.
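
For completeness, a minimal sketch (assumes scikit-learn; the parameter values are arbitrary) of the usual regularization knobs for boosting: shrinking the learning rate and limiting the number of weak learners rather than letting the ensemble chase training noise.

```python
from sklearn.ensemble import AdaBoostClassifier

# The default weak learner is a depth-1 decision stump; shrinkage via learning_rate
# and a capped n_estimators act as the regularization discussed above.
boosted = AdaBoostClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
```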

Neural Networks
Neural networks are odd in the sense that they both are and are not impacted by the curse of dimensionality, depending on the architecture, activations, depth, etc. To reiterate, the curse of dimensionality is the problem that a huge number of points is needed to cover a high-dimensional input space. One way to interpret deep neural networks is to think of all layers except the very last one as performing a complicated projection of a high-dimensional manifold onto a lower-dimensional manifold, on top of which the last layer classifies. For example, in a convolutional network for classification whose last layer is a softmax layer, we can interpret the architecture as doing a non-linear projection onto a smaller dimension and then performing multinomial logistic regression (the softmax layer) on that projection. So in a sense the compressed representation of our data allows us to circumvent the curse of dimensionality. Again, this is one interpretation; in reality the curse of dimensionality does impact neural networks, but not to the same degree as the models outlined above.
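
A minimal sketch of that interpretation (assumes PyTorch; the layer sizes are made up): everything before the final layer is a learned non-linear projection to a small feature vector, and the final linear + softmax layer is just multinomial logistic regression on that projection.

```python
import torch.nn as nn

# Non-linear projection: e.g. a 3x32x32 image (3072 dims) ends up as a 32-dim vector.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
classifier = nn.Linear(32, 10)   # softmax regression on the 32-dim projection
model = nn.Sequential(feature_extractor, classifier)
# Training with nn.CrossEntropyLoss applies the softmax implicitly.
```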

SVM
SVMs tend not to overfit as much as generalized linear models because of the heavy regularization that is built into them. Check out the post SVM, Overfitting, curse of dimensionality for more detail.
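
A minimal sketch of the SVM regularization knob (assumes scikit-learn; the C values are arbitrary): C trades off margin width against training error, and a small C gives a wide, soft margin that behaves like strong regularization in high dimensions.

```python
from sklearn.svm import SVC

soft_margin = SVC(kernel="linear", C=0.1)   # wide margin, strong regularization
hard_margin = SVC(kernel="linear", C=1e6)   # nearly hard margin, little regularization
```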

K-NN, K-Means

Both K-means and K-NN are greatly impacted by the curse of dimensionality, since both rely on the squared L2 (Euclidean) distance. As the number of dimensions increases, the distances between data points grow and start to look alike, so you need a far greater number of points to cover the space before those distances become descriptive again.
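
A minimal sketch of that distance concentration (NumPy only; the sample sizes are arbitrary): as the dimension grows, the ratio between the smallest and largest distance from one point to the rest approaches 1, so "nearest" neighbour carries less and less information.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[0] - X[1:], axis=1)   # distances from one point to all others
    print(f"d={d:5d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
```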

Feel free to ask specifics about the models, since my answers are pretty general. Hope this helps.
