What Is the Difference Between Statistics and Machine Learning?

Reposting a blog post by the eminent statistician Larry Wasserman.

Statistics Versus Machine Learning

By Larry Wasserman, posted on June 12, 2012 at 7:46 pm

 

Welcome to my blog, which will discuss topics in Statistics and Machine Learning. Some posts will be technical and others will be non-technical. Since this blog is about topics in both Statistics and Machine Learning, perhaps I should address the question: What is the difference between these two fields?

The short answer is: None. They are both concerned with the same question: how do we learn from data?

But a more nuanced view reveals that there are differences due to historical and sociological reasons. Statistics is an older field than Machine Learning (but young compared to Math, Physics, etc.). Thus, ideas about collecting and analyzing data in Statistics are rooted in the times before computers even existed. Of course, the field has adapted as times have changed, but history matters, and the result is that the way Statisticians think, teach, approach problems and choose research topics is often different from that of their colleagues in Machine Learning. I am fortunate to be at an institution (Carnegie Mellon) that is active in both fields (and I have appointments in both departments), so I get to see the similarities and differences.

If I had to summarize the main difference between the two fields I would say:

Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems.

Machine Learning emphasizes high dimensional prediction problems.
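
To make the contrast concrete, here is a minimal Python sketch that is not part of the original post. It fits the same one-variable linear model twice over and asks two different questions: an inference question (an approximate 95% confidence interval for the slope) and a prediction question (mean squared error on held-out data). The simulated data, the function names, and the normal-approximation interval are all assumptions chosen for illustration, and the example stays low dimensional for simplicity.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def slope_confidence_interval(xs, ys, z=1.96):
    """Inference: approximate 95% interval for the slope (normal approximation)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return b - z * std_err, b + z * std_err


def held_out_mse(train_x, train_y, test_x, test_y):
    """Prediction: mean squared error on data not used for fitting."""
    a, b = fit_line(train_x, train_y)
    errors = [(y - (a + b * x)) ** 2 for x, y in zip(test_x, test_y)]
    return sum(errors) / len(errors)


def simulate(n, seed=0):
    """Hypothetical data: y = 1 + 2x + standard normal noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


xs, ys = simulate(200)
low, high = slope_confidence_interval(xs[:150], ys[:150])
mse = held_out_mse(xs[:150], ys[:150], xs[150:], ys[150:])
print(f"95% interval for the slope: ({low:.3f}, {high:.3f})")
print(f"held-out mean squared error: {mse:.3f}")
```

The interval speaks to uncertainty about a parameter, while the held-out error speaks only to how well the fitted model predicts new observations; the same fit serves both purposes.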

But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example:

Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.

Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting.

But the differences become blurrier all the time. Check out two flagship journals:

The Annals of Statistics and The Journal of Machine Learning Research.

The overlap in topics is striking. And many topics get started in one field and then are developed further in the other. For example, Reproducing Kernel Hilbert Space (RKHS) methods are hot in Machine Learning but they began in Statistics (thanks to Manny Parzen and Grace Wahba). Similarly, much of online learning has its roots in the work of the statisticians David Blackwell and Jim Hannan. And of course there are topics that are highly active in both areas such as concentration of measure, sparsity and convex optimization.

There are also differences in terminology. Here are some examples:

Statistics        Machine Learning
--------------------------------------
Estimation        Learning
Classifier        Hypothesis
Data point        Example/Instance
Regression        Supervised Learning
Classification    Supervised Learning
Covariate         Feature
Response          Label

 

and of course:

Statisticians use R.

Machine Learners use Matlab.

 

Overall, the two fields are blending together more and more and I think this is a good thing.

 

Two other posts:

from: http://normaldeviate.wordpress.com/2012/06/12/statistics-versus-machine-learning-5-2/

Reposted from: https://www.cnblogs.com/nn0p/archive/2012/11/14/2770668.html
