10 Algorithms Machine Learning Engineers Need to Know

By James Le, New Story Charity.
There is no doubt that the sub-field of machine learning / artificial intelligence has gained increasing popularity over the past couple of years. As Big Data is the hottest trend in the tech industry at the moment, machine learning is incredibly powerful for making predictions or calculated suggestions based on large amounts of data. Some of the most common examples of machine learning are Netflix’s algorithms, which suggest movies based on those you have watched in the past, and Amazon’s algorithms, which recommend books based on those you have bought before.

So if you want to learn more about machine learning, how do you start? For me, my first introduction was an Artificial Intelligence class I took while studying abroad in Copenhagen. My lecturer is a full-time Applied Math and CS professor at the Technical University of Denmark whose research areas are logic and artificial intelligence, focusing primarily on the use of logic to model human-like planning, reasoning and problem solving. The class was a mix of discussion of theory/core concepts and hands-on problem solving. The textbook we used is one of the AI classics, Stuart Russell and Peter Norvig’s Artificial Intelligence: A Modern Approach, from which we covered major topics including intelligent agents, problem-solving by searching, adversarial search, probability theory, multi-agent systems, social AI, and the philosophy/ethics/future of AI. At the end of the class, working in a team of 3, we implemented simple search-based agents that solve transportation tasks in a virtual environment as a programming project.

I learned a tremendous amount from that class and decided to keep learning about this specialized topic. In the last few weeks, I have attended multiple tech talks in San Francisco on deep learning, neural networks, and data architecture, as well as a Machine Learning conference with many well-known professionals in the field. Most importantly, I enrolled in Udacity’s Intro to Machine Learning online course at the beginning of June and finished it just a few days ago. In this post, I want to share some of the most common machine learning algorithms that I learned from the course.

Machine learning algorithms can be divided into 3 broad categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is useful when a property (label) is available for a certain dataset (the training set) but is missing and needs to be predicted for other instances. Unsupervised learning is useful when the challenge is to discover implicit relationships in a given unlabeled dataset (items are not pre-assigned to labels). Reinforcement learning falls between these 2 extremes: there is some form of feedback available for each predictive step or action, but no precise label or error message. Since this was an intro class, I didn’t learn about reinforcement learning, but I hope that these 10 algorithms for supervised and unsupervised learning will be enough to keep you interested.

Supervised Learning

1. Decision Trees: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility. Take a look at the image to get a sense of what it looks like.

[Image: an example decision tree]

From a business decision point of view, a decision tree is the minimum number of yes/no questions one has to ask to assess the probability of making a correct decision, most of the time. As a method, it allows you to approach the problem in a structured and systematic way to arrive at a logical conclusion.
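
The yes/no-question view can be made concrete with a small sketch. This is a minimal illustration, not the course's implementation: it grows a classification tree by picking, at each node, the equality question (feature == value) that minimizes Gini impurity. The Gini criterion, the `max_depth` limit, the helper names (`gini`, `best_split`, `build_tree`, `predict`) and the toy weather data are all hypothetical choices made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (score, feature, value) for the yes/no question with the lowest weighted impurity."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for value in set(r[f] for r in rows):
            yes = [l for r, l in zip(rows, labels) if r[f] == value]
            no = [l for r, l in zip(rows, labels) if r[f] != value]
            if not yes or not no:
                continue  # this question does not separate anything
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if best is None or score < best[0]:
                best = (score, f, value)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively ask questions until a node is pure or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0 or depth == max_depth:
        return majority  # leaf: predict the majority class
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, value = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[f] == value]
    no = [(r, l) for r, l in zip(rows, labels) if r[f] != value]
    return {
        "question": (f, value),
        "yes": build_tree([r for r, _ in yes], [l for _, l in yes], depth + 1, max_depth),
        "no": build_tree([r for r, _ in no], [l for _, l in no], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the yes/no answers down to a leaf."""
    while isinstance(tree, dict):
        f, value = tree["question"]
        tree = tree["yes"] if row[f] == value else tree["no"]
    return tree

# Hypothetical toy data: (outlook, windy) -> play outside or stay in?
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes"), ("overcast", "no")]
labels = ["play", "play", "play", "stay", "play"]
tree = build_tree(rows, labels)
print(predict(tree, ("rain", "yes")))  # stay
```

Library implementations typically also handle numeric thresholds and pruning; this sketch only asks categorical equality questions.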

2. Naive Bayes Classification: Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features. The featured image is the equation, where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Some real-world examples are:
 Marking an email as spam or not spam
 Classifying a news article as technology, politics, or sports
 Checking whether a piece of text expresses positive or negative emotions
 Face recognition software
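
As an illustration of the spam example, here is a minimal multinomial Naive Bayes sketch. A is the class (spam or ham) and B is the observed words; the naive assumption is that each word contributes independently given the class, so, up to the constant log P(B), log P(A|B) is log P(A) plus a sum of per-word log-likelihoods. Whitespace tokenization, add-one (Laplace) smoothing, the function names `train_nb`/`predict_nb` and the four toy documents are assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}  # log P(A)
    return log_priors, word_counts, vocab, alpha

def predict_nb(model, doc):
    """Pick the class maximizing log P(A) + sum of log P(word | A); P(B) is shared, so it is dropped."""
    log_priors, word_counts, vocab, alpha = model
    scores = {}
    for c, log_prior in log_priors.items():
        total = sum(word_counts[c].values())
        score = log_prior
        for w in doc.lower().split():
            if w not in vocab:
                continue  # ignore words never seen during training
            # naive independence: each word adds its own log-likelihood term
            score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set
docs = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "cheap money"))    # spam
print(predict_nb(model, "meeting notes"))  # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.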

3. Ordinary Least Squares Regression: If you know statistics, you have probably heard of linear regression before. Least squares is a method for performing linear regression. You can think of linear regression as the task of fitting a straight line through a set of points. There are multiple possible strategies to do this, and the “ordinary least squares” strategy goes like this: draw a line, then for each data point measure the vertical distance between the point and the line, square it, and add these squares up; the fitted line is the one where this sum of squared distances is as small as possible.

Linear refers to the kind of model you are using to fit the data, while least squares refers to the kind of error metric you are minimizing.
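
To make the sum of squared vertical distances concrete, the sketch below fits a line two ways with NumPy: `np.linalg.lstsq`, which solves the least-squares problem directly, and the textbook closed-form slope and intercept. The data points and function names are made up for illustration.

```python
import numpy as np

def ols_fit(x, y):
    """Fit y ~ a + b*x by minimizing the sum of squared vertical distances."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column
    # lstsq solves min ||X @ beta - y||^2, i.e. the ordinary least squares problem
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, slope]

def ols_closed_form(x, y):
    """Same line via the textbook formulas: slope = cov(x, y) / var(x)."""
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    intercept = y.mean() - slope * x.mean()
    return np.array([intercept, slope])

# Hypothetical points lying roughly on a line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
print(ols_fit(x, y))
print(ols_closed_form(x, y))
```

For these points both functions give an intercept of about 1.10 and a slope of about 1.96.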

4. Logistic Regression: Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative distribution function of the logistic distribution.

In general, regressions can be used in real-world applications such as:
 Credit scoring
 Measuring the success rates of marketing campaigns
 Predicting the revenues of a certain product
 Predicting whether there is going to be an earthquake on a particular day
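
A minimal sketch of logistic regression, assuming a single explanatory variable and plain batch gradient ascent on the log-likelihood (real libraries usually use more sophisticated optimizers and regularization). The logistic (sigmoid) function maps the linear score to a probability between 0 and 1. The `fit_logistic`/`predict_proba` names, the learning rate, the epoch count and the hours-studied data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: the CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Estimate weights by gradient ascent on the mean Bernoulli log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)                   # predicted P(y = 1 | x)
        w += lr * X.T @ (y - p) / len(y)     # gradient of the mean log-likelihood
    return w

def predict_proba(w, X):
    """Probability of the positive outcome for each row of X."""
    X = np.column_stack([np.ones(len(X)), X])
    return sigmoid(X @ w)

# Hypothetical data: hours studied -> passed the exam (1) or not (0)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])
w = fit_logistic(hours, passed)
print(predict_proba(w, np.array([[1.0], [3.5]])))
```

With this toy data the estimated probability of passing is below 0.5 at 1 hour and above 0.5 at 3.5 hours. The classes overlap (2.0 passes, 2.5 fails), so the maximum-likelihood weights stay finite.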

5. Support Vector Machines: SVM is a binary classification algorithm. Given a set of points of 2 types in N-dimensional space, SVM generates an (N − 1)-dimensional hyperplane to separate those points into 2 groups. Say you have some points of 2 types on a sheet of paper which are linearly separable. SVM will find a straight line which separates those points into 2 types and is situated as far as possible from the closest points of each type.

In terms of scale, some of the biggest problems that have been solved using SVMs (with suitably modified implementations) are display advertising, human splice site recognition, image-based gender detection, and large-scale image classification.
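
Staying as far as possible from the closest points means maximizing the margin. The sketch below trains a soft-margin linear SVM by full-batch sub-gradient descent on the regularized hinge loss; it is one simple way to fit a linear SVM, not how large-scale solvers are implemented. The regularization strength `lam`, the learning rate, the epoch count, the function names and the symmetric 2-D toy points are all assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Soft-margin linear SVM fitted by full-batch sub-gradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) for labels y in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or on the wrong side
        # Sub-gradient of the objective: only active points contribute hinge-loss terms
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, X):
    """Which side of the hyperplane w . x + b = 0 each point falls on."""
    return np.where(X @ w + b >= 0, 1, -1)

# Two linearly separable groups of 2-D points ("on a sheet of paper")
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = train_linear_svm(X, y)
print(w, b)
print(predict_svm(w, b, X))
```

In 2-D the learned hyperplane w · x + b = 0 is a straight line, matching the sheet-of-paper picture; the small `lam` makes the solution close to the hard-margin one.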

6. Ensemble Methods: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a weighted vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, bagging, and boosting.

So how do ensemble methods work, and why do they often outperform individual models?
 They average out biases: if you average a bunch of Democratic-leaning polls and Republican-leaning polls together, you will get an average that isn’t leaning either way.
 They reduce the variance: the aggregate opinion of a bunch of models is less noisy than the single opinion of one of the models. In finance, this is called diversification: a mixed portfolio of many stocks will be much less variable than just one of the stocks alone. This is why your models will be better with more data points rather than fewer.
 They are unlikely to over-fit: if you have individual models that didn’t over-fit, and you combine the predictions from each model in a simple way (average, weighted average, logistic regression), then there’s little room for over-fitting.
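
The variance-reduction argument can be checked with a small simulation. The sketch combines predictions by a weighted vote, as in the definition of ensemble methods, and compares one weak classifier with an equal-weight vote of eleven of them. The key assumption, which real models rarely satisfy fully, is that the classifiers make independent errors; the 30% error rate, the function names and the number of trials are hypothetical.

```python
import random
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine class predictions from several models by a weighted vote."""
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(tally, key=tally.get)

def noisy_model(truth, error_rate, rng):
    """A hypothetical weak binary classifier that is right with probability 1 - error_rate."""
    return truth if rng.random() > error_rate else 1 - truth

def accuracy(n_models, error_rate=0.3, trials=10_000, seed=0):
    """Fraction of trials in which an equal-weight vote of independent models is correct."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        truth = rng.randint(0, 1)
        preds = [noisy_model(truth, error_rate, rng) for _ in range(n_models)]
        correct += weighted_vote(preds, [1.0] * n_models) == truth
    return correct / trials

print(weighted_vote(["spam", "ham", "spam"], [0.2, 0.5, 0.4]))  # spam (0.6 vs 0.5)
print(accuracy(1), accuracy(11))
```

Under these assumptions the vote of eleven classifiers is correct far more often than a single one (about 92% versus 70% in expectation, from the binomial distribution); correlated models gain much less.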

Unsupervised Learning

7. Clustering Algorithms: Clustering is the task of grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups.

Every clustering algorithm is different, and here are a few families of them:
 Centroid-based algorithms
 Connectivity-based algorithms
 Density-based algorithms
 Probabilistic algorithms
 Dimensionality reduction
 Neural networks / Deep Learning
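
As one concrete centroid-based algorithm, here is a minimal k-means sketch (Lloyd's algorithm) in NumPy: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The random initialization, the iteration cap, the function name and the two-blob toy data are assumptions; practical implementations usually add smarter initialization such as k-means++ and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means clustering."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated blobs
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The cluster ids themselves are arbitrary: the three points near (1, 1) share one label and the three near (8, 8) share the other.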

8. Principal Component Analysis: PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Some applications of PCA include compression, simplifying data for easier learning, and visualization. Note that domain knowledge is very important when choosing whether or not to go forward with PCA. It is not suitable in cases where the data is noisy (all the components of PCA have quite a high variance).
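
A minimal PCA sketch in NumPy, assuming the textbook route of eigendecomposing the sample covariance matrix: the eigenvectors are the orthogonal directions (principal components), and projecting the centered data onto them yields linearly uncorrelated variables. The function name and the synthetic correlated data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                   # center each variable
    cov = np.cov(Xc, rowvar=False)            # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]            # orthonormal directions of largest variance
    scores = Xc @ components                  # the data expressed in the new axes
    explained = eigvals[order] / eigvals.sum()
    return scores, components, explained

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Two strongly correlated variables: most of the variance lies along one direction
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])
scores, components, explained = pca(X, n_components=2)
print(explained)                                    # first component carries almost all the variance
print(np.corrcoef(scores, rowvar=False).round(6))   # off-diagonal entries are ~0
```

Keeping only the first column of `scores` would compress these two variables into one while losing very little variance.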

9. Singular Value Decomposition: In linear algebra, SVD is a factorization of a real or complex matrix. For a given m × n matrix M, there exists a decomposition M = UΣV*, where U and V are unitary matrices, V* is the conjugate transpose of V, and Σ is a rectangular diagonal matrix with non-negative real numbers (the singular values) on its diagonal.

PCA is actually a simple application of SVD. In computer vision, the first face recognition algorithms used PCA and SVD to represent faces as a linear combination of “eigenfaces”, perform dimensionality reduction, and then match faces to identities via simple methods; although modern methods are much more sophisticated, many still depend on similar techniques.
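
The sketch below, assuming NumPy, checks the factorization M = UΣV* on a random real matrix (`np.linalg.svd` returns V* directly as `Vh`), shows the PCA connection (squared singular values of the centered data divided by n − 1 equal the covariance eigenvalues), and builds a rank-1 approximation, the kind of truncation used for compression and eigenface-style dimensionality reduction. The random matrix is arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 3))  # an arbitrary real 5 x 3 matrix

# full_matrices=False gives the "thin" SVD: U is 5x3, s holds 3 singular values, Vh is 3x3
U, s, Vh = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)

# Vh is already V* (the conjugate transpose of V), so M = U Sigma V*
assert np.allclose(M, U @ Sigma @ Vh)
assert np.allclose(U.T @ U, np.eye(3)) and np.allclose(Vh @ Vh.T, np.eye(3))

# PCA as an application of SVD: the right singular vectors of the centered data are the
# principal directions, and s**2 / (n - 1) are the variances along them
Mc = M - M.mean(axis=0)
_, s_c, Vh_c = np.linalg.svd(Mc, full_matrices=False)
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Mc, rowvar=False)))[::-1]
print(np.allclose(s_c ** 2 / (len(M) - 1), cov_eigvals))  # True

# Rank-1 approximation: keep only the largest singular value
M1 = s[0] * np.outer(U[:, 0], Vh[0])
# The squared approximation error equals the sum of the discarded squared singular values
print(np.linalg.norm(M - M1) ** 2, np.sum(s[1:] ** 2))
```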

10. Independent Component Analysis: ICA is a statistical technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are called the independent components of the observed data.

ICA is related to PCA, but it is a much more powerful technique that is capable of finding the underlying factors or sources when these classic methods fail completely. Its applications include digital images, document databases, economic indicators and psychometric measurements.
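
To illustrate the generative model (observed data = unknown linear mixing of independent non-Gaussian sources), here is a minimal FastICA-style sketch in NumPy. FastICA is one common ICA algorithm, not the only one; this version whitens the data, then applies the fixed-point update with a tanh nonlinearity and symmetric decorrelation. The source signals, the mixing matrix and the function name `fast_ica` are hypothetical, and as with any ICA method the recovered components come back in arbitrary order, sign and scale.

```python
import numpy as np

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Minimal FastICA sketch: whitening, tanh contrast, symmetric decorrelation.

    X has shape (n_samples, n_signals); returns the estimated sources with the same shape.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whitening: transform the mixtures so they are uncorrelated with unit variance
    eigvals, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ E @ np.diag(1.0 / np.sqrt(eigvals)) @ E.T
    n = Z.shape[1]
    W = rng.normal(size=(n, n))  # random initial unmixing matrix
    for _ in range(n_iter):
        Y = np.tanh(Z @ W.T)  # g(w_i^T z) for every sample and component
        # Fixed-point update: w_i <- E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i, with g' = 1 - tanh^2
        W_new = (Y.T @ Z) / len(Z) - np.diag((1.0 - Y ** 2).mean(axis=0)) @ W
        # Symmetric decorrelation keeps the rows orthonormal: W <- (W W^T)^(-1/2) W
        d, V = np.linalg.eigh(W_new @ W_new.T)
        W_new = V @ np.diag(1.0 / np.sqrt(d)) @ V.T @ W_new
        # Converged when every row points in the same direction as before (up to sign)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T

# Hypothetical non-Gaussian sources mixed by a matrix the algorithm never sees
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(2 * t), np.sign(np.sin(3 * t))])  # sine wave and square wave
A = np.array([[1.0, 0.5], [0.7, 1.2]])                          # the "unknown" mixing system
X = S @ A.T                                                     # observed mixtures
S_hat = fast_ica(X)
# Absolute correlation between each true source (rows) and each recovered component (columns)
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
print(corr.round(3))
```

Each row of `corr` should have one entry close to 1, meaning each hidden source was recovered as one component even though only the mixtures were observed.
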
Now go forth and wield your understanding of algorithms to create machine learning applications that make better experiences for people everywhere.

Bio: James Le is a Product Intern at New Story Charity and a Computer Science and Communication student at Denison University.
