[Notes] Machine Learning in Three Figures: Basic Concepts, the Five Tribes, and Nine Common Algorithms

江城暮

Article link: https://blog.csdn.net/qq_39087432/article/details/115409301

Original article (with a full translation): https://zhuanlan.zhihu.com/p/26512893

Common applications of machine learning include speech recognition, image detection, machine translation, and style transfer.

Chapter 1: A look at Machine learning

1.What is it?

Machines can “learn” by analyzing large amounts of data.

2.How does machine learning relate to artificial intelligence?

Machine learning is a category of research and algorithms focused on finding patterns in data and using those patterns to make predictions. Machine learning falls within the artificial intelligence (AI) umbrella, which in turn intersects with the broader field of knowledge discovery and data mining.

intersect: to cross or overlap

3.How does machine learning work?

Select data

Split the data you have into three groups: training data, validation data, and test data.
Model data

Use the training data to build the model using the relevant features.
Validate model

Assess the model with your validation data.
Test model

Check performance of the validated model with your test data.
Use the model

Deploy the fully trained model to make predictions on new data.
Tune model

Improve performance of the algorithm with more data, different features, or adjusted parameters.
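
The first step above (splitting the data into three groups) can be sketched as follows. This is only a minimal illustration: the 60/20/20 proportions, the fixed shuffle seed, and the helper name `split_data` are assumptions made for the example, not part of the original infographic.

```python
import random

def split_data(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test groups."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after the first two groups
    return train, validation, test

rows = list(range(100))  # stand-in for 100 labeled examples
train, validation, test = split_data(rows)
print(len(train), len(validation), len(test))
```

The model is then built on `train`, assessed on `validation` while tuning, and checked once on `test`, so the final score comes from data the model never saw.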

validate: to confirm; to verify

deploy: to put into use

tune: (n.) a melody; (v.) to adjust

4.How does machine learning fit in?

Traditional programming

The software engineer writes a program that solves a problem.

Data => Software engineer writes a procedure that tells the machine what to do to solve the problem. => Computer follows the procedure and generates a result.
Statistics

An analyst compares the relationships of variables.
Machine learning

A data scientist uses a training data set to teach the computer what to do, and the system carries out the tasks.

Big data => The machine learns to classify with the help of a training data set and tunes a specific algorithm to the desired classification. => The computer learns to identify relationships, trends, and patterns in the data.
Intelligent apps

Intelligent apps leverage the outputs of AI, as in this precision farming example that uses drone-based data collection.

carry out: to perform; to execute

leverage: to use (something) to maximum advantage

drone: an unmanned aerial vehicle

5.Machine learning in practice

For example:

Rapid 3D mapping and modeling
Enhanced profiling to mitigate risks
Predicting the top performers

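profiling: the analysis of a person's psychological and behavioral characteristics, used to assess or predict their potential in some area or to identify a type of person

mitigate: to lessen; to make less severe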

Chapter 2: A look at Machine learning evolution

For decades, individual “tribes” of artificial intelligence researchers have vied with one another for dominance. Is the time ripe now for tribes to collaborate? They may be forced to, as collaboration and algorithm blending are the only ways to reach true artificial general intelligence (AGI). Here’s a look back at how machine learning methods have evolved and what the future may look like.

tribe: a tribe; here, a school of thought

vie: to compete

ripe: mature; suitable for

blend: to mix; to merge

What are the five tribes?

Symbolists
- Use symbols, rules, and logic to represent knowledge and draw logical inference
- Favored algorithm: Rules and decision trees, inverse deduction
Bayesians
- Assess the likelihood of occurrence for probabilistic inference
- Favored algorithm: Naive Bayes or Markov
Connectionists
- Recognize and generalize patterns dynamically with matrices of probabilistic, weighted neurons.
- Favored algorithm: Neural networks, backpropagation
Evolutionaries
- Generate variations and then assess the fitness of each for a given purpose
- Favored algorithm: Genetic programs
Analogizers
- Optimize a function in light of constraints (“going as high as you can while staying on the road”)
- Favored algorithm: Support vectors
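
The Evolutionaries' loop of "generate variations, then assess the fitness of each" can be sketched as below. Note that this is a simplified, mutation-only evolutionary search over plain numbers, not genetic programming proper; the fitness function, population size, and mutation scale are all hypothetical choices for illustration.

```python
import random

def fitness(x):
    """Hypothetical purpose: get x as close to 3 as possible."""
    return -(x - 3.0) ** 2

def evolve(generations=50, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(population_size)]
    for _ in range(generations):
        # Assess fitness and keep the better half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Generate variations: each parent yields one randomly mutated child.
        children = [p + rng.gauss(0, 0.5) for p in parents]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best)
```

Because the parents survive into the next generation unchanged, the best candidate found so far is never lost.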

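inference: reasoning

likelihood: how probable something is

occurrence: an event; how often something happens; the fact of existing

probabilistic: relating to or based on probability

matrices: plural of matrix

neuron: a nerve cell; here, a unit of a neural network

generalize: to draw a general conclusion from specific cases

variation: a change; a variant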

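(Supplement)

Pedro Domingos has summarized the problems each of the five tribes currently faces and the solutions it offers, but he also stresses that what we really need is a unified algorithm that solves all of these problems at once.

⭐️ Synthesis of the tribes

Representation

Probabilistic logic (e.g., Markov logic networks)
Weighted formulas, distribution over states

Evaluation

Posterior probability
User-defined objective function

Optimization

Formula discovery: genetic programming
Weight learning: backpropagation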

Chapter 3: A look at Machine learning methods

Which machine learning algorithm should you use? A lot depends on the characteristics and the amount of the available data, as well as your training goals, in each particular use case. Avoid using the most complicated algorithms unless the end justifies more expensive means and resources. Here are some of the more common algorithms ranked by ease of use.

1.Decision trees

Decision tree analysis typically uses a hierarchy of variables or decision nodes that, when answered step by step, can classify a given customer as creditworthy or not, for example.
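
A hierarchy of decision nodes answered step by step might look like the sketch below. The tree here is written by hand rather than learned from data, and every field name and threshold (`income`, `missed_payments`, and so on) is a hypothetical stand-in.

```python
def is_creditworthy(applicant):
    """Walk a small, hand-written hierarchy of decision nodes."""
    if applicant["income"] >= 50_000:
        if applicant["missed_payments"] == 0:
            return True
        # Higher income but some missed payments: fall back on job stability.
        return applicant["years_employed"] >= 5
    # Lower income: require a guarantor and a clean payment history.
    return applicant["has_guarantor"] and applicant["missed_payments"] == 0

applicant = {
    "income": 62_000,
    "missed_payments": 1,
    "years_employed": 7,
    "has_guarantor": False,
}
print(is_creditworthy(applicant))
```

A learned decision tree has the same shape; the difference is that the training algorithm picks the questions and thresholds from the data.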

Advantages

Decision trees are useful when evaluating lists of distinct features, qualities, or characteristics of people, places, or things.
Use cases

Rule-based credit risk assessment, horse race performance prediction

distinct: distinguishable; different; clear and unmistakable

2.Support vector machines

Support vector machines classify groups of data with the help of hyperplanes.
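
To make the role of the hyperplane concrete, here is a minimal sketch of how a point is classified once a hyperplane is known. The weights `w` and offset `b` are hypothetical values picked by hand; a real support vector machine would learn them so that the margin between the two groups is as wide as possible.

```python
import math

# Hypothetical hyperplane w . x + b = 0 in two dimensions (not learned from data).
w = (1.0, 1.0)
b = -3.0

def decision_value(x):
    """Positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    return 1 if decision_value(x) >= 0 else -1

def distance_to_hyperplane(x):
    """Geometric distance from x to the hyperplane."""
    return abs(decision_value(x)) / math.sqrt(sum(wi * wi for wi in w))

print(classify((3.0, 3.0)), classify((0.0, 1.0)))
```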

Advantages

Support vector machines are good for the binary classification of X versus other variables and are useful whether or not the relationship between variables is linear.
Use cases

News categorization, handwriting recognition

hyperplane: a flat subspace with one dimension fewer than the space that contains it

3.Regression

Regression maps the behavior of a dependent variable relative to one or more independent variables. In this example, logistic regression separates spam from non-spam text.
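
As a rough sketch of the logistic regression idea, the code below fits a one-feature spam model with plain gradient descent. The feature (a count of suspicious words per message), the six data points, the learning rate, and the number of steps are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: suspicious-word count per message; 1 = spam, 0 = not spam.
counts = [0, 1, 2, 6, 7, 8]
labels = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    # Gradient of the average log loss with respect to w and b.
    errors = [sigmoid(w * x + b) - y for x, y in zip(counts, labels)]
    grad_w = sum(e * x for e, x in zip(errors, counts)) / len(counts)
    grad_b = sum(errors) / len(counts)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def spam_probability(count):
    """Estimated probability that a message with this count is spam."""
    return sigmoid(w * count + b)

print(spam_probability(1), spam_probability(7))
```

The output is a continuous probability; a message is labeled spam only after comparing that probability with a cutoff such as 0.5.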

Advantages

Regression is useful for identifying continuous (not necessarily distinct) relationships between variables.
Use cases

Traffic flow analysis, email filtering

map: (v.) to chart; to plot

dependent variable: the variable whose value depends on the others (the outcome)

spam: junk email

4.Naive Bayes classification

Naive Bayes classifiers compute probabilities, given tree branches of possible conditions. Each individual feature is “naive” or conditionally independent of, and therefore does not influence, the others. For example, what’s the probability you would draw two yellow marbles in a row, given a jar of five yellow and red marbles total? The probability, following the topmost branch of two yellow in a row, is one in ten. Naive Bayes classifiers compute the combined, conditional probabilities of multiple attributes.
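
The marble arithmetic can be checked directly. The text does not say how many of the five marbles are yellow; the sketch below assumes two yellow and three red, which is the split that yields the stated one in ten. The helper `naive_bayes_score` is a hypothetical name showing how per-feature probabilities are multiplied under the independence assumption.

```python
from fractions import Fraction

# Assumption (not stated in the text): 2 of the 5 marbles are yellow, 3 are red.
yellow, total = 2, 5

first_yellow = Fraction(yellow, total)           # 2/5
second_yellow = Fraction(yellow - 1, total - 1)  # 1/4, given the first was yellow
two_in_a_row = first_yellow * second_yellow
print(two_in_a_row)  # 1/10

def naive_bayes_score(prior, likelihoods):
    """Unnormalized class score: the prior times each feature likelihood,
    treating the features as conditionally independent."""
    score = prior
    for p in likelihoods:
        score *= p
    return score
```

A classifier computes this score for every class and picks the class with the largest one.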

Advantages

Naive Bayes methods allow the quick classification of relevant items in small data sets that have distinct features.
Use cases

Sentiment analysis, consumer segmentation

classification: the assignment of items to categories

marble: a small glass ball used in games

in a row: consecutively

segmentation: division into segments

5.Hidden Markov models

Observable Markov processes have states that can be seen directly. In the simplest case they are purely deterministic: one given state always follows another given state. Traffic light patterns are an example.

Hidden Markov models, by contrast, compute the probability of hidden states occurring by analyzing observable data, and then estimating the likely pattern of future observation with the help of the hidden state analysis. In this example, the probability of high or low pressure (the hidden state) is used to predict the likelihood of sunny, rainy, or cloudy weather.
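
The pressure and weather example can be sketched with the forward algorithm, which tracks the probability of each hidden state as observations arrive. Every probability in the tables below is hypothetical and chosen only to make the example run; the function names are likewise illustrative.

```python
states = ("high", "low")  # hidden pressure states
weather = ("sunny", "cloudy", "rainy")  # observable outcomes

# All probabilities below are hypothetical, chosen only for illustration.
initial = {"high": 0.5, "low": 0.5}
transition = {
    "high": {"high": 0.7, "low": 0.3},
    "low": {"high": 0.4, "low": 0.6},
}
emission = {
    "high": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "low": {"sunny": 0.1, "cloudy": 0.3, "rainy": 0.6},
}

def step_states(belief):
    """Push a belief over hidden states one step through the transition model."""
    return {s: sum(belief[p] * transition[p][s] for p in states) for s in states}

def filter_states(observations):
    """Forward algorithm: P(hidden state now | observations so far)."""
    belief = dict(initial)
    for t, obs in enumerate(observations):
        if t > 0:
            belief = step_states(belief)
        # Weight each state by how well it explains the observation, then normalize.
        belief = {s: belief[s] * emission[s][obs] for s in states}
        norm = sum(belief.values())
        belief = {s: v / norm for s, v in belief.items()}
    return belief

def predict_next_weather(belief):
    """Likelihood of each kind of weather at the next step."""
    next_state = step_states(belief)
    return {o: sum(next_state[s] * emission[s][o] for s in states) for o in weather}

belief = filter_states(["rainy", "rainy", "cloudy"])
forecast = predict_next_weather(belief)
print(belief, forecast)
```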

Advantages

Hidden Markov models tolerate data variability and are effective for recognition and prediction.
Use cases

Facial expression analysis, weather prediction

observable: able to be observed directly

deterministic: fully determined, with no randomness

6.Random forest

Random forest algorithms improve the accuracy of decision trees by using multiple trees with randomly selected subsets of data. This example reviews the expression levels of various genes associated with breast cancer relapse and computes a relapse risk.
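
The idea of many trees, each trained on a randomly selected subset of the data, can be sketched as below. To keep it short, each "tree" is a one-level stump on a single feature and the trees vote by simple majority; real random forests grow deeper trees and usually randomize the features as well. The data points are hypothetical.

```python
import random

def train_stump(sample):
    """Pick the threshold on x that misclassifies the fewest sample points.
    The stump predicts 1 when x > threshold, else 0."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Each tree sees a random subset: a bootstrap sample drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        forest.append(train_stump(sample))
    return forest

def predict(forest, x):
    """Majority vote across all stumps."""
    votes = sum(x > threshold for threshold in forest)
    return 1 if votes * 2 > len(forest) else 0

# Hypothetical data: (gene expression level, relapse label).
data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.5, 0), (1.6, 1), (1.8, 1), (2.0, 1), (2.2, 1)]
forest = train_forest(data)
print(predict(forest, 0.05), predict(forest, 2.5))
```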

Advantages

Random forest methods prove useful with large data sets and items that have numerous and sometimes irrelevant features.
Use cases

Customer churn analysis, risk assessment

subset: a set contained within another set

relapse: to get worse again; a recurrence

numerous: many

churn: to stir; to agitate; here, customer attrition

7.Recurrent neural networks

Each neuron in any neural network converts many inputs into single outputs via one or more hidden layers. Recurrent neural networks [RNNs] additionally pass values from step to step, making step-by-step learning possible. In other words, RNNs have a form of memory, allowing previous outputs to affect subsequent inputs.
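
The "form of memory" can be seen in a one-unit recurrent network, sketched below. The weights are hypothetical, hand-picked constants (a real network would learn them), and `run_rnn` is an illustrative name.

```python
import math

# Hypothetical, hand-picked weights; a real network would learn them.
W_INPUT, W_HIDDEN, BIAS = 1.0, 0.8, 0.0

def run_rnn(inputs):
    """Return the hidden value after each step of a one-unit recurrent network."""
    hidden = 0.0
    outputs = []
    for x in inputs:
        # The previous hidden value feeds back in alongside the new input.
        hidden = math.tanh(W_INPUT * x + W_HIDDEN * hidden + BIAS)
        outputs.append(hidden)
    return outputs

with_history = run_rnn([1.0, 0.0])
without_history = run_rnn([0.0, 0.0])
print(with_history[1], without_history[1])
```

Both sequences end with the same input (0.0), yet the final outputs differ, because the first sequence still carries a trace of the earlier 1.0.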

Advantages

Recurrent neural networks have predictive power when used with large amounts of sequenced information.
Use cases

Image classification and captioning, political sentiment analysis

caption: to add a title or description (to an image)

8.Long short-term memory & gated recurrent unit neural networks

Older forms of RNNs can be lossy. While these older recurrent neural networks only allow small amounts of older information to persist, newer long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks have both long- and short-term memory. In other words, these newer RNNs have greater memory control, allowing previous values to persist or to be reset as necessary for many sequences of steps, avoiding “gradient decay” or eventual degradation of the values passed from step to step. LSTM and GRU networks make this memory control possible with memory blocks and structures called gates that pass or reset values as appropriate.
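
The effect of a gate can be sketched with the blending rule that GRU-style cells use to update their memory. This is a deliberately stripped-down illustration: the gate is held at a fixed, hypothetical value, whereas real gates are learned functions of the input and the previous state, and some formulations swap the roles of the gate and its complement.

```python
def gated_update(previous, candidate, update_gate):
    """Blend the old memory with a new candidate value.
    update_gate = 0 keeps the old value; update_gate = 1 overwrites it."""
    return (1.0 - update_gate) * previous + update_gate * candidate

memory = 1.0
for _ in range(100):
    # Gate held fully closed: the stored value passes through unchanged.
    memory = gated_update(memory, candidate=0.0, update_gate=0.0)

leaky = 1.0
for _ in range(100):
    # Ungated recurrence for comparison: the value shrinks a little every step.
    leaky = 0.9 * leaky

print(memory, leaky)
```

After 100 steps the gated value is still intact, while the ungated one has faded to almost nothing, which is the kind of degradation the gates are meant to avoid.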

Advantages

Long short-term memory and gated recurrent unit neural networks have the same advantages as other recurrent neural networks and are more frequently used than other recurrent neural networks because of their greater memory capabilities.
Use cases

Natural language processing, translation

lossy: losing information (as in lossy compression)

persist: to continue; to last

gradient: (math) gradient; slope

decay: to diminish over time

degradation: deterioration

9.Convolutional neural networks

Convolutions slide small sets of shared weights (filters) across the output of one layer, blending neighboring values into features that subsequent layers use to label the output.
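
A one-dimensional version of the operation is enough to show the mechanics. The sketch below slides a small kernel across a signal and takes a weighted sum at each position; the signal and the edge-detecting kernel are hypothetical, and in a trained network the kernel weights would be learned.

```python
def convolve_1d(signal, kernel):
    """Slide the kernel across the signal ("valid" positions only) and take a
    weighted sum at each position. Like most deep learning libraries, this does
    not flip the kernel, so strictly speaking it is cross-correlation."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

signal = [0, 0, 0, 1, 1, 1]
edge_kernel = [-1, 1]  # hypothetical filter that responds to a rising edge
print(convolve_1d(signal, edge_kernel))  # [0, 0, 1, 0, 0]
```

The same two weights are reused at every position, which is why convolutional layers need far fewer parameters than fully connected ones on image-sized inputs.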

Advantages

Convolutional neural networks are most useful with very large data sets, large numbers of features, and complex classification tasks.
Use cases

Image recognition, text to speech, drug discovery