Machine Learning Terminologies Demystified

To this day, my favorite definition of a machine is: something that makes work easier. At its simplest, a machine is an invention that does a job better, faster, and more powerfully than a human being. For machine learning, this is the why: there is a need to perform a task more efficiently and at a faster rate. What is the task? To make decisions. So what, then, is machine learning?

Before I answer that, a quick introduction. On my journey to becoming a data scientist, I found myself having to learn a lot of new terminology. Even certain terms that already existed in my vocabulary took on a new meaning. Many of these terms can be wordy and somewhat intimidating. My aim in this write-up is to provide, as far as possible, layman's definitions for the basic machine learning terms I have come across.

Data science, in essence, is the skill of using available information to gain insight and improve processes. It does this using a blend of machine learning algorithms, statistics, business intelligence, and programming. It aims to discover patterns in raw data, which in turn provide insight into the processes that produced the data.

Now back to the question: what is machine learning?

Machine learning is a field of technology that allows machines to learn from data and improve themselves. Machine learning algorithms use statistics and other mathematical tools to find patterns in data.

Machine learning can be separated into three groups:

Supervised learning is a type of machine learning in which the data is labeled to tell the machine exactly what patterns it should look for. Under the umbrella of supervised learning:

  • Classification: In classification tasks, the machine learning program must draw a conclusion from observed values and determine which category new observations belong to.

  • Regression: In regression tasks, the machine learning program must estimate and understand the relationships among variables. Regression analysis focuses on one dependent variable and a series of other changing variables.

  • Forecasting: Forecasting is the process of making predictions about the future based on past and present data.
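
To make the idea of labeled data concrete, here is a minimal Python sketch of a classification task. It uses a simple nearest-neighbor vote (the K-nearest neighbor idea described later in this post) as one possible approach; the measurements, the labels, and the function names are all made up for illustration.

```python
from collections import Counter

# Hypothetical labeled training data: (height_cm, weight_kg) -> label.
training_data = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((160.0, 58.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
    ((190.0, 95.0), "large"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(point, labeled_points, k=3):
    """Predict a label by majority vote among the k closest labeled points."""
    nearest = sorted(
        labeled_points, key=lambda item: squared_distance(point, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# A new, unlabeled observation gets the label of its closest neighbors.
prediction = classify((158.0, 57.0), training_data)
print(prediction)
```

The labels in `training_data` are what make this supervised: the program is told the right answer for every training example and only has to generalize to new ones.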

Unsupervised learning: here the data has no labels. The machine just looks for whatever patterns it can find. Under the umbrella of unsupervised learning:

  • Clustering: Clustering involves grouping sets of similar data (based on defined criteria), after which you can analyze the groups and find patterns.

  • Dimension reduction: Dimension reduction reduces the number of variables being considered, so that only the information actually required remains.
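
Clustering is easier to picture with a small example. The sketch below groups one-dimensional numbers with the K-means idea (covered later in this post): assign each value to its nearest center, then move each center to the mean of its group. The data and the starting centers are assumptions chosen for illustration.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Unlabeled data: nobody tells the program there are two groups of values.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```

Here K is simply the number of starting centers (two in this case). No labels are involved at any point, which is what makes this unsupervised.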

Reinforcement learning learns by trial and error to achieve a clear objective. The machine tries out lots of different things and is rewarded or penalized depending on whether its behavior helps or hinders it in reaching its objective.
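
The trial-and-error loop can be sketched in a few lines of Python. This is a toy setup of my own choosing (a so-called multi-armed bandit with an epsilon-greedy strategy), not the only way to do reinforcement learning: the agent picks one of two actions, receives a fixed hypothetical reward, and keeps a running estimate of how good each action is.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays the most."""
    rng = random.Random(seed)
    # Optimistic starting estimates encourage trying every action early on.
    estimates = [5.0 for _ in rewards]
    counts = [0 for _ in rewards]
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try an action at random.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(rewards)), key=lambda i: estimates[i])
        reward = rewards[action]  # feedback from the (hypothetical) environment
        counts[action] += 1
        # Move the estimate toward the observed reward (running average).
        estimates[action] += (reward - estimates[action]) / counts[action]
    best_action = max(range(len(rewards)), key=lambda i: estimates[i])
    return best_action, estimates


# Action 0 always pays 0.2 and action 1 always pays 1.0 (made-up rewards).
best_action, estimates = run_bandit(rewards=[0.2, 1.0])
print(best_action, estimates)
```

Nobody tells the agent which action is correct; it only sees rewards, and its estimates end up favoring the action that pays more.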

Machine Learning Algorithm

An ‘algorithm’ is a series of steps to complete a task.

An algorithm in machine learning is a procedure that is run on data to create a machine learning “model.”

Machine learning algorithms perform “pattern recognition.” Algorithms “learn” from data, or are “fit” on a dataset.

A “model” in machine learning is the output of a machine learning algorithm run on data.

A model represents what was learned by a machine learning algorithm.
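
The difference between an algorithm and a model is easy to see in code. In this minimal sketch, `fit_line` is the algorithm (ordinary least squares for one input variable) and the dictionary it returns is the model. The function names and the hours/scores data are hypothetical, chosen only to illustrate the split.

```python
def fit_line(xs, ys):
    """The algorithm: ordinary least squares for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # The model: the learned numbers that summarize the pattern in the data.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Use the model (not the algorithm) to estimate y for a new x."""
    return model["slope"] * x + model["intercept"]


# Hypothetical data that follows y = 2x + 1 exactly.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]
model = fit_line(hours, scores)
print(model, predict(model, 5.0))
```

Once `fit_line` has run, the data and the algorithm are no longer needed to make predictions; the two numbers in `model` are all that was learned.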

Popular Machine Learning Algorithms

  • Linear regression (Supervised Learning/Regression): Linear regression is the most basic type of regression. Simple linear regression allows us to understand the relationship between two continuous variables.

  • Logistic regression (Supervised Learning/Classification): Logistic regression focuses on estimating the probability of an event occurring, based on the data previously provided. It is used to model a binary dependent variable, that is, one where only two values, 0 and 1, represent the outcomes.

  • Naive Bayes (Supervised Learning/Classification): The Naïve Bayes classifier is based on Bayes’ theorem and treats every feature as independent of every other feature. It allows us to predict a class or category from a given set of features, using probability.

  • K-nearest neighbor algorithm (Supervised Learning): The K-nearest neighbor algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the K data points closest to a given data point to determine which group that point belongs to.

  • Decision trees (Supervised Learning/Classification and Regression): A decision tree is a flowchart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable, and each branch is an outcome of that test.

  • Random forests (Supervised Learning/Classification and Regression): A random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest produces a class prediction, and the class with the most votes becomes the model’s prediction.

  • Support vector machines (Supervised Learning/Classification): Support vector machine algorithms are supervised learning models that analyze data for classification and regression analysis. They essentially sort data into categories, which is achieved by providing a set of training examples, each example marked as belonging to one or the other of two categories. The algorithm then works to build a model that assigns new values to one category or the other.

  • K-means clustering algorithm (Unsupervised Learning/Clustering): The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.

  • Artificial neural networks (Supervised, Unsupervised, or Reinforcement Learning): An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of which connects to the layers on either side. ANNs are inspired by biological systems, such as the brain, and how they process information. ANNs are essentially a large number of interconnected processing elements working in unison to solve specific problems.
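
As one worked example from the list above, here is a minimal sketch of how a logistic regression model turns a set of features into a probability and then into a 0 or 1 outcome. Only the prediction step is shown; the weights and bias are hypothetical stand-ins for values that a training algorithm would normally learn from data.

```python
import math


def predict_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical, already-learned parameters (not fitted to any real data).
weights = [0.8, -0.5]
bias = -0.3

probability = predict_probability([2.0, 1.0], weights, bias)
# Turn the probability into one of the two outcomes, 0 or 1.
label = 1 if probability >= 0.5 else 0
print(round(probability, 4), label)
```

The 0.5 cut-off is a common default rather than a rule; a project can choose a different threshold depending on which kind of mistake is more costly.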

Other useful terms when talking about machine learning include:

Ensemble learning is a method that combines multiple algorithms to generate better results for classification, regression, and other tasks. Each individual classifier may be weak on its own, but when combined with others it can produce excellent results.
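
A simple way to combine weak classifiers is a majority vote. The sketch below uses three deliberately crude, made-up rules for spotting spam messages; each rule is easy to fool on its own, and the ensemble simply goes with the answer most of them agree on.

```python
from collections import Counter


# Three hypothetical weak classifiers, each looking at a single clue.
def has_many_exclamations(message):
    return "spam" if message.count("!") >= 3 else "ham"


def mentions_prize(message):
    return "spam" if "prize" in message.lower() else "ham"


def is_mostly_uppercase(message):
    letters = [c for c in message if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return "spam" if letters and upper / len(letters) > 0.5 else "ham"


classifiers = [has_many_exclamations, mentions_prize, is_mostly_uppercase]


def ensemble_predict(message, classifiers):
    """Combine the individual predictions with a majority vote."""
    votes = Counter(classifier(message) for classifier in classifiers)
    return votes.most_common(1)[0][0]


print(ensemble_predict("WIN A PRIZE NOW!!!", classifiers))
print(ensemble_predict("There is a prize for the quiz winner.", classifiers))
```

In the second message one rule is fooled by the word "prize", but the other two outvote it. Using an odd number of voters with two classes avoids ties.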

Artificial intelligence (AI) refers to machines that can learn, reason, and act for themselves. They can make their own decisions when faced with new situations, in the same way that humans and animals can.

Data are characteristics or information collected through observation.

Data cleaning refers to the steps needed to prepare your data for use. Here you detect incomplete, incorrect, inaccurate, or irrelevant parts of your dataset and then choose to replace, modify, or delete the dirty or coarse data as needed.
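
Here is a small sketch of what those steps can look like in practice. The records, the rule that an age cannot be negative, and the choice to fill a missing age with the average are all assumptions made for this example; a real project would pick rules that suit its own data.

```python
from statistics import mean

# Hypothetical raw records with typical problems.
raw_records = [
    {"name": "Ada", "age": 36},
    {"name": "Ben", "age": None},  # incomplete: age is missing
    {"name": "Cy", "age": -4},     # incorrect: age cannot be negative
    {"name": "Ada", "age": 36},    # irrelevant: exact duplicate
    {"name": "Dee", "age": 44},
]


def clean(records):
    """Remove duplicates, fill missing ages, and drop impossible ages."""
    seen = set()
    unique = []
    for record in records:
        key = (record["name"], record["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))  # copy so the raw data stays untouched
    valid_ages = [
        r["age"] for r in unique if r["age"] is not None and r["age"] >= 0
    ]
    fill_value = mean(valid_ages)
    for record in unique:
        if record["age"] is None:
            record["age"] = fill_value  # replace the missing value
    return [r for r in unique if r["age"] >= 0]  # delete impossible values


cleaned = clean(raw_records)
print(cleaned)
```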

Exploratory data analysis (EDA): This refers to the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
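
The summary-statistics part of EDA can be sketched with Python's built-in `statistics` module. The sales figures are invented, and flagging values more than two standard deviations from the mean is just one rough rule of thumb for spotting anomalies, not a standard that every project follows.

```python
from statistics import mean, median, stdev

# Hypothetical daily sales figures with one suspicious value.
daily_sales = [12, 15, 14, 13, 16, 15, 90]

summary = {
    "count": len(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
    "mean": mean(daily_sales),
    "median": median(daily_sales),
}

# Flag values that sit unusually far from the mean.
spread = stdev(daily_sales)
outliers = [v for v in daily_sales if abs(v - summary["mean"]) > 2 * spread]

print(summary)
print(outliers)
```

The gap between the mean and the median is itself a useful clue: a single extreme value drags the mean upward while the median barely moves.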

Training data is the main and most important data; it helps machines learn and make predictions. Machine learning engineers use this dataset to develop the model, and it typically accounts for more than 70% of the total data used in a project.

Validation data is the second type of dataset, used to validate the machine learning model before final delivery of the project. Model validation is important for ensuring the accuracy of the model’s predictions so that the right application gets built. Using this data helps show whether the model can correctly identify new examples.

Testing data is the final type of dataset; it helps check the prediction quality of the finished machine learning or AI model.
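
Splitting one dataset into these three parts can be sketched as follows. The 70% training share follows the figure mentioned above; the 15% validation and 15% testing shares, the fixed seed, and the function name are assumptions made for illustration.

```python
import random


def split_dataset(examples, train_fraction=0.7, validation_fraction=0.15, seed=42):
    """Shuffle the examples and cut them into train, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    train_end = round(len(shuffled) * train_fraction)
    validation_end = train_end + round(len(shuffled) * validation_fraction)
    train = shuffled[:train_end]
    validation = shuffled[train_end:validation_end]
    test = shuffled[validation_end:]  # whatever is left over
    return train, validation, test


# 100 hypothetical examples, represented here by their index numbers.
examples = list(range(100))
train, validation, test = split_dataset(examples)
print(len(train), len(validation), len(test))
```

Shuffling before cutting matters when the data arrives in some order (by date or by category, for example); otherwise one of the three sets could end up unrepresentative.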

The world of machine learning and data science is vast and ever growing. It is easy to view it as an insurmountable endeavor. I would like to encourage anyone wishing to take this path not to be intimidated. A lot of these terms only sound incomprehensible; once you discover their essence, everything becomes clear. Again, good things take time and great ones take even more time, so do not grow weary, and keep pushing forward.

Translated from: https://medium.com/@chibuzo.ugonabo/machine-learning-terminologies-demystified-6aa1aa81a57b
