Starting Your Journey to Master Machine Learning with Python

weixin_26704853

Tags: python, machine learning, artificial intelligence

Original article: https://towardsdatascience.com/starting-your-journey-to-master-machine-learning-with-python-d0bd47ebada9

All About ML

Machine Learning, Deep Learning, Data Science, and Artificial Intelligence (AI) are some of the most used buzzwords today, and their popularity rises with each passing day. Everyone is trying to jump on the hype train to explore these fields. According to Fortune, hiring of AI specialists has grown by 74% over the last four years, and AI specialist is regarded as the “hottest” job of the present generation. This may raise many questions in readers' minds.

What makes it so popular? What are these fields anyway? What is machine learning? How do I get started? Why Python?

I will try to answer all these questions in today's article while explaining in detail how you can get started with Python and Machine Learning (ML). Then we will look at how you can sharpen and master your machine learning skills.

Note: This is part 1 of my “All About Machine Learning” series. Each subsequent part will be standalone, so you can read the series in any order at your convenience. I will try to cover the basics and most of the machine learning algorithms in the upcoming articles. To view other parts of the series you can click here.

Why is AI so popular?

Photo by Benjamin Davies on Unsplash

Artificial Intelligence is one of the fastest-growing fields today, and advances in AI are arriving at a fast pace. There is no lack of open positions or career opportunities. Everyone is hyped about how AI is going to be the next big thing. As stated by Professor Andrew Ng, one of the most prominent figures in modern AI:

“Artificial Intelligence is the new electricity.”

Expectations for AI are so high today because of advances in technology and the abundance of data. We have more capable graphics processing units (GPUs) and better technology for running complex computations.

What are these fields anyway?

Artificial Intelligence is a vast field. Topics such as Machine Learning and natural language processing fall under Artificial Intelligence, and the field overlaps heavily with Data Science and statistics. Deep Learning is a subset of Machine Learning.

Artificial Intelligence is:

“The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

To learn more about artificial intelligence and its analogy to the Universe, you can refer here. In this article, we will focus on Machine Learning, and I will try to give a more intuitive way of understanding it.

What is Machine Learning?

Machine Learning is the ability of a program to learn from data and improve its performance on a task automatically, without being explicitly programmed for it. Given a training set, you can train a machine learning model so that it learns the patterns in that data. When it is then evaluated on a validation set, a test set, or any other unseen data, the model should still be able to perform the task.

Let us understand this with a simple example. Assume we have a dataset of 30,000 emails, some labeled as spam and some labeled as not spam. The machine learning model is trained on this dataset. Once training is complete, we can test it with an email that was not included in the training dataset. The model makes a prediction for this new input and classifies it as spam or not spam.
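The train-then-test idea in the spam example can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the six emails are made up, and the classifier is a small word-count Naive Bayes chosen for brevity, not the method the example above prescribes. A real filter would train on thousands of emails, usually with a library.

```python
import math
from collections import Counter

# Made-up training set (an assumption for illustration): (email text, label).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch with the project team", "not spam"),
    ("please review the monday report", "not spam"),
]

def train(examples):
    """Count how often each word appears in each class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = set()
    for counts in word_counts.values():
        vocab |= set(counts)
    return word_counts, class_counts, vocab

def predict(text, model):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
# Neither of these emails appears in the training data.
print(predict("claim your free cash", model))
print(predict("agenda for the team meeting", model))
```

The model never saw the two test emails, yet it can label them because it learned which words tend to appear in each class.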

There are three main types of machine learning methods. We will discuss each of them, and I will then list a few example algorithms and applications for each.

1. Supervised Learning

Supervised learning trains the model on labeled datasets. The task can be binary or multi-class classification. Each example carries a label giving the correct answer, whether that is one of a set of categories or a value in a range. The model is trained with supervision, i.e. with the help of this labeled data. Supervised learning problems fall into two types:

Classification: These algorithms are preferred when the output is a choice among particular categories. Email spam filtering is a classification problem.
Regression: These algorithms are preferred when the output variable is a real (continuous) value. An example is predicting house prices for a particular location.
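To make the regression case concrete, here is a minimal sketch of fitting a straight line with ordinary least squares in plain Python. The floor areas and prices are invented for illustration (and deliberately lie exactly on a line), and `fit_line` is a hypothetical helper name, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: floor area (square metres) -> price (thousands).
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(areas, prices)

# Predict the price of a house the model has not seen.
predicted = slope * 100 + intercept
print(slope, intercept, predicted)
```

Unlike the spam classifier, the output here is a number on a continuous scale instead of a category, which is the difference between regression and classification.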

Algorithms: Regression algorithms (linear regression), decision trees, random forests, and classification algorithms like K-Nearest Neighbors (KNN), Support Vector Machines (SVM), logistic regression, and Naive Bayes.

Applications: Email spam filtering, classification of tumors (benign or malignant), classification of user reviews as positive or negative.

2. Unsupervised Learning

Unsupervised learning is the training of the model on an unlabeled dataset. This means the model is given no prior information about the correct answers. It trains itself by grouping similar characteristics and patterns together. An example is separating dogs from cats: the data is an unlabeled set of dog and cat images, and the unsupervised algorithm finds similarities in the patterns and groups dogs and cats separately without being told which image is which. There are two main types of unsupervised learning:

Clustering: Arranging similar entities into groups, or clusters. Grouping images of cats and dogs into separate clusters, as mentioned previously, is one example. Another is identifying cancer stages based on specific data.
Association: Finding patterns shared between two or more classes or users. An example is recommendation systems: if a person watches a movie of a particular genre, the user is given recommendations based on what other users who watched the same movie preferred to watch. Another example is Amazon suggesting that buyers who bought a particular item also tend to buy other, similar items.

Algorithms: K-means clustering, principal component analysis (PCA), singular value decomposition (SVD), hierarchical clustering.

Applications: Recommendation systems on Amazon, Netflix, YouTube, and other digital platforms, friend suggestions on Facebook, anomaly detection.
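As a rough illustration of clustering, the sketch below runs a bare-bones K-means on six unlabeled 2D points in plain Python. The points and the choice of starting centroids are assumptions made for the example; real image data like the cats-and-dogs case would need far more features and a library implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c  # keep the old centroid if a cluster ends up empty
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Unlabeled, made-up data: two visibly separate groups, but no labels are given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centroids are an arbitrary choice: one point taken from the data each.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters[0])
print(clusters[1])
```

The algorithm is never told which group a point belongs to; the two groups emerge purely from the distances between points.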

3. Reinforcement Learning

According to Wikipedia:

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

Reinforcement learning is a trial-and-error method: the model learns from repeated failures. The agent acts, receives a reward or penalty, and adjusts its behavior when it does not achieve the desired result. This can be applied to games like chess, where after playing millions of games the model learns appropriate patterns and moves. A simpler example is tic-tac-toe. The model can learn every path to victory after a few hundred games and, once well trained, should never lose a game.

Algorithms: Policy-based, value-based, and model-based reinforcement learning algorithms.

Applications: Models that learn to play games like Flappy Bird, robotics in industrial automation, traffic control, deep reinforcement learning.
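The trial-and-error loop can be sketched with tabular Q-learning, one example of a value-based method. Everything concrete here is an assumption for illustration: the environment is an invented five-cell corridor where the agent starts at the left end and is rewarded only for reaching the right end, and the learning rate, discount, and exploration rate are arbitrary picks. It is far simpler than tic-tac-toe or chess, but the learning rule is the same idea.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore at random sometimes (and on ties); otherwise act greedily.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if nxt == N_STATES - 1 else 0.0
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            q[state][action] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][action])
            state = nxt
    return q

q = train()
# Greedy policy for every non-terminal position.
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

Early episodes wander almost at random; as rewards propagate back through the table, the agent settles on walking straight toward the goal.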

How do I get started?

If you are curious about how to start your machine learning journey, the answer is quite simple. The best way to get started is to explore and dig deeper into the various topics surrounding machine learning. It is important to find out whether you are interested in and passionate about the field. Machine Learning requires programming skills, mathematical knowledge, and, most importantly, the willingness and persistence to learn. There are tons of resources, both free and paid, from which you can gain a lot of knowledge.

I would highly recommend checking out the various machine learning videos on YouTube. Checking out free code and boot camps is also a great idea. I think it is worth exploring the free material before diving into paid online courses. Once you find that you really like machine learning and have a passionate interest in it, I would strongly recommend learning Python first and then taking the Machine Learning course from Stanford University, taught by Andrew Ng and offered on Coursera. There are tons of online courses to check out. Review them and pick whichever machine learning course suits you best.

Why Python?

Python is an object-oriented, high-level, interpreted programming language first released in 1991. It is readable and productive to work in. Simply put, Python is amazing. I initially started out with languages like C, C++, and Java. When I finally encountered Python, I found it to be quite elegant, simple to learn, and easy to use. Python is the best way for anyone, even people with no prior programming experience, to get started with machine learning. In spite of some flaws, like being considered a “slow” language, Python is still one of the best languages for AI and machine learning.

The main reasons Python is so popular for machine learning, ahead of other languages like R, are as follows:

As mentioned previously, Python is very simple and consistent.
Its popularity has risen rapidly in comparison to other programming languages.
Extensive resources in the form of a wide range of libraries and frameworks. We will discuss this in further detail in the next part of this series.
Versatility and platform independence. Python runs on all major platforms and can also use modules built in other programming languages.
Great community and continuous updates. The Python community in general is filled with amazing people, and constant updates are made to improve Python.

To get started with Python, you can download it from here.

Conclusion:

I hope all of you enjoyed the read. This is part 1 of my “All About Machine Learning” tutorial series, in which I plan to cover all the topics around machine learning. In the next part, we will look at the basics of Python and its libraries. All of the parts will be standalone, so you can view them at any time. Check out the future parts of this series here. Thank you all for sticking with it until the end, and I wish you all a wonderful day.