AI and Data Science for Dummies: Chat with Classmates

These are my answers to questions about AI and its business practice, discussed among ~200 of my fellow classmates from IIT Bombay. They are modified slightly to protect privacy, to remove specific references and for better narration. This is the first part of a series of these posts. The second part discusses insights about ‘Why Doesn’t AI Work?’ and the third about ‘AI Hacks That Do Work.’ I will keep editing this header to include links to other parts.

Data Science

At the simplest level data science is just that — a scientific analysis of data. In the fourth grade, when we all learned how to make simple graphs, we had become data scientists already.

You would think that I am exaggerating to make a point. Well, look up Microsoft’s corporate strategy and its focus on a new product called PowerBI. They are making a massive push on it as a way to cement Windows-based systems in enterprises. Then look up the demos they have for PowerBI. There are plenty available on YouTube. These demos talk about dashboarding and how the extremely powerful software can visualize your darkest, deepest data to make excellent plots. And then tell me if a fourth grader can’t make those dashboards.

Of course, there is a lot more to PowerBI than making bar graphs, but the point is that even at that simplest level data science can be very powerful. Add mean and standard deviation to it, and you have covered almost everything in the world of business analytics. Sure, the size of data has bloated recently, particularly because of a takeoff in the deployment of sensors and embedded devices (IoT). Still, your biggest intellectual problem as a data analyst is how to clean the various formats of data, rather than how to process it.

Artificial Intelligence

There is a small portion of the data science world that focuses on using data to write better programs. Here is the intuition behind it. The simplest programs are ‘Do X’. They are very powerful and make up the foundation of the programming world.

Smarter programs say ‘If A do X else do Y.’ I don’t have to explain this, except to say that almost all programming in the last century, and most programming in this one, is as simple as that. Rule engines and the so-called expert systems are but a set of chained, nested and looped if-else statements.

The breakthrough behind the field of artificial intelligence started with a simple question: can a machine automatically figure out the condition A in that statement and write these rules itself? We can convert ‘If A do X else do Y’ to ‘c = Cx if A else Cy’ and then, depending on the value of c, we can perform X or Y. Suddenly this is as simple as a classification problem. If we are given a set of pre-labelled data points, can we find a model, A, which can classify a new data point to Cx or Cy (or one of a number of classes in the generalized case)?

If we can do that then we don’t have to worry about the if-else statements. All we need to do is to get that set of pre-labelled data points, also called training data, run the machine, and go home. We have learnt so many techniques to do classification from the fields of algebra and statistics: Naïve Bayes, logistic regression, decision trees, and whatnot.
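
To make the idea of a machine finding the condition A concrete, here is a minimal sketch in Python. It assumes a single numeric feature and made-up training data, and it learns the simplest possible rule, a one-level decision tree that picks a threshold. The names `learn_threshold`, `training` and `classify` are illustrative only.

```python
def learn_threshold(points):
    """Find the threshold t for which the rule 'x >= t means Cx' makes the fewest errors."""
    xs = sorted(x for x, _ in points)
    # Candidate thresholds: midpoints between neighbouring feature values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]

    def errors(t):
        # Count training points that the rule 'x >= t' labels wrongly.
        return sum((x >= t) != (label == "Cx") for x, label in points)

    return min(candidates, key=errors)


# Hypothetical pre-labelled training data: (feature value, class label).
training = [(1.0, "Cy"), (2.0, "Cy"), (3.0, "Cy"),
            (6.0, "Cx"), (7.0, "Cx"), (9.0, "Cx")]

threshold = learn_threshold(training)


def classify(x):
    # The learned condition A is simply 'x >= threshold'.
    return "Cx" if x >= threshold else "Cy"
```

Nobody wrote the if-else condition by hand here; the training data decided where the boundary sits, and `classify` then applies it to new points.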

Congratulations! If you have ever fit a line to some data, you have programmed an artificially intelligent system.
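
For anyone who has not fit a line in a while, here is a small sketch of ordinary least squares in plain Python. The data points are made up for illustration and happen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)


def predict(x):
    # The "learned program": apply the fitted line to a new input.
    return slope * x + intercept
```

The fitted `slope` and `intercept` play the same role as the learned condition in a classifier: they come from the data, not from the programmer.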

Why Is This Important?

So, what’s the big deal? Three things. One, this is a big deal by itself. You have no idea how many artificially intelligent systems rarely use anything more than probabilities. If you want more complexity, a popular machine learning algorithm is called Random Forest. It involves making decision trees based on multiple samples of the data, hence the forest, and then taking the mode or the median of the decisions made by each of the trees. It’s pure statistics, nothing fancy. However, this is now empowering almost every aspect of human life. Turn anywhere, and it is likely that an intelligent machine like this is helping you along.
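
The sampling-and-voting idea can be sketched in a few lines of Python. This is a toy under loud assumptions: one numeric feature, made-up data, and one-level trees (stumps) instead of full decision trees. Real Random Forest implementations also sample the features, which is skipped here.

```python
import random
from statistics import mode


def train_stump(sample):
    """One-level decision tree: pick the threshold with the fewest errors."""
    xs = sorted({x for x, _ in sample})
    # Midpoints between distinct values; fall back to the only value if there is one.
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])] or [xs[0]]

    def errors(t):
        return sum((x >= t) != label for x, label in sample)

    return min(candidates, key=errors)


def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    # Each tree sees its own bootstrap sample, drawn with replacement.
    return [train_stump(rng.choices(data, k=len(data))) for _ in range(n_trees)]


def predict(forest, x):
    # Every tree votes; the mode (majority vote) is the forest's decision.
    return mode(x >= t for t in forest)


# Hypothetical labelled data: (feature value, label).
data = [(1, False), (2, False), (3, False), (4, False),
        (7, True), (8, True), (9, True), (10, True)]

forest = train_forest(data)
```

An odd number of trees keeps the vote free of ties. For a numeric prediction you would take the median or mean of the trees’ outputs instead of the mode.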

Neural Networks

Second, they figured out something called a neural network. Each node in this network is essentially a weighted sum. You take a set of inputs, you weigh each of them and you sum them up. Simple.

Let’s make it real. In the fourth year at college one of my friends John (name changed) was really trying to impress this girl, Jane (name changed), who was a co-volunteer at a non-profit called Magic Bus. Magic Bus works for under-privileged children and organizes various camps and events in its efforts. John’s decision tree to go or not to go to an event was simple — if she was coming, John would brave everything and go. Otherwise if the event was a party (vs. a hike or a camp) and it was not raining, John would go.

Let’s say a bright-eyed data scientist had plotted John’s behavior over the year. He or she could have taken three binary variables: a = whether Jane was going to attend, b = whether the event was a party, and c = whether it was going to rain. It would be very simple to write an equation p = w1.a + w2.b + w3.c, and set a threshold on p to predict if John was going to that event or not. That is the simple neuron in data science that everyone seems so crazy about. With the right set of weights, it would have predicted John’s behavior accurately.
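
Here is John’s neuron as a short Python sketch. The weights and the threshold are hand-picked assumptions for illustration (the story does not give any numbers); they are one choice among many that reproduces his rule for all eight combinations of inputs.

```python
from itertools import product

# Hand-picked, hypothetical weights: Jane matters most, a party helps, rain hurts.
W_JANE, W_PARTY, W_RAIN = 2.0, 1.0, -1.0
THRESHOLD = 0.5


def john_goes(a, b, c):
    """The neuron: a weighted sum of the inputs compared against a threshold."""
    p = W_JANE * a + W_PARTY * b + W_RAIN * c
    return p > THRESHOLD


def john_rule(a, b, c):
    """John's decision tree from the story, written as plain if-else logic."""
    return bool(a) or (bool(b) and not c)


# Check every combination of the three binary inputs.
agreement = all(john_goes(a, b, c) == john_rule(a, b, c)
                for a, b, c in product([0, 1], repeat=3))
```

`agreement` comes out true, so for this little problem one weighted sum and one threshold replace the whole if-else tree.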

Let’s say Jane was also deciding based on the weather forecast and the type of the event. Then there are two independent inputs, one hidden layer with a node for her decision (plus two to pass the original inputs through) and then one node for the final decision. How about whether John was going to wear his new jeans or not? Now we are talking about two nodes in the output layer. You can see how quickly it becomes a network of neurons.
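
The little network described above can be sketched as follows. Jane’s rule (she skips only rainy hikes and camps) and John’s jeans rule (new jeans only when Jane attends a party) are assumptions invented for this example, as are all the weights and biases.

```python
def step(z):
    # Threshold activation: the node fires when its weighted sum is positive.
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Compute one layer: each row of weights defines one node."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(weights, biases)]


# Inputs: [b, c] = [event is a party, it will rain].
# Hidden layer: Jane's decision, plus two nodes that pass b and c through.
HIDDEN_W = [[1, -1], [1, 0], [0, 1]]
HIDDEN_B = [0.5, -0.5, -0.5]

# Output layer: [John goes, John wears his new jeans].
OUTPUT_W = [[2, 1, -1], [1, 1, 0]]
OUTPUT_B = [-0.5, -1.5]


def john_network(b, c):
    hidden = layer([b, c], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)
```

For example, `john_network(1, 0)` (a party, no rain) gives `[1, 1]`: John goes and the new jeans come out. `john_network(0, 1)` (a rainy hike) gives `[0, 0]`.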

The important thing is that we need to find the right set of weights. There are multiple algorithms to automatically detect these weights based on a given set of inputs and corresponding outputs. Something called Gradient Descent rules the roost.
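
As a rough sketch of how gradient descent finds such weights, the Python below trains a single neuron on John’s eight possible situations. It assumes a sigmoid output with a log-loss, which is one common setup and not necessarily what any particular system uses. The learning rate and number of epochs are arbitrary choices.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Training data: every (a, b, c) combination with John's decision as the label.
DATA = [((a, b, c), 1 if a or (b and not c) else 0)
        for a, b, c in product([0, 1], repeat=3)]


def train(data, learning_rate=0.5, epochs=20000):
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0, 0.0], 0.0
        for x, y in data:
            # Gradient of the log-loss for a sigmoid neuron: (prediction - label) * input.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


def predict(weights, bias, x):
    # The neuron fires when the weighted sum plus bias is positive.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


weights, bias = train(DATA)
```

After training, `predict` reproduces John’s decision for all eight cases, without anyone choosing the weights by hand.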

It turns out that neural networks can transparently replace most statistical classification algorithms. This is very powerful, because now you can focus on one technique for a wide variety of problems. We should be teaching neural networks in seventh grade instead of linear regression. With one hidden layer between input and output a neural network can also emulate any polynomial relationship given sufficient data. This is called a Multi-Layer Perceptron with one hidden layer, or MLP1.

Deep Learning

Does anyone remember Newton and his iterative method of finding answers? A close cousin of it, fixed-point iteration, goes like this. For complex equations of the type x = f(x), with x on both sides, you would assume a value of x for the RHS, compute x on the LHS and then use that value for the RHS, and so on. You would continue till the difference between the values of x in subsequent iterations was near zero.
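
The iteration described above fits in a few lines of Python. The equation x = cos(x) is just a convenient example chosen for this sketch; the tolerance and iteration limit are arbitrary.

```python
import math


def fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Repeatedly feed x back into f until successive values stop changing."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")


# Solve x = cos(x), starting from a guess of 1.0.
root = fixed_point(math.cos, 1.0)
```

The result is approximately 0.739, the point where x and cos(x) agree. Not every f behaves this nicely; the iteration only settles down when f pulls nearby values closer together.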

Same deal here. Why do we have to decide directly on the inputs? We will find interim values, and then use those values to find the next set of interim values, and only after doing that 100 times will we decide on the output. In other words, you are adding more and more layers of neurons between the input and the output layer. This is called a Deep Neural Network, and the process of training it is called Deep Learning. It is very useful for non-linear classifications, like predicting whether a set of pixels represents a nose.
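
A tiny Python sketch shows why the interim values matter. XOR (one input or the other, but not both) is the textbook case that no single weighted sum with a threshold can classify, yet one extra layer handles it. The weights below are hand-set for illustration, not trained, and `forward` works for any number of stacked layers.

```python
def step(z):
    # Threshold activation: the node fires when its weighted sum is positive.
    return 1 if z > 0 else 0


def forward(layers, inputs):
    """Pass the inputs through each layer in turn; each layer's output feeds the next."""
    values = inputs
    for weights, biases in layers:
        values = [step(sum(w * v for w, v in zip(row, values)) + bias)
                  for row, bias in zip(weights, biases)]
    return values


# Hand-set weights (illustrative): each layer is (weight rows, biases).
XOR_NET = [
    ([[1, 1], [1, 1]], [-0.5, -1.5]),  # hidden layer: an OR node and an AND node
    ([[1, -1]], [-0.5]),               # output: OR but not AND
]

outputs = {(a, b): forward(XOR_NET, [a, b])[0] for a in (0, 1) for b in (0, 1)}
```

A deep network is the same loop with more entries in the list of layers, and with the weights found by training instead of by hand.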

Complex AI Models

Here is the third big deal with AI, and it’s not that intuitive. To make any neural network work we must train it and get the right set of weights in the network. It turns out that the weights themselves contain a lot of value.

There is a very popular model in NLP called Word2Vec. It comes up with a set of numbers (a vector) for each word. Vectors for words with similar meaning will have numbers very close to each other. You can also do things like [King] - [Man] + [Woman] and get the vector for [Queen]. These vectors in fact are the weights from certain neural networks built for some task like predicting the next word.
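
The vector arithmetic can be tried out with a toy example in Python. The two-dimensional vectors below are invented by hand for illustration (roughly "royalty" and "gender" axes); real Word2Vec vectors are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, NOT real Word2Vec output.
VECTORS = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity: close to 1 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


result = analogy("king", "man", "woman")
```

With these made-up numbers `result` is `"queen"`. The interesting part in real models is that nobody assigns the axes; they emerge from the weights during training.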

Once scientists figured out how the weights in neural networks carry so much value, they went crazy. Many of the most advanced models are a stack of neural networks where the weights are passed from one to another to get very sophisticated things done.

The Promise

The promise is insane. Now, as long as you have sufficient data you can teach a machine to program itself and learn the most sophisticated, convoluted, non-linear relationships. The beauty is that you don’t have to understand those relationships yourself, let alone articulate them. You can now afford to be completely ignorant. It’s not hard to imagine that in the near future machines will be collecting all the data and making all the predictions, while humans will be focused on making smarter machines. Take any problem, select some [hyper-]parameters of a neural network, go to bed. Now, in fact, they have begun automating the process of selecting these hyper-parameters as well.

That is the promise. The reality? Coming up.

Next part in the series: ‘Why Doesn’t AI Work?’

Translated from: https://medium.com/ai-in-plain-english/ai-and-data-science-for-dummies-chat-with-classmates-359e18dcc529
