Telling big data, ML, and AI apart, and how they relate

Original post, 2016-06-01 11:46:15

How are big data and machine learning related?

Big data and machine learning are not directly related, but used together they can do real wonders.

Machine Learning & Big Data: In most cases, the learning comes from extensive calculations over existing datasets to create a model. An ordinary single-machine system cannot handle calculations over very large datasets, and because data volume grows day by day, the model also has to be updated as new data arrives. To achieve this, we have to use distributed computing with big data technologies such as Apache Mahout, Spark, or RHadoop, or do the initial analytics processing in projects like Hive or Pig and feed the output to machine learning algorithms to generate the model.

You can apply machine learning algorithms to big data, and/or you can apply big data processing techniques to machine learning.

An example of the first case would be training a neural network or a logistic regression model on a large dataset using online gradient descent.
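As a rough illustration of the first case, here is a minimal sketch of logistic regression trained by online gradient descent, where the weights are updated after every single example so the full dataset never has to sit in memory. The `toy_stream` generator, its labeling rule, and the learning rate are assumptions made up for this example; they are not from the original answer.

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def online_logistic_regression(stream, n_features, lr=0.1):
    """Update the weights one example at a time (online gradient descent)."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y  # gradient of the log loss with respect to the logit
        for i in range(n_features):
            w[i] -= lr * err * x[i]
        b -= lr * err
    return w, b

def toy_stream(n, seed=0):
    """Hypothetical data source: the label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, 1 if x[0] + x[1] > 1.0 else 0

# The stream is consumed lazily, one example per update.
w, b = online_logistic_regression(toy_stream(20000), n_features=2, lr=0.5)

def predict(x):
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0
```

Because each update touches only one example, the same loop could in principle read from a file or a message queue instead of a generator.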

An example of the second case would be parallelizing gradient descent to run in a MapReduce environment.
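The second case can be sketched in the same spirit. The code below imitates the MapReduce pattern on a single machine: a map step computes partial gradient sums per data shard, and a reduce step adds them up before each weight update. It uses plain Python `map` and `functools.reduce` rather than a real Hadoop or Spark job, and the toy data, shard layout, and hyperparameters are assumptions for illustration only.

```python
from functools import reduce

def map_gradient(shard, w, b):
    """Map step: partial squared-error gradient sums for one data shard."""
    gw, gb, n = 0.0, 0.0, 0
    for x, y in shard:
        err = (w * x + b) - y
        gw += err * x
        gb += err
        n += 1
    return gw, gb, n

def reduce_gradients(a, c):
    """Reduce step: add two partial results element-wise."""
    return a[0] + c[0], a[1] + c[1], a[2] + c[2]

def train(shards, steps=2000, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On a cluster, each shard would be handled by a separate worker.
        partials = map(lambda s: map_gradient(s, w, b), shards)
        gw, gb, n = reduce(reduce_gradients, partials)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Toy data following y = 3x + 1, split into three shards.
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(30)]
shards = [data[0:10], data[10:20], data[20:30]]
w, b = train(shards)
```

The key property is that the gradient is a sum over examples, so partial sums from each shard can be combined in any order.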

In machine learning, large datasets usually mean you need to use simpler algorithms, and those algorithms often perform much better on large datasets than they do on small ones.

There are two types of insights anyone can get from a dataset:
Q1. Direct (group by / join / sum / max / average)
Q2. Inductive (if something is ..., then something else is ..., else ...)

Note that the first type of insight is always exact, so you compute it with tools such as Excel for small data and Hadoop for big data.
Inductive insights, on the other hand, are approximations drawn from looking at the data. With a small amount of data, a human can try to infer things from charts, graphs, and so on. However, when the data is huge, it is beyond human capacity to infer rules from it. This is exactly where machine learning comes in.
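To make the distinction concrete, here is a small sketch that computes a direct insight (an exact group-by average) and an inductive one (a threshold rule inferred from labeled rows). The `orders` records and the rule "if amount > t then repeat customer" are hypothetical; the search over midpoints is just one simple way to induce such a rule.

```python
from collections import defaultdict

# Hypothetical records: (region, order_amount, is_repeat_customer)
orders = [
    ("north", 120.0, 1), ("north", 80.0, 0), ("south", 200.0, 1),
    ("south", 40.0, 0), ("south", 150.0, 1), ("north", 30.0, 0),
]

# Direct insight: exact group-by aggregates.
totals = defaultdict(float)
counts = defaultdict(int)
for region, amount, _ in orders:
    totals[region] += amount
    counts[region] += 1
averages = {r: totals[r] / counts[r] for r in totals}

# Inductive insight: learn a rule "if amount > t then repeat customer".
def best_threshold(rows):
    amounts = sorted(a for _, a, _ in rows)
    # Candidate thresholds are midpoints between neighboring amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def accuracy(t):
        return sum((a > t) == bool(y) for _, a, y in rows) / len(rows)

    return max(candidates, key=accuracy)

t = best_threshold(orders)
```

The averages are facts about the data, while the threshold is a generalization that may or may not hold for new rows.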

One of the biggest reasons we use big data is to extract some meaning out of it so that we can make better decisions. And that’s what machine learning does! It is the science of training systems to learn from data and produce appropriate responses without being explicitly programmed for them. On the flip side, machine learning would be far less useful without big data, because to learn anything from data you need a large number of ‘training examples’ so that as many scenarios as possible are covered and a few erroneous data points do not distort the training.
So, they are deeply interconnected. (In a word: large datasets keep the learned model from being biased.)

I have often seen these terms used interchangeably, which is wrong.
Big data has more to do with high-performance computing, while machine learning is a part of data science. In big data, large volumes of data that could not otherwise be processed in a reasonable amount of time are processed quickly using various techniques and tools. In machine learning, a system learns from past experience and builds a model that will most likely be able to handle future instances.
One of the main reasons big data and machine learning are used together is that big data processing often serves as a preprocessing step for machine learning.

Machine learning is the science of studying patterns in data. These patterns explain how the data is correlated, and those correlations are used to make predictions about the future.

Big data is the art of working with large amounts of data. Machine learning can be done on a smaller dataset, but the larger the dataset, the better the predictions tend to be.
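The effect of dataset size can be illustrated with about the simplest "model" there is, a sample mean estimated from noisy observations. This is only a sketch under assumed settings (Gaussian noise, a made-up true mean of 5.0, 200 repeated trials); real models and data can behave differently.

```python
import random

def mean_abs_error(n_samples, trials=200, true_mean=5.0, noise=1.0, seed=0):
    """Average absolute error of the sample mean over repeated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimate = sum(rng.gauss(true_mean, noise) for _ in range(n_samples)) / n_samples
        total += abs(estimate - true_mean)
    return total / trials

# Error of the estimate for increasingly large training sets.
errors = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n={n:5d}  mean abs error={err:.4f}")
```

Under these assumptions the error shrinks roughly with the square root of the number of samples, which is why returns from more data diminish but do not vanish.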

So, if I were to give a short answer: when you have a lot of structured or unstructured data that you want to study for patterns, you use big data tools to handle it and run your machine learning algorithms on it to find patterns that support a business use case.

Machine Learning - Builds models. When people hear the term “machine learning”, they picture robots that walk, climb, or clean houses. In reality, machine learning starts a lot closer to home. When you open your email, spam has been filtered out from your important messages by an algorithm that has learnt to classify “spam” and “not spam”. Your Facebook news feed features posts from your closest friends because an algorithm has examined your likes, tags, and photos to work out who you connect with most. When you upload a photo and the website identifies your face, that is driven by a facial recognition algorithm. When you use a search engine, you see the best and most relevant content first because of a sophisticated search ranking algorithm. In short, machine learning permeates our lives; that is, it builds models for self-learning algorithms.
Data Mining - An analytic process designed to explore data and find patterns in it. It is the practice of applying algorithms (mostly machine learning algorithms) to find patterns in data.
Artificial Intelligence - Behaves and reasons. The science of developing systems or software that mimic how a human responds and behaves in a given circumstance. As a field with an extremely broad scope, AI has divided its goal into multiple chunks, and each chunk has since become a separate field of study with its own problems to solve.
Major AI goals:
Knowledge Representation
Computer Vision
Machine Learning
Natural Language Processing
General intelligence, or strong AI
Machine learning is a field that emerged from one of the AI goals: helping machines learn on their own to solve the problems they come across.

Natural language processing is another field that emerged from an AI goal: helping machines communicate with real humans.

Computer vision is a field that emerged from the AI goal of identifying and distinguishing the objects a machine can see.

Robotics is a field that emerged from the AI goal of giving a machine a physical body so that it can perform physical actions.



