Spark 01: Introduction to Spark: What Spark Is, Spark's Features, Comparison with Hadoop, and Integration with Hadoop


I. What Is Spark

Spark is a unified computing engine for large-scale data processing.
Note: Spark is not limited to MapReduce-style offline (batch) computation; it can also handle real-time computation, run Hive-like SQL workloads, and more, which is why it is called a unified computing engine.

Now that we are talking about Spark, we have to mention its single most important feature: in-memory computing.
Spark performs its computation in memory, which lets it run tens or even hundreds of times faster than MapReduce.

So keep this in mind: Spark is an in-memory computing engine.
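To make the "unified engine" point concrete, here is a minimal sketch, assuming Spark 2.x or later with the SparkSession API, a local master, and a hypothetical input file at /tmp/input.txt; it runs a MapReduce-style batch count and a Hive-like SQL query on the same engine:

```scala
import org.apache.spark.sql.SparkSession

object SparkIntroSketch {
  def main(args: Array[String]): Unit = {
    // One SparkSession is the entry point for batch, SQL, and streaming jobs.
    val spark = SparkSession.builder()
      .appName("SparkIntroSketch")
      .master("local[*]")        // local mode for the sketch; use a cluster URL in practice
      .getOrCreate()

    // Batch-style computation, similar to an offline MapReduce job.
    val lines = spark.read.textFile("/tmp/input.txt")   // hypothetical input path
    println(s"line count: ${lines.count()}")

    // Hive-like SQL on the same data and the same engine.
    lines.createOrReplaceTempView("lines")
    spark.sql("SELECT count(*) AS cnt FROM lines").show()

    spark.stop()
  }
}
```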

II. Features of Spark

Next, let's look at some of Spark's features.

1. Speed

Because Spark computes in memory, its performance can, in theory, be up to 100 times faster than MapReduce's.

Spark achieves high performance for both batch and streaming workloads by using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

Note: batch processing is simply another name for offline computation, and stream processing is another name for real-time computation; the terms differ, but they mean the same thing.
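As a rough illustration of why in-memory computation helps, the sketch below (again assuming a local master and a hypothetical /tmp/input.txt) caches an RDD so that the second action reuses partitions already held in memory instead of re-reading the file from disk, the kind of reuse a disk-based MapReduce job cannot do:

```scala
import org.apache.spark.sql.SparkSession

object CacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CacheSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Load a hypothetical text file and keep the resulting RDD in memory.
    val words = sc.textFile("/tmp/input.txt")
      .flatMap(_.split("\\s+"))
      .cache()                 // materialized in memory by the first action below

    // The second action reuses the cached partitions instead of re-reading from disk.
    println(s"total words:    ${words.count()}")
    println(s"distinct words: ${words.distinct().count()}")

    spark.stop()
  }
}
```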

2. Ease of Use

Spark's ease of use shows up mainly in two aspects:

  • You can write applications quickly in Java, Scala, Python, R, or SQL.
  • Spark provides more than 80 high-level operators that make it easy to build parallel applications, and it can be used interactively from the Scala, Python, R, and SQL shells.
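As a small example of those high-level operators, here is a complete word count, assuming the same local setup and hypothetical input file as above; the equivalent hand-written MapReduce job would need far more boilerplate:

```scala
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")
      .getOrCreate()

    // The whole word count is a short chain of high-level operators.
    spark.sparkContext.textFile("/tmp/input.txt")   // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .take(10)
      .foreach(println)

    spark.stop()
  }
}
```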
