『 Spark 』1. Introduction to Spark

fengyuruhui123

Article link: https://blog.csdn.net/fengyuruhui123/article/details/78318823

How to introduce Spark to others

Apache Spark™ is a fast and general engine for large-scale data processing.

Apache Spark is a fast and general-purpose cluster computing system.
It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools, including:

Spark SQL for SQL and structured data processing, extending to DataFrames and Datasets
MLlib for machine learning
GraphX for graph processing
Spark Streaming for stream data processing

Some background on how Spark came about

Spark started in 2009 and was open sourced in 2010. Unlike the various specialized systems (Hadoop, Storm), Spark’s goal was to:

generalize MapReduce to support new applications within the same engine
- it is compatible with Hadoop and can run on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
speed up iterative computation compared with Hadoop
- use memory plus disk, instead of disk alone, as the data storage medium
- design a new programming model, the RDD, which makes data processing more graceful
  [RDD transformations, actions, distributed jobs, stages and tasks]
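
The transformation/action split and the in-memory reuse mentioned above can be illustrated with a small sketch. This is a toy, single-machine model in plain Python, not Spark's real API: the class `ToyRDD` and its methods are hypothetical names chosen to mirror the idea that transformations only record a recipe, actions trigger the computation, and a cached dataset is computed once and then reused.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's implementation)."""

    def __init__(self, compute: Callable[[], Iterator[Any]]) -> None:
        self._compute = compute                 # recipe that produces the data
        self._keep_in_memory = False            # set by cache()
        self._cached: Optional[List[Any]] = None

    @staticmethod
    def from_collection(data: Iterable[Any]) -> "ToyRDD":
        items = list(data)
        return ToyRDD(lambda: iter(items))

    # Transformations: build a new ToyRDD, nothing is computed yet.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self) -> "ToyRDD":
        # Only marks the dataset; it is materialized on first use.
        self._keep_in_memory = True
        return self

    def _iterate(self) -> Iterator[Any]:
        if self._cached is not None:
            return iter(self._cached)           # reuse the in-memory copy
        if self._keep_in_memory:
            self._cached = list(self._compute())
            return iter(self._cached)
        return self._compute()                  # recompute from the recipe

    # Actions: these actually run the recorded recipe.
    def collect(self) -> List[Any]:
        return list(self._iterate())

    def count(self) -> int:
        return sum(1 for _ in self._iterate())


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)                             # record each real evaluation
    return x * x


numbers = ToyRDD.from_collection(range(1, 6))
squares = numbers.map(square).cache()           # transformation + cache marker
evens = squares.filter(lambda x: x % 2 == 0)    # still nothing computed
calls_before_action = len(calls)                # 0

result = evens.collect()                        # action: [4, 16]
total = squares.count()                         # action: 5, served from cache
calls_after_actions = len(calls)                # 5, square ran once per element
```

In this sketch, `square` runs only when `collect()` is called, and the second action reuses the cached squares instead of recomputing them. That reuse is the kind of saving the memory-plus-disk design aims for in iterative workloads. Real Spark additionally splits the work into jobs, stages, and tasks across a cluster, which this toy model does not attempt.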

Why choose Spark

it is designed, implemented and used as a set of libraries, instead of as specialized systems;
- this makes it much more useful and maintainable

historically, it was designed as an improvement on Hadoop and Storm, so it builds on proven ideas;
it has documentation, a community, products built on it, and a favorable trend;
it provides SQL, DataFrames, Datasets, a machine learning library, a graph computing library, and an actively growing set of third-party libraries; it is easy to use and covers many use cases in many fields;
it supports ad-hoc exploration, which speeds up data exploration and pre-processing and helps you build ETL and data processing jobs;

References

Intro to Apache Spark

introducing spark
