spark
macyang
Chance is waiting for prepared people and my Status is read the fucking source code.
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
1) Goals and Overview. Our goal is to provide an abstraction that supports applications with working sets (i.e., applications that reuse an intermediate result in multiple parallel operations) while p… Original · 2011-12-18 23:53:27 · 2508 views · 0 comments
Why do we need another Big data processing engine, like SPARK ?
The current ubiquitous standard for storing and processing very large data is Hadoop. It’s an open source Apache project with storage provided by HDFS (Hadoop Distributed File System) and processing by Ma… Reposted · 2014-03-07 14:17:08 · 958 views · 0 comments
How does Impala compare to Shark?
Disclaimer: I lead the Shark development effort at UC Berkeley AMPLab. For more information on Shark, see Lightning Fast Data Warehouse System. Shark extends Apache Hive to dramatically speed up b… Reposted · 2014-02-28 11:23:44 · 855 views · 0 comments
Spark Cluster Mode Overview
Components: Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program). Specifically, to run on a cl… Reposted · 2014-02-16 16:30:12 · 1113 views · 0 comments
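Spark Streaming: The Rising Star of Large-Scale Stream Processing
Any discussion of Spark Streaming has to mention BDAS (Berkeley Data Analytics Stack), the data analytics software stack proposed at UC Berkeley. From its perspective, current big data processing falls into the following three types. Complex batch data processing, which typically spans tens of minutes to several hours. Interactive queries over historical data (interactive query… Reposted · 2014-02-11 17:58:43 · 1236 views · 0 comments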
Data-Intensive Systems:Real-time Stream Processing
Spark Streaming: Spark Streaming is an interesting extension to Spark that adds support for continuous stream processing. Spark Streaming is in active development at UC Berkeley's AMPLab al… Reposted · 2014-01-16 13:22:26 · 2311 views · 0 comments
Apache Spark for Big Analytics
The emergence of Apache Spark is a key development for Big Analytics in 2013. Spark, an Apache incubator project, is an open source distributed computing framework for advanced analytics in Hadoop… Reposted · 2014-01-15 22:40:20 · 1217 views · 0 comments
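Spark: The "Lightning Flash" of Big Data
Spark has formally applied to join the Apache Incubator, growing from a flash-of-inspiration laboratory "spark" into a fast-rising newcomer among big data platforms. This article covers Spark's design philosophy. True to its name, Spark shows a "lightning flash" quality that is uncommon in big data. Its characteristics can be summed up as light, fast, flexible, and clever. Light: the Spark 0.6 core is about 20,000 lines of code, compared with 90,000 lines for Hadoop 1.0 and 220,000 lines for 2.0. On the one hand, this is thanks to the conciseness and expressive power of the Scala language; on the other hand, Spark… Reposted · 2014-01-15 22:39:27 · 1630 views · 0 comments
Embracing Spark, Unlimited Opportunity: A Look Back at Spark Summit 2013
Abstract: Spark Summit focused on Shark, Spark Streaming, and related projects, bringing together front-line experts from well-known IT companies including Yahoo, Adobe, Intel, Amazon, RedHat, and Databricks. [Editor's note] Spark is a cluster computing platform that originated at UC Berkeley's AMPLab. Built on in-memory computing and starting from multi-iteration batch processing, it takes in multiple computing paradigms such as data warehousing, stream processing, and graph computation, making it a rare all-round… Reposted · 2013-12-27 10:40:48 · 2095 views · 0 comments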
Impala and Shark Benchmark
Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute HiveQL queries up to 100 times faster than Hive without any modification to the exist… Reposted · 2014-01-20 22:24:03 · 1230 views · 0 comments
Spark Quick Start Guide
- What is Spark? Spark is a MapReduce-like cluster computing framework designed to support low-latency iterative jobs and interactive use from an interpreter. It is written in Scala, a high-level la… Original · 2011-12-24 23:34:48 · 20729 views · 4 comments
Spark Programming Guide
This article draws mainly on two sources: Part#1: https://github.com/mesos/spark/wiki/Spark-Programming-Guide, Part#2: http://www.spark-project.org/examples.html. Part#1 makes it easy to see what Spark provides, while Part#2 quickly shows you how to write programs with Spark. At a high… Reposted · 2011-12-16 00:06:57 · 4067 views · 0 comments
Is Apache Spark the Next Big Thing in Big Data?
In any article or blog post, any mention of Big Data usually includes something about Hadoop. When it comes to Big Data, Apache Hadoop has been the big elephant in the room, and the release of Hadoo… Reposted · 2014-03-29 16:05:19 · 1190 views · 0 comments