December 2013_macyang

Repost: Building Hadoop-based Apps on YARN

Apache Hadoop YARN changes the game for Hadoop applications, enabling a multi-application, multi-workload, general-purpose data operating system. YARN is: Flexible. Store data once and interact

2013-12-28 15:16:56 1060

Repost: Modern Manufacturing Architectures Built with Hadoop

Dr. W. Edwards Deming was a statistician and manufacturing consultant who worked on Japanese reconstruction after WWII. His quality control methods influenced innovative Japanese manufacturing process

2013-12-28 15:16:18 1371

Repost: Simplifying user-logs management and access in YARN

User logs of Hadoop jobs serve multiple purposes. First and foremost, they can be used to debug issues while running a MapReduce application – correctness problems with the application itself, race co

2013-12-28 15:02:21 1695

Repost: Apache Hadoop YARN – NodeManager

The NodeManager (NM) is YARN’s per-node agent, and takes care of the individual compute nodes in a Hadoop cluster. This includes keeping up to date with the ResourceManager (RM), overseeing containers

2013-12-28 12:37:51 948

Repost: Apache Hadoop YARN – ResourceManager

As previously described, ResourceManager (RM) is the master that arbitrates all the available cluster resources and thus helps manage the distributed applications running on the YARN system. It works

2013-12-28 12:37:04 941

Repost: Apache Hadoop YARN – Concepts & Applications

As previously described, YARN is essentially a system for managing distributed applications. It consists of a central ResourceManager, which arbitrates all available cluster resources, and a per-nod

2013-12-28 12:36:00 781

Repost: Apache Hadoop YARN – Background and an Overview

Celebrating the significant milestone that was Apache Hadoop YARN being promoted to a full-fledged sub-project of Apache Hadoop in the ASF, we present th

2013-12-28 12:34:03 870

Repost: Embracing Spark, Unlimited Opportunity: Spark Summit 2013 Highlights

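Summary: Spark Summit focused on Shark, Spark Streaming, and related projects, bringing together front-line experts from well-known IT companies including Yahoo, Adobe, Intel, Amazon, RedHat, and Databricks. [Editor's note] Spark is a cluster computing platform that originated at the AMPLab at the University of California, Berkeley. Built on in-memory computing and starting from multi-iteration batch processing, it takes in several computing paradigms, including data warehousing, stream processing, and graph computation, and is a rare all-round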

2013-12-27 10:40:48 2108

Repost: Managing Multiple Resources in Hadoop 2 with YARN

An overview of some of Cloudera’s contributions to YARN that help support management of multiple resources, from multi-resource scheduling in the Fair Scheduler to node-level enforcement. As Apache H

2013-12-25 23:17:00 957

Repost: A Detailed Look at YARN, Hadoop's New MapReduce Framework

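Among the industry's big data storage and distributed processing systems, Hadoop is a well-known and excellent open-source framework for distributed file storage and processing. This article does not repeat an introduction to the Hadoop framework; readers can refer to the official Hadoop overview. Those who have used or studied the old Hadoop framework (version 0.20.0 and earlier) should be familiar with the following diagram of the original MapReduce framework: The diagram above shows clearly the flow and design approach of an original MapReduce program: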

2013-12-22 21:36:21 770

Repost: Understanding the Container Concept in Hadoop YARN in Depth

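Anyone learning Hadoop YARN, the general-purpose resource management system newly introduced in Hadoop 2.0, will inevitably run into the concept of a Container. Because Chinese-language material is scarce, many people remain quite unclear about it. How does it relate to a Linux Container? Can it provide an isolated environment for tasks the way a Linux Container does? Does it represent computing resources, or is it merely a task-processing process? This article attempts to introduce the Container concept.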

2013-12-22 21:33:59 7531

Repost: The Log: What every software engineer should know about real-time data's unifying abstraction

I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition

2013-12-19 13:08:32 2375

Repost: Putting Spark to Use: Fast In-Memory Computing for Your Big Data Applications

Our thanks to Databricks, the company behind Apache Spark (incubating), for providing the guest post below. Cloudera and Databricks recently announced that Cloudera will distribute and support Spa

2013-12-15 17:43:58 1133

Mac Track