Mac Track


  • Blog (15)
  • Favorites
  • Following

Original: Storm Fault Tolerance

The following describes how Storm handles fault tolerance. Although these are theoretical statements, they can be verified during actual testing. 1) What happens when a worker dies? When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable

2012-02-29 22:51:28 1799

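Repost: Beyond MapReduce: The Data Stream Computing Systems That Took Off in 2011

The 2011 Hadoop China conference has just concluded, and one of its hot topics was data stream computing. Now that the MapReduce computing model has swept the world, stream processing will be the next research hotspot, in both industry and academia. This article analyzes in depth the architectures of several typical data stream computing systems and the design ideas behind them. Background and motivation. Background: as the volume of data in today's society keeps growing, computing clusters built from ordinary servers are used to process all kinds of data applications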

2012-02-28 17:34:50 1146

Original: MongoDB Covered Indexes

MongoDB 1.8+ can return data from the index only when the query only involves keys which are present in the index. Not inspecting the actual documents can speed up responses considerably since the i

2012-02-19 21:15:05 833
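
To illustrate the covered-index idea in the excerpt above, here is a minimal Python sketch. It is not MongoDB's actual planner logic: the helper `is_covered` and its arguments are hypothetical, and it only checks that every field the query filters on and every field it returns is a key of the index. It assumes the usual rule that `_id` is returned unless the projection excludes it, and it ignores any other conditions a real server applies.

```python
def is_covered(index_keys, filter_fields, projection):
    """Hypothetical check: could this query be answered from the index alone?

    index_keys:    field names in the index, e.g. ["user", "ts"]
    filter_fields: field names the query filters on
    projection:    inclusion-style dict, e.g. {"user": 1, "_id": 0}
    """
    keys = set(index_keys)
    # Fields explicitly requested with a truthy value.
    included = {field for field, flag in projection.items() if flag}
    if not included:
        # No inclusion projection means whole documents are returned.
        return False
    returned = set(included)
    # _id is returned by default unless the projection sets it to 0.
    if projection.get("_id", 1):
        returned.add("_id")
    return set(filter_fields) <= keys and returned <= keys


# Filter and returned fields are all index keys, and _id is excluded.
print(is_covered(["user", "ts"], ["user"], {"user": 1, "ts": 1, "_id": 0}))
# _id is returned by default and is not in the index.
print(is_covered(["user", "ts"], ["user"], {"user": 1}))
```

The second call shows the most common reason a query misses the index-only path in this simplified model: forgetting to exclude `_id` from the projection.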

Repost: MongoSV Live-Blog: Performance Tuning and Scalability

This talk goes over various performance tuning techniques used in real world examples from our various implementations of MongoDB at Shutterfly. We will cover various techniques including usage of t

2012-02-19 20:25:20 667

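Repost: Join Optimization in Apache Hive

This article describes how Facebook optimizes Hive joins. MapJoin is particularly useful when joining a large table with a small one; it improves performance considerably and is recommended. With more than 500 million users sharing a billion pieces of content daily, Facebook stores a vast amount of data, and needs a s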

2012-02-16 22:42:51 1171

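Repost: MapReduce: The Shuffle Process in Detail

The shuffle is the heart of MapReduce, also called the place where the magic happens. To understand MapReduce, you have to understand the shuffle. I have read a lot of related material, but each time I came away in a fog, unable to sort out the overall logic and more confused than before. Some time ago I was working on MapReduce job performance tuning and needed to dig into the code to study how MapReduce runs, and only then did I get to the bottom of the shuffle. Remembering how frustrating it was to read the related material and not understand it, here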

2012-02-16 22:34:05 1790 1

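Repost: Hive Data Loading Commands

Partitions must be declared when the table is defined.
a. Single-partition table: create table day_table (id int, content string) partitioned by (dt string); This is a single-partition table, partitioned by day. The table schema has three columns: id, content, and dt. Each dt value gets its own folder.
b. Double-partition table: create table day_hour_table (id int, c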

2012-02-14 21:28:31 4506
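
The point that each `dt` value gets its own folder can be sketched in Python. The function `partition_path` is hypothetical, and the warehouse root `/user/hive/warehouse` is an assumption (it is a common default but is configurable). The excerpt is cut off before it names the partition columns of `day_hour_table`, so the `dt` and `hour` columns in the second call are also assumptions for illustration only. The `column=value` directory naming is how Hive lays out partitions.

```python
import posixpath


def partition_path(warehouse_root, table, partition_spec):
    """Build the directory that holds one partition of a table.

    partition_spec is an ordered list of (column, value) pairs, in the
    same order as the PARTITIONED BY clause.
    """
    parts = [f"{column}={value}" for column, value in partition_spec]
    return posixpath.join(warehouse_root, table, *parts)


# Single partition column, as in day_table from the excerpt.
print(partition_path("/user/hive/warehouse", "day_table",
                     [("dt", "2012-02-14")]))

# Two partition columns; "dt" and "hour" are assumed names, since the
# excerpt is truncated before the table definition finishes.
print(partition_path("/user/hive/warehouse", "day_hour_table",
                     [("dt", "2012-02-14"), ("hour", "08")]))
```

Because the partition value lives in the path rather than in the data files, a query that filters on `dt` only needs to read the matching folders.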

Repost: Next Generation of Apache Hadoop MapReduce – The Scheduler

Introduction: The previous post in this series covered the next generation of Apache Hadoop MapReduce in a broad sense, particularly its motivation, high-level architecture, goals, requirements, a

2012-02-09 22:32:18 730

Repost: The Next Generation of Apache Hadoop MapReduce

Overview: In the Big Data business, running fewer larger clusters is cheaper than running more small clusters. Larger clusters also process larger data sets and support more jobs and users. The Ap

2012-02-09 22:29:34 1134

Repost: MongoDB Best Practices

Hello from the Engine Yard Data Team! We wanted to let you know what we’ve been up to since the last time we blogged. When the team was formed earlier in the year, our first job was to expand our s

2012-02-08 09:38:57 2646

Repost: Playing with huge information streams: Twitter Storm!

This past Christmas I found the perfect pet project for the season: Twitter Storm. Basically, it is an excellent piece of software that will allow you to process real time information in a ‘kind’ of map r

2012-02-06 21:24:13 1173

Repost: Tenzing: A SQL Implementation on the MapReduce Framework (translation)

Authors: Biswapesh Chattopadhyay, Weiran Liu, et al., Google Inc., 2011-8
Original: http://www.vldb.org/pvldb/vol4/p1318-chattopadhyay.pdf
Translator: phylips@bmy, 2011-10-6
Translation: http://duanple.blog.163.com/blog/static/709717672

2012-02-05 18:42:36 2668 1

Repost: Real Time Analytics for Big Data: An Alternative Approach

Lately, we've been talking to various clients about realtime analytics, and with convenient timing Todd Hoff wrote up how Facebook's realtime analytics system was designed and implemented (See previou

2012-02-04 18:15:35 1261

Repost: The Comments Conundrum

Its power is on full display! Learn the MongoDB aggregation framework through examples. One of the most common questions we get is: I have a collection of blog posts and each post has an array of comments. How do I get… …all comments by a given author …t

2012-02-03 15:29:41 777

Repost: Lower Lock % and Number of Slow Queries

Gauges tracks several websites. Some get a lot of traffic and others don’t. The sites that get a lot of traffic tend to stay hot and sit in RAM. The sites that get little traffic, eventually get pus

2012-02-01 23:08:19 667
