- Blog (16)
Repost: The Secret To 10 Million Concurrent Connections - The Kernel Is The Problem, Not The Solution
Now that we have the C10K concurrent connection problem licked, how do we level up and support 10 million concurrent connections? Impossible you say. Nope, systems right now are delivering 10 millio
2013-06-29 20:53:18 1199
Repost: Bitmap Index vs. B-tree Index: Which and When?
Understanding the proper application of each index can have a big impact on performance. Published 2005. Conventional wisdom holds that bitmap indexes are most appropriate for columns having low
2013-06-29 10:14:15 1130
Repost: Indexing Strategies for Optimizing Queries on MySQL
MySQL’s index begins by reviewing how indexes work, as well as their structure. Next, it reviews indexing features specific to each of the major MySQL data storage engines. This article then examines
2013-06-29 09:49:30 842
Repost: TAO: The power of the graph
Facebook puts an extremely demanding workload on its data backend. Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented
2013-06-27 22:04:11 1056
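Original: lost task tracker issue in CDH 4.1.2
Today I helped a colleague troubleshoot a job that was running too long. After the task was killed, the error message was: "Lost task tracker: tracker_xxxxxx". The job history showed "Stage-2 map = 100%, reduce = 100%" being printed for a long time, so I suspected that dumping the file was taking too long. Looking at the code, I then found that her SQL contained a join between two large tables (900 million+ * 97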
2013-06-27 13:54:57 1790
Repost: Improvements in the Hadoop YARN Fair Scheduler
Starting in CDH 4.2, YARN/MapReduce 2 (MR2) includes an even more powerful Fair Scheduler. In addition to doing nearly all that it could do in MapReduce 1 (MR1), the YARN Fair Scheduler can schedule n
2013-06-24 22:26:34 1088
Repost: Wormhole pub/sub system: Moving data through space and time
Over the last couple of years, we have built and deployed a reliable publish-subscribe system called Wormhole. Wormhole has become a critical part of Facebook's software infrastructure. At a high leve
2013-06-17 19:23:45 1457
Repost: Moving Hadoop Beyond Batch with Apache YARN
Apache Hadoop 2.0 continues to make its way through the open source community process at the Apache Software Foundation and is getting closer to being declared “ready” from a community development per
2013-06-16 21:07:29 1041
Repost: Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
At Yahoo!, Hadoop plays a central role in providing personalized experiences for our users and creating value for our advertisers. To serve Yahoo!’s emerging business needs, the Cloud Engineering Grou
2013-06-16 20:56:11 1319
Repost: 28msec - query data from any source in real time
Derrick Harris writing about 28msec, still-in-stealth-mode, generic query language: Their solution was to create a platform able to extract data from any of these sources, transform it into a sta
2013-06-14 17:33:18 837
Repost: Hadoop and the EDW
Rob Klopp summarizes a whitepaper published by Cloudera and Teradata: Simply put, Hadoop becomes the staging area for “raw data streams” while the EDW stores data from “operational systems”. Ha
2013-06-14 17:14:10 904
Repost: Optimizing Joins running on HDInsight Hive on Azure at GFS
Introduction: To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure.
2013-06-14 17:04:24 1076
Repost: Migration to the New Metrics Hotness – Metrics2
Introduction: HBase is a distributed big data store modeled after Google’s Bigtable paper. As with all distributed systems, knowing what’s happening at a given time can help spot problems before
2013-06-14 16:55:12 802
Repost: Storing Big Data With Hive: RCFile
Are sequence files or RCFile (Record Columnar File) the best way to store big data in Hive? There are reasons to use text on the periphery of an ETL process, as the previous post discussed. (See: S
2013-06-14 16:50:50 1357
Repost: Introduction to HBase Mean Time to Recover (MTTR)
HBase is an always-available service and remains available in the face of machine failures and rack failures. Machines in the cluster run RegionServer daemons. When a RegionServer crashes or the mach
2013-06-14 16:41:12 1053
Repost: HBase - Who needs a Master?
At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the cas
2013-06-14 16:10:55 831