March 2014_macyang

Repost: Is Apache Spark the Next Big Thing in Big Data?

In any article or blog post, any mention of Big Data usually includes something about Hadoop. When it comes to Big Data, Apache Hadoop has been the big elephant in the room, and the release of Hadoo

2014-03-29 16:05:19 1192

Repost: Apache Hadoop 2.3.0 Released!

hadoop-2.3.0 is the first release of 2014, and brings a number of enhancements to the core platform, in particular to HDFS. With this release, there are two significant enhancements to HD

2014-03-29 14:51:14 783

Repost: Spark and Tez Highlight MapReduce Problems

On February 3rd, Cloudera announced support for Apache Spark as part of Cloudera Enterprise. I’ve blogged about Spark before so I won’t go into substantial detail here, but the short version is Sp

2014-03-27 14:15:15 854

Repost: Apache Tajo Enters the SQL-on-Hadoop Space

The number of SQL options for Hadoop expanded substantially over the last 18 months. Most get a large amount of attention when announced, but a few slip under the radar. One of these low-flying option

2014-03-27 13:52:36 851

Repost: hash join VS merge join

A "sort merge" join is performed by sorting the two data sets to be joined according to the join keys and then merging them together. The merge is very cheap, but the sort can be prohibitively expensi

2014-03-26 13:19:32 782
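The two strategies compared in that post can be sketched in a few lines of Python. This is a minimal in-memory illustration, assuming rows are tuples and join keys are pulled out by caller-supplied functions (`left_key`, `right_key` are hypothetical names); real database engines add spilling to disk, memory budgets, and a cost-based choice between the two. The sort-merge version pays for two sorts and then merges cheaply; the hash version skips sorting by building a hash table on one input and probing it with the other.

```python
from collections import defaultdict


def sort_merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1  # advance whichever side has the smaller key
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that whole run.
            while i < len(left) and left_key(left[i]) == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input, then probe it with each left row."""
    table = defaultdict(list)
    for r in right:
        table[right_key(r)].append(r)
    out = []
    for row in left:
        out.extend((row, r) for r in table.get(left_key(row), []))
    return out


# Hypothetical sample data: (id, name, dept_id) and (dept_id, dept_name).
employees = [(1, "ann", 10), (2, "bob", 20), (3, "cy", 10), (4, "dee", 30)]
departments = [(10, "eng"), (20, "ops"), (40, "hr")]

smj = sort_merge_join(employees, departments, lambda e: e[2], lambda d: d[0])
hj = hash_join(employees, departments, lambda e: e[2], lambda d: d[0])
print(sorted(smj) == sorted(hj))  # both produce the same pairs, in different orders
```

The sort-merge output comes back ordered by join key as a side effect of the sorts, which is why it can be attractive when the inputs are already sorted or the result must be ordered anyway.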

Repost: Apache Tajo™ - An open source big data warehouse system in Hadoop

The main goal of the Apache Tajo™ project is to build an advanced open source data warehouse system in Hadoop for processing web-scale data sets. Features: Interactive and Batch Queries; Fully dis

2014-03-26 11:58:02 1022

Repost: Hadoop MapReduce: to Sort or Not to Sort

Tuesday Jan 22nd was a critical milestone for us at Syncsort as our main contribution to the Apache Hadoop project was committed. This contribution, patch MAPREDUCE-2454, introduced a new feature to

2014-03-20 11:39:27 935

Original: Python study notes

What is the difference between __init__ and __call__ in Python? The first is used to initialise a newly created object, and receives the arguments used to do that: class foo: def __init__(self, a,

2014-03-18 21:54:47 1285
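A small sketch of the distinction in that note: `__init__` runs once when the object is constructed, while `__call__` runs every time the already-built instance is invoked like a function. The class name `Scaler` and its `factor` argument are hypothetical, chosen only for illustration; they are not the `foo` class from the original note.

```python
class Scaler:
    """Hypothetical example class; the name and the 'factor' argument are illustrative."""

    def __init__(self, factor):
        # Runs once, when the instance is created: Scaler(3)
        self.factor = factor

    def __call__(self, x):
        # Runs each time the instance itself is called like a function: triple(5)
        return x * self.factor


triple = Scaler(3)  # __init__ runs here
print(triple(5))    # __call__ runs here; prints 15
```

Defining `__call__` is what makes the instance callable; without it, `triple(5)` would raise `TypeError`.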

Repost: Why do we need another Big Data processing engine, like SPARK?

The current ubiquitous standard for storing and processing very large data is Hadoop. It’s an open source Apache project with storage provided by HDFS (Hadoop Distributed File System) and processing by Ma

2014-03-07 14:17:08 959

Repost: SAVING 9 GB OF RAM WITH PYTHON’S SLOTS

We’ve mentioned before how Oyster.com’s Python-based web servers cache huge amounts of static content in huge Python dicts (hash tables). Well, we recently saved over 2 GB in each of four 6 GB serve

2014-03-06 11:03:00 803
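The saving described in that post comes from `__slots__`: declaring the attribute names up front lets CPython store them in fixed slots instead of a per-instance `__dict__`. Below is a minimal sketch, assuming CPython; the `PointDict`/`PointSlots` classes and the `traced_bytes` helper are hypothetical, and the byte counts it prints vary by Python version and platform, so they should not be read as reproducing the post's 9 GB figure.

```python
import tracemalloc


class PointDict:
    """Regular class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Same fields declared in __slots__: fixed attribute storage, no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def traced_bytes(cls, n=100_000):
    """Rough bytes still allocated after creating n instances (includes the list and int objects)."""
    tracemalloc.start()
    objs = [cls(i, i) for i in range(n)]  # keep the objects alive while measuring
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


if __name__ == "__main__":
    print("dict-based: ", traced_bytes(PointDict))
    print("slots-based:", traced_bytes(PointSlots))
```

The trade-off is flexibility: a slotted instance rejects attributes that were not declared, so `__slots__` suits many small, fixed-shape objects such as cached records.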

Mac Track