- Blog (10)
Repost: Is Apache Spark the Next Big Thing in Big Data?
In any article or blog post, any mention of Big Data usually includes something about Hadoop. When it comes to Big Data, Apache Hadoop has been the big elephant in the room, and the release of Hadoo…
2014-03-29 16:05:19 1192
Repost: Apache Hadoop 2.3.0 Released!
hadoop-2.3.0 is the first release for the year 2014, and brings a number of enhancements to the core platform, in particular to HDFS. With this release, there are two significant enhancements to HD…
2014-03-29 14:51:14 783
Repost: Spark and Tez Highlight MapReduce Problems
On February 3rd, Cloudera announced support for Apache Spark as part of Cloudera Enterprise. I’ve blogged about Spark before so I won’t go into substantial detail here, but the short version is Sp…
2014-03-27 14:15:15 854
Repost: Apache Tajo Enters the SQL-on-Hadoop Space
The number of SQL options for Hadoop has expanded substantially over the last 18 months. Most get a large amount of attention when announced, but a few slip under the radar. One of these low-flying option…
2014-03-27 13:52:36 851
Repost: Hash join vs. merge join
A "sort merge" join is performed by sorting the two data sets to be joined according to the join keys and then merging them together. The merge is very cheap, but the sort can be prohibitively expensi…
2014-03-26 13:19:32 782
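The excerpt describes a sort-merge join as "sort both inputs on the join key, then merge". A minimal in-memory Python sketch of that strategy next to a hash join (build a hash table on one input, probe it with the other, no sort) may make the cost difference concrete. It assumes each input is a list of hypothetical (key, payload) tuples; the function names are illustrative and this is not how any particular database engine implements either join.

```python
from collections import defaultdict
from typing import Any, List, Tuple

# Hypothetical row shape for this sketch: (join_key, payload).
Row = Tuple[Any, Any]


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Sort both inputs on the join key, then merge them in a single pass."""
    # The sort is the potentially expensive step: O(n log n + m log m).
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out = []
    i = j = 0
    # The merge itself is cheap: each pointer only moves forward.
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def hash_join(build: List[Row], probe: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Build a hash table on one input, then probe it with the other (no sorting)."""
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    out = []
    for key, pvalue in probe:
        # .get() avoids inserting empty lists for probe keys with no match.
        for bvalue in table.get(key, ()):
            out.append((key, bvalue, pvalue))
    return out


if __name__ == "__main__":
    emp = [(10, "alice"), (20, "bob"), (10, "carol")]
    dept = [(10, "sales"), (30, "ops")]
    # Both print [(10, 'alice', 'sales'), (10, 'carol', 'sales')]
    print(sort_merge_join(emp, dept))
    print(sorted(hash_join(emp, dept)))
```

The hash join trades the sort for memory: the whole build side must fit in the hash table, which is why it is usually built on the smaller input. The sort-merge join pays for the sort but then needs only two forward-moving cursors, and it can skip the sort entirely if both inputs already arrive ordered on the join key.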
Repost: Apache Tajo™ - An open source big data warehouse system in Hadoop
The main goal of the Apache Tajo™ project is to build an advanced open source data warehouse system in Hadoop for processing web-scale data sets. Features: Interactive and Batch Queries, Fully dis…
2014-03-26 11:58:02 1022
Repost: Hadoop MapReduce: to Sort or Not to Sort
Tuesday, Jan 22nd was a critical milestone for us at Syncsort, as our main contribution to the Apache Hadoop project was committed. This contribution, patch MAPREDUCE-2454, introduced a new feature to…
2014-03-20 11:39:27 936
Original: Python study notes
What is the difference between __init__ and __call__ in Python? The first is used to initialise a newly created object, and receives the arguments used to do that: class foo: def __init__(self, a, …
2014-03-18 21:54:47 1286
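Since the excerpt's class foo is cut off, here is a small self-contained sketch of the same distinction. The Adder class is hypothetical: __init__ runs once, when the instance is created, while __call__ runs every time the instance itself is called like a function.

```python
class Adder:
    """Hypothetical example: configured once, then called like a function."""

    def __init__(self, step: int) -> None:
        # Runs once, when Adder(step) creates the instance.
        self.step = step
        self.calls = 0

    def __call__(self, x: int) -> int:
        # Runs every time the *instance* is called, e.g. add_five(10).
        self.calls += 1
        return x + self.step


if __name__ == "__main__":
    add_five = Adder(5)        # triggers __init__
    print(add_five(10))        # triggers __call__, prints 15
    print(add_five(1))         # prints 6
    print(add_five.calls)      # prints 2
    print(callable(add_five))  # prints True
```

A common reason to define __call__ is to get a function-like object that carries configuration or state between calls (here, step and the call counter) without resorting to globals or closures.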
Repost: Why do we need another big data processing engine, like Spark?
The current ubiquitous standard for storing and processing very large data is Hadoop. It’s an open source Apache project with storage provided by HDFS (Hadoop Distributed File System) and processing by Ma…
2014-03-07 14:17:08 960
Repost: Saving 9 GB of RAM with Python’s __slots__
We’ve mentioned before how Oyster.com’s Python-based web servers cache huge amounts of static content in huge Python dicts (hash tables). Well, we recently saved over 2 GB in each of four 6 GB serve…
2014-03-06 11:03:00 805
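The title credits __slots__ for the saving. A minimal sketch of the mechanism, assuming a simple two-field class: an ordinary instance stores its attributes in a per-instance __dict__, while declaring __slots__ gives the instance fixed attribute slots and no __dict__. The class and helper names are hypothetical, and sys.getsizeof only gives a shallow, version-dependent estimate, so no byte counts are claimed here.

```python
import sys


class PointDict:
    """Ordinary class: each instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Same fields, but __slots__ replaces the per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def approx_instance_size(obj) -> int:
    """Shallow size of obj plus its __dict__, if it has one (rough estimate only)."""
    size = sys.getsizeof(obj)
    attrs = getattr(obj, "__dict__", None)
    if attrs is not None:
        size += sys.getsizeof(attrs)
    return size


if __name__ == "__main__":
    print("dict-based :", approx_instance_size(PointDict(1, 2)), "bytes")
    print("slots-based:", approx_instance_size(PointSlots(1, 2)), "bytes")
    p = PointSlots(1, 2)
    try:
        p.z = 3  # "z" is not declared in __slots__
    except AttributeError as exc:
        print("AttributeError:", exc)
```

The per-object difference is small, but it is multiplied by every instance held in memory, which is how a cache of many small objects can add up to gigabytes. The trade-off is that slotted instances cannot gain new attributes at runtime.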