The MapReduce programming model has simplified the implementation of many data-parallel applications. The simplicity of the programming model and the quality of service provided by many MapReduce implementations have attracted considerable enthusiasm in the parallel computing community. From our years of experience applying the MapReduce programming model to various scientific applications, we identified a set of extensions to the programming model and improvements to its architecture that expand the applicability of MapReduce to more classes of applications. Twister is a lightweight MapReduce runtime that we have developed by incorporating these enhancements.
Twister provides the following features to support MapReduce computations. (Twister is developed as part of Jaliya Ekanayake's Ph.D. research and is supported by the SALSA Team @ IU.)
Distinction between static and variable data
Configurable long-running (cacheable) map/reduce tasks
Pub/sub messaging-based communication and data transfers
Efficient support for iterative MapReduce computations (significantly faster than Hadoop or Dryad/DryadLINQ)
Combine phase to collect all reduce outputs
Data access via local disks
Lightweight (~5600 lines of Java code)
Support for typical MapReduce computations
Tools to manage data
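The way several of these features fit together (static versus variable data, cacheable map tasks, and the combine phase) can be sketched with a small in-memory example. This is a minimal illustration only and does not use Twister's API: the class `IterativeMapReduceSketch`, the nested `CachedMapTask`, and the methods `map`, `reduce`, `combine`, and `run` are hypothetical names chosen for this sketch, and one-dimensional k-means stands in for an iterative computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: every name here is hypothetical, not Twister's API.
class IterativeMapReduceSketch {

    // A long-running map task: its static partition is loaded once and reused.
    static final class CachedMapTask {
        private final double[] partition; // static data, cached across iterations

        CachedMapTask(double[] partition) {
            this.partition = partition;
        }

        // The variable data (current centroids) arrives fresh on every iteration.
        // Returns one {sum, count} row per centroid for this partition.
        double[][] map(double[] centroids) {
            double[][] partial = new double[centroids.length][2];
            for (double x : partition) {
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(x - centroids[c]) < Math.abs(x - centroids[nearest])) {
                        nearest = c;
                    }
                }
                partial[nearest][0] += x;
                partial[nearest][1] += 1;
            }
            return partial;
        }
    }

    // Reduce: merge the partial {sum, count} of one centroid across all map outputs.
    static double[] reduce(int centroid, List<double[][]> mapOutputs) {
        double[] total = new double[2];
        for (double[][] out : mapOutputs) {
            total[0] += out[centroid][0];
            total[1] += out[centroid][1];
        }
        return total;
    }

    // Combine: collect all reduce outputs into a single result for the driver.
    static double[] combine(List<double[]> reduceOutputs, double[] previous) {
        double[] next = new double[reduceOutputs.size()];
        for (int c = 0; c < next.length; c++) {
            double[] t = reduceOutputs.get(c);
            // An empty cluster keeps its previous centroid.
            next[c] = t[1] == 0 ? previous[c] : t[0] / t[1];
        }
        return next;
    }

    // Driver: configure the map tasks once, then iterate until the centroids settle.
    static double[] run(double[][] partitions, double[] initial, int maxIterations, double tol) {
        List<CachedMapTask> tasks = new ArrayList<>();
        for (double[] p : partitions) {
            tasks.add(new CachedMapTask(p));
        }
        double[] centroids = initial.clone();
        for (int i = 0; i < maxIterations; i++) {
            List<double[][]> mapOutputs = new ArrayList<>();
            for (CachedMapTask t : tasks) {
                mapOutputs.add(t.map(centroids)); // only variable data is passed in
            }
            List<double[]> reduceOutputs = new ArrayList<>();
            for (int c = 0; c < centroids.length; c++) {
                reduceOutputs.add(reduce(c, mapOutputs));
            }
            double[] next = combine(reduceOutputs, centroids);
            double shift = 0;
            for (int c = 0; c < next.length; c++) {
                shift = Math.max(shift, Math.abs(next[c] - centroids[c]));
            }
            centroids = next;
            if (shift < tol) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] partitions = { {1.0, 2.0, 9.0}, {3.0, 10.0, 11.0} };
        double[] result = run(partitions, new double[] {0.0, 5.0}, 20, 1e-9);
        System.out.println(Arrays.toString(result));
    }
}
```

With the two partitions in `main`, the centroids settle at 2.0 and 10.0 after three iterations. The point of the sketch is that the partitions are handed to the map tasks once, while only the small centroid array moves between iterations; in a distributed runtime this is what avoids reloading static data on every pass.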
New features in Twister v0.9:
Support for the ActiveMQ message broker (see user guide)
Express Twister environment configuration (see user guide)
Automatic recovery from faults when FaultTolerance is enabled (see user guide)
Partition files can be created from within the client code (see user guide)