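I've recently wanted to study stream-processing models, so I took a look at Apache S4 along the way.
Apache S4 fills the gap between complex proprietary systems and batch-oriented open-source computing platforms. The project's goal is to develop a high-performance computing platform that hides the complexity inherent in parallel processing systems from the application programmer.
Apache S4 is already used in Yahoo's systems, where it processes thousands of search queries per second.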
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.
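To make the routing idea concrete, here is a minimal single-process sketch in Python. It is not the S4 API: `Event`, `WordCountPE`, `SummaryPE`, `Dispatcher` and `count_words` are all hypothetical names made up for illustration, and there is no distribution or fault tolerance here. It only shows keyed events being routed with affinity to one PE instance per key, with one PE emitting events and another publishing results.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    stream: str  # name of the stream the event travels on
    key: str     # routing key: events with the same key reach the same PE instance
    value: Any


class WordCountPE:
    """Hypothetical PE: one instance per distinct word, counts its occurrences."""

    def __init__(self, key, publish):
        self.key = key
        self.count = 0

    def process(self, event):
        self.count += 1
        # (1) emit an event that a downstream PE may consume
        return [Event("counts", "summary", (self.key, self.count))]


class SummaryPE:
    """Hypothetical PE: collects the latest count per word and publishes it."""

    def __init__(self, key, publish):
        self.publish = publish
        self.totals = {}

    def process(self, event):
        word, count = event.value
        self.totals[word] = count
        # (2) publish a result instead of emitting further events
        self.publish(dict(self.totals))
        return []


class Dispatcher:
    """Toy stand-in for the platform: routes each event to the PE for its key."""

    def __init__(self, prototypes):
        self.prototypes = prototypes  # stream name -> PE class
        self.instances = {}           # (stream, key) -> PE instance
        self.published = []

    def run(self, events):
        queue = deque(events)
        while queue:
            event = queue.popleft()
            slot = (event.stream, event.key)
            if slot not in self.instances:
                # The first event for a key creates the PE instance for that key.
                pe_class = self.prototypes[event.stream]
                self.instances[slot] = pe_class(event.key, self.published.append)
            queue.extend(self.instances[slot].process(event))


def count_words(words):
    dispatcher = Dispatcher({"words": WordCountPE, "counts": SummaryPE})
    dispatcher.run(Event("words", word, None) for word in words)
    return dispatcher
```

Calling `count_words(["a", "b", "a"])` creates one `WordCountPE` for "a", one for "b", and a single `SummaryPE`. Each PE keeps its own state and only communicates through events, which is the part that resembles the actor style described above.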
To be continued...