MapReduce On Stream Processing



 


Tao Chen, Dong Dai, Yuchao Huang, Xuehai Zhou

University of Science and Technology of China


ctaotao@mail.ustc.edu.cn, daidong@mail.ustc.edu.cn, hyc@mail.ustc.edu.cn, xhzhou@ustc.edu.cn


 

 


Abstract—MapReduce proposes an architecture and a programming model for data-intensive distributed computing of batch jobs. MapReduce makes it possible to run a batch job without worrying about distributed system issues. However, jobs in MapReduce are batch-oriented, and it does not support stream processing. In this paper, we present Mos (MapReduce On Stream Processing), an architecture for processing stream data on clusters of commodity computers using the MapReduce programming model. Compared with MapReduce, Mos targets stream-oriented data, and the two architectures differ in many ways. Mos has its own mechanisms, such as resource monitoring and standby tasks, for ensuring the correctness of stream processing. Designing Mos raises several important issues. This paper describes the implementation of Mos, its critical issues, and their solutions.

Keywords-cloud computing; MapReduce; stream processing

I. Introduction

Recently, a large class of applications has emerged in which data, generated in some external environment, is pushed asynchronously to servers that process it. These applications (such as sensor networks, real-time traffic information services, and advertisement recommendation systems) are characterized by the need to process stream data in a timely fashion. This brings a new challenge of data processing.

It is worth mentioning that some researchers adopt a tradeoff approach to the challenge above. They cache the stream data on disk and start a background MapReduce task to process the cached data when its size reaches a certain amount, then focus on improving the processing speed of the MapReduce job. But in many stream-based scenarios, this approach inevitably raises timeliness issues. There is thus a clear need for stream processing solutions that fit this situation, so we designed Mos, an architecture that efficiently supports stream-data processing. Stream processing is not a new problem; it has been researched for years, and we build on many achievements of previous research [1, 4, 5, 6, 7].

This paper outlines the implementation of Mos in the face of the design issues of stream processing systems. In Section 2 we begin with a brief review of MapReduce, then describe the implementation of Mos in detail and the differences between MapReduce and Mos. In Section 3 we describe critical questions and give the corresponding solutions. In Section 4 we give conclusions and future work.

II. Mos System Overview

A. A Brief Overview of MapReduce

MapReduce [2] makes it possible to easily run a batch job without worrying about distributed system issues. Its architecture consists of a single master and many worker nodes that belong to the same administrative domain and cooperate to run jobs. Worker nodes use a heartbeat protocol to communicate with the master node. The master node gathers information from the worker nodes and manages the cluster. In this section, we review the features of MapReduce.

1) Programming Model: The input of MapReduce is a collection of key/value pairs. To use MapReduce, developers need to implement two interfaces: Map and Reduce. Each job has two main phases: first, MapReduce uses a user-defined map function to process the raw input and produce a list of intermediate key/value pairs; next, the system uses a user-defined reduce function to process the intermediate result and complete the job. The architecture automatically parallelizes the job and ensures fault tolerance. In the first phase, developers may also define a combiner function to reduce the amount of data transmitted between nodes.
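The two phases above can be sketched in plain Python. This is a minimal word-count sketch with assumed names (map_fn, reduce_fn, run_job) and an in-process shuffle, not the real distributed MapReduce API:

```python
from collections import defaultdict

# Hypothetical user-defined functions for a word-count job; the names
# map_fn/reduce_fn and the in-process shuffle are illustrative only.
def map_fn(key, value):
    # key: document name, value: document text
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    # key: a word, values: every count emitted for that word
    yield key, sum(values)

def run_job(inputs):
    # Shuffle: group intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for k, v in inputs:
        for ik, iv in map_fn(k, v):
            groups[ik].append(iv)
    return {k: next(reduce_fn(k, vs))[1] for k, vs in groups.items()}

print(run_job([("d1", "a b a"), ("d2", "b c")]))
# {'a': 2, 'b': 2, 'c': 1}
```

A combiner would simply apply reduce_fn to each mapper's local pairs before the shuffle, shrinking the data sent across the network.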

2) Storage System: MapReduce jobs work with a distributed file system (GFS [3]). GFS is optimized for batch-oriented jobs. In most jobs, GFS is used not only to store the input data for the map phase, but also to store the output results of the reduce phase. GFS provides high throughput for MapReduce jobs.

B.   Implementation of Mos

Mos is similar to MapReduce. The differences between Mos and MapReduce are determined by the characteristics of stream data. For example, Mos does not support sorting data before the reduce phase: the data comes into the system randomly, and it is difficult for the system to choose an occasion to sort the input data. For this reason, this feature is not supported by Mos. In this section, we first introduce the Mos architecture, then discuss the differences between Mos and MapReduce.

1) Roles in Mos: We demonstrate the Mos architecture in Figure 1. Participants are divided into three kinds of roles in Mos: master node, task node and input node. Each node represents a physical computer.



 

Fig. 1. Mos architecture: master node (left) and task node (right)


2) Master node: There is only one master in a Mos cluster, so all other nodes communicate with the master node periodically. The responsibilities of the master node are scheduling tasks, monitoring and recovering from faults, balancing workload, etc. It consists of three functional components: Global Resource Manager, Global Monitor and Scheduler. The Global Resource Manager records the used and unused resources in the cluster. The Global Monitor collects information from other nodes, including errors, bottlenecks, etc., and tells the Scheduler how and when to schedule tasks. The Scheduler is the heart of the system, and it determines the efficiency of jobs in the cluster.

3) Task node: Task nodes run tasks in Mos; they ask the master node for tasks via RPC. A task node is composed of three components: Resource Manager, Bottleneck Monitor and Task Runners. The Resource Manager uploads the utilization of local resources to the master node, allocates new tasks and cancels tasks that are no longer used. The Bottleneck Monitor watches all possible resource bottlenecks; when a kind of resource becomes a bottleneck, it chooses a task to transfer to another node. Task Runners are the actual places where the user-defined functions run; they are controlled and managed by the Resource Manager.

4) Input node: Strictly speaking, the input node is not a member of Mos, because it is outside the Mos cluster. The input node is the source of data: it produces data and sends it to the cluster. An input node consists of a client program (not a map or reduce function; it is only used to produce data) and the Mos API. Client programs produce data themselves or obtain it elsewhere, then use the Mos API to send the data stream to task nodes.

C.   Execution Procedure

There are a lot of differences between Mos and MapReduce. In this section, we discuss the execution procedure of a job in Mos:

·        A client submits a job with its files to the master; the master node deploys the job in the cluster and returns the job's ID.

·        The client tells every input node the job's ID; input nodes use the ID to request output addresses.

·        Input nodes call the Mos API to transfer data to map task nodes.

·        Map task nodes process the data, then push the intermediate results to reduce task nodes.

·        Reduce task nodes process the intermediate data from the map task nodes, then output the results.

·        Return to step 3 until the job finishes.

D. Differences between Mos and MapReduce

Mos is a stream processing engine, and it is characterized by low latency. It therefore has its own features, which differ from MapReduce. Figure 2 describes the differences.

1) Data Transfer Mode: MapReduce uses a pull-based mode to transfer data, which is optimal for batch processing. This mode needs a storage operation and a read operation. To achieve low latency, Mos changes the mode: it uses a push-based mode without a costly storage operation on the critical processing path. When a map task node finishes processing, it directly pushes the data to the reduce task node instead of waiting for it to be fetched. This mode keeps data moving and avoids wasting time, so Mos can minimize processing latency.
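The push-based mode can be sketched with an in-memory queue standing in for the network channel. This is a minimal sketch with assumed names, not the Mos implementation:

```python
import queue
import threading

# Sketch of the push-based transfer mode (illustrative, not the Mos API):
# the map task pushes each intermediate pair straight into the reduce
# task's in-memory queue, with no storage step on the critical path.
reduce_inbox = queue.Queue()

def map_task(stream):
    for record in stream:
        reduce_inbox.put((record, 1))  # push immediately, keep data moving
    reduce_inbox.put(None)             # end-of-stream marker

def reduce_task(results):
    while True:
        pair = reduce_inbox.get()
        if pair is None:
            break
        key, value = pair
        results[key] = results.get(key, 0) + value

results = {}
t = threading.Thread(target=reduce_task, args=(results,))
t.start()
map_task(["a", "b", "a"])
t.join()
print(results)  # {'a': 2, 'b': 1}
```

In the pull-based MapReduce path, the map output would first be written to disk and later fetched by the reducer; here the reducer starts consuming as soon as the first pair is pushed.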

2) Task Running Mode: There is a synchronization barrier between the map phase and the reduce phase in MapReduce, which is not suitable for stream data processing. Input data in Mos is stream-based while in MapReduce it is batch-based, so Mos cannot process data in MapReduce's processing mode. When data arrives, Mos processes it immediately without caching it. In a word, the Mos architecture removes the synchronization barrier and keeps data moving; for this reason, latency is minimal.

3) Sorting Phase: There is a lot of wasted time between the map phase and the reduce phase in MapReduce. To use this time, when data arrives at reduce task nodes, the nodes sort the data before processing it. The sorting stage is useful for many actual jobs, but not for stream-based data processing jobs, because stream data is unpredictable and the system cannot know when to run this operation.

 

 

Fig. 2. Mos dataflow for stream-data processing (left) and batch processing of MapReduce (right).

 

MapReduce stores data during processing; Mos breaks with this principle and keeps data moving. For the reasons above, Mos abandons the sorting phase.

4) Lifetime of a Task: The generation and disappearance of MapReduce tasks are determined by the stage of a job, and each task has fixed input data. In Mos, a task cannot disappear because stream-based data may be infinite, so the task must stay in memory and wait for incoming data. The generation and disappearance of a task are scheduled by the master node, which needs to adjust the execution of tasks to ensure load balance and availability in the Mos cluster.

III. Problems and Solutions

In this section, we discuss the problems in designing Mos. For simplicity, we only describe the two most important issues: dynamic load management and fault tolerance.

A. Dynamic Load Management

Load balancing has a great influence on processing latency. Under unpredictable input rates, the allocation of processing among nodes must be adjusted dynamically, and this is the most important issue in Mos. Our goal is that no node becomes a bottleneck in the cluster, so that the average processing delay is minimal. We assume nodes in the cluster are homogeneous.

1) Resource Definition: We abstract the Mos system resources into a resource vector R. Each critical resource or workload in the cluster, such as CPU utilization, memory utilization, IO, etc., is a component of the vector R; thus R = (CPU, memory, ..., IO). Each task node has a vector B, defined like R, that indicates its use of resources. The master node gathers this information from the other nodes and feeds it back to the global task scheduler.

2) Optimization Goal: The ultimate goal of load management is to minimize the average data-processing latency. The maximum processing delay happens at the task node. We define:

processing latency = data transmission time + waiting time + processing time.

In a push-based stream processing system, data transmission time is minimized, so we can ignore that factor. We then focus on the other two factors: waiting time and processing time. These two factors are entirely determined by the machines' capacity. In other words, minimizing average latency can be achieved by minimizing load variance.

3) Mechanism: Partitioning a single task. When a job is deployed, there is only one map task and one reduce task per job in the cluster. Mos must constantly monitor the resources in the cluster. When the input rate increases, a large amount of data accumulates in memory without being processed in time. This increases waiting time, and thus processing delay grows. To optimize processing latency, we split a task into two new tasks. The critical problem of this mechanism is when and how to split a task.

The splitting occasion in this strategy is very important. For each task, we set a buffer size; when buffer usage reaches a threshold, we split the task (Figure 3). This is done by the local monitor.
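The split trigger can be sketched as follows. This is a minimal sketch: the buffer size, threshold value and split policy are assumptions, since the paper leaves the concrete values to the implementation:

```python
# Sketch of the split trigger run by the local monitor; BUFFER_SIZE,
# SPLIT_THRESHOLD and the Task/split names are illustrative assumptions.
BUFFER_SIZE = 1024          # bytes of buffer per task
SPLIT_THRESHOLD = 0.8       # split once usage crosses 80% of the buffer

class Task:
    def __init__(self, name):
        self.name = name
        self.buffered = 0   # bytes currently waiting in the buffer

    def enqueue(self, nbytes):
        self.buffered += nbytes

def check_and_split(task):
    # Return two new tasks sharing the input if usage crossed the
    # threshold; otherwise keep the original task.
    if task.buffered / BUFFER_SIZE >= SPLIT_THRESHOLD:
        return [Task(task.name + "-a"), Task(task.name + "-b")]
    return [task]

t = Task("map-0")
t.enqueue(900)              # 900 / 1024 ≈ 0.88, above the threshold
print([x.name for x in check_and_split(t)])  # ['map-0-a', 'map-0-b']
```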

To split a task efficiently, we use a heuristic approach. Assume the resource vector of node i is Xi = (Xi1, Xi2, ..., Xim), and its resource threshold is Ri = (Ri1, Ri2, ..., Rim). If one of the resources in node i satisfies Xij - Rij ≥ 0, the node has become a bottleneck, and the scheduler needs to transfer one task from this node to another node.
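A minimal sketch of this heuristic in Python, treating the vectors as tuples; the function names are assumptions, and the test marks a node as a bottleneck once usage reaches its threshold, matching the schedulability condition Rij - Xij - Tkj > 0 defined below:

```python
# Sketch of the heuristic (illustrative names): usage at or above the
# threshold marks a bottleneck, and the task occupying the most of the
# critical resource j is the one chosen for migration.
def is_bottleneck(usage, threshold):
    # usage, threshold: per-resource vectors (CPU, memory, ..., IO)
    return any(x >= r for x, r in zip(usage, threshold))

def most_demanding_task(tasks, j):
    # tasks: resource vectors T; j: index of the critical resource
    return max(tasks, key=lambda t: t[j])

usage = (0.95, 0.40, 0.30)              # CPU is the critical resource
threshold = (0.90, 0.85, 0.80)
print(is_bottleneck(usage, threshold))  # True
tasks = [(0.50, 0.10, 0.05), (0.30, 0.20, 0.15)]
print(most_demanding_task(tasks, 0))    # (0.5, 0.1, 0.05)
```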

Another question is how to determine which task needs to be scheduled. For each task in the cluster, we also define a resource vector T. When node i becomes a bottleneck and resource r is the critical resource that makes it so, the local monitor chooses the task in node i that occupies this kind of resource the most.

Having determined the task that needs to be scheduled, we need a target node to transfer this task to. Next, we discuss how to choose the target node.

Before we show the algorithm, we define:

·        If task Tk is schedulable on node i, then Rij - Xij - Tkj > 0 (j = 1, 2, ..., m).

Then, we give the scheduling algorithm.

4) Scheduling algorithm:

Input: Xi (workload vectors in the cluster)

Output: the allocated node

1:  EX ← (X1 + X2 + ... + Xn) / n
2:  allocate ← 0
3:  workload ← Double.MAX_VALUE
4:  for each task node i in the cluster
5:    if task k is schedulable on node i
6:      if workload > (Xi + Tk - EX)^2
7:        allocate ← i
8:        workload ← (Xi + Tk - EX)^2
9:      end if
10:   end if
11: end for
12: if allocate != 0
13:   return allocate
14: else
15:   return no node can schedule
16: end if

Figure 3. Splitting a task

The final purpose of this algorithm is to average resource utilization across the cluster; it takes O(n) time to find the target node for a task. This time complexity is suitable for a scalable, distributed cluster.
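Under the paper's definitions, the algorithm can be sketched in Python (a sketch, not the Mos scheduler; None replaces the sentinel value 0 so that node index 0 stays usable):

```python
# Sketch of the scheduling algorithm under the paper's definitions.
def schedule(X, T_k, R):
    # X: workload vectors X_1..X_n; T_k: resource vector of the task to
    # move; R: per-node thresholds (schedulable iff R_ij - X_ij - T_kj > 0)
    n, m = len(X), len(T_k)
    EX = [sum(x[j] for x in X) / n for j in range(m)]
    allocate, workload = None, float("inf")
    for i, x in enumerate(X):
        if all(R[i][j] - x[j] - T_k[j] > 0 for j in range(m)):
            dev = sum((x[j] + T_k[j] - EX[j]) ** 2 for j in range(m))
            if dev < workload:          # closest to the cluster mean wins
                allocate, workload = i, dev
    return allocate                     # None: no node can schedule

X = [(0.9, 0.8), (0.3, 0.2), (0.5, 0.4)]
R = [(1.0, 1.0)] * 3
print(schedule(X, (0.2, 0.1), R))  # 2: its new load is nearest the mean
```

The loop touches each node once, matching the O(n) bound; note the winner is the node whose post-placement load is closest to the cluster average, which is not necessarily the least-loaded node.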

B.   Fault Tolerance Overview

Mos is a highly available stream processing engine, so we need fault recovery for processing faults. Because of the characteristics of streaming data, fault tolerance can be handled using methods designed for stream data; we therefore draw on the ideas of [7] and propose a new mechanism for Mos.

1) Job Classification: In Mos, how to recover from a fault is determined by the job's attributes. Mos divides jobs into three categories based on their sensitivity to faults and timeliness: jobs that do not need recovery; jobs that need recovery with low timeliness requirements; and jobs that need recovery with high timeliness requirements.

2) Standby Task: There is a new kind of task in the cluster, named the standby task. Each standby task has a corresponding entity task. Standby tasks run in the cluster without producing output, while the corresponding entity task outputs data. When an entity task fails, the master node turns its standby task into the entity task and lets it output data.

3) Implementation of Fault Tolerance: The master node pings every task node periodically. If no response is received from a task node within a certain amount of time, the master node considers that task node failed, and the uncompleted work of the failed node needs to be done by other task nodes. The global monitor detects the status and schedules the corresponding standby task to continue the work.

A storage operation adds a great deal of unnecessary latency to processing, so Mos transfers data from the source node to the destination node directly. This feature needs a new mechanism to keep the system available. We therefore propose two approaches to keep accuracy:

·        Redundant storage means the standby task caches the input data in memory but does not process it. It brings 100% extra storage load. The entity task is responsible for sending messages to tell the standby task which data can be discarded. The frequency of this behavior can be adjusted by developers.

·        Redundant computing brings 100% extra computing load: the standby task runs the same program as the entity task except for outputting data. When the entity task fails, the corresponding standby task is scheduled by the global monitor. Because the standby task does the same things as the entity task, failover time is minimal. This method is suitable for tasks with high timeliness requirements.
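The redundant-computing mode can be sketched as follows; the Task class and the promotion step are illustrative assumptions, not the Mos implementation:

```python
# Sketch of the redundant-computing mode (illustrative names): entity
# and standby tasks process the same input, but only the entity emits
# output; on failure the standby is promoted by the master.
class Task:
    def __init__(self, is_entity):
        self.is_entity = is_entity
        self.state = 0          # running sum, kept identical on both

    def process(self, value, output):
        self.state += value
        if self.is_entity:      # the standby computes but stays silent
            output.append(self.state)

entity, standby = Task(True), Task(False)
output = []
for v in [1, 2, 3]:
    entity.process(v, output)
    standby.process(v, output)

# Entity fails; the master promotes the standby, whose state is current,
# so processing resumes with no lost work and minimal failover time.
standby.is_entity = True
standby.process(4, output)
print(output)  # [1, 3, 6, 10]
```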

As can be seen from the above, in order to improve availability while maintaining timeliness, Mos needs to sacrifice some performance in the cluster. In other words, low overhead saves resources but adds latency and reduces availability. Developers should therefore choose the right mode for their application.

IV. Conclusions and Future Work

This paper demonstrates the critical problems of stream processing and proposes an architecture named Mos, then discusses its implementation from three aspects: overall architecture, load balance and fault tolerance. The new mechanisms for load balance, fault tolerance, etc. make it possible for Mos to support stream processing while keeping the MapReduce programming model. As with the MapReduce architecture, developers can run stream-oriented jobs using the MapReduce programming model on Mos without worrying about distributed system issues.

Considering future work, another open question for stream processing is evaluation criteria. Because of the uncertainty of input rates, Stream Processing Engines (SPEs) have no established evaluation criteria. We want to design a benchmark that provides sound criteria for evaluating an SPE, so that researchers can find its bottlenecks. Also, there is only one master node in Mos, so it is a potential bottleneck, and Mos is hard to recover if it fails. In the next stage we want to implement a mechanism with multiple master nodes.

References

 

[1]    Leonardo Neumeyer et al., "S4: Distributed Stream Computing Platform", in IEEE Internet Computing Journal, 2010, pp. 170-177.

[2]    Jeffrey Dean and Sanjay Ghemawat, "MapReduce: simplified data processing on large clusters", in Communications of the ACM, 2008, vol. 51, no. 1, pp. 107-113.

[3]    Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google file system", in SIGOPS Journal, 2003, vol. 37, no. 5, pp. 29-43.

[4]    Daniel J. Abadi et al., "The Design of the Borealis Stream Processing Engine", in CIDR Conference, 2005, pp. 277-289.

[5]    Ying Xing, Jeong-hyon Hwang, "Dynamic Load Distribution in the Borealis Stream Processor", in ICDE Conference, 2005, pp. 791-802.

[6]    Tyson Condie et al., "MapReduce Online", in NSDI Conference, 2010, pp. 313-328.

[7]    Jeong-hyon Hwang et al., "High-Availability Algorithms for Distributed Stream Processing", in ICDE Conference, 2005, pp. 779-790.
