storm-01(2)

The following covers reliability guarantees 2 and 3 (in-order processing and exactly-once processing).

============================================================
Problems arising from Storm's reliable processing
	With Storm's reliable processing, a tuple may be re-emitted after a failure, so implementations such as counting on Storm can suffer from double counting.
	Storm provides a mechanism for "in-order, exactly-once" processing.
	Scheme 1: process one tuple at a time
	The core idea behind transactional topologies is that the data must be processed with strong ordering. The simplest implementation is to process one tuple at a time and not move on to the next tuple until the current one has been processed successfully.
	Concretely, every tuple is associated with a unique transaction id. If a tuple fails and needs to be re-emitted, it is re-emitted with exactly the same transaction id. The transaction id is typically an increasing number.
	This design, however, has a serious problem: tuple processing is completely serial, which is very inefficient and makes no use of Storm's parallel computing capability.
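	To make the idea concrete, here is a minimal sketch (an illustrative in-memory store, not a Storm API): the count is stored together with the transaction id that last updated it, so a re-emitted tuple carrying the same id does not change the count a second time.

		import java.util.HashMap;
		import java.util.Map;

		//--minimal sketch, not a Storm API: a count store that remembers the
		//--transaction id of the last update, so a replayed tuple is not counted twice
		public class ExactlyOnceCounter {
			//--key -> {count, lastTxId}
			private final Map<String, long[]> store = new HashMap<>();

			public void count(String key, long txId) {
				long[] state = store.getOrDefault(key, new long[]{0, -1});
				//--a re-emitted tuple carries exactly the same txId, so the update is skipped
				if (state[1] != txId) {
					store.put(key, new long[]{state[0] + 1, txId});
				}
			}

			public long get(String key) {
				long[] state = store.get(key);
				return state == null ? 0 : state[0];
			}
		}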

	Scheme 2: process a batch of tuples at a time
	A better scheme than processing one tuple at a time is to process a batch of tuples per transaction. If you are building a counting application, each update to the total adds the number of tuples in the whole batch. If the batch fails, the entire batch is re-emitted. Accordingly, instead of giving each tuple a transaction id, the whole batch is given one transaction id; processing is strongly ordered between batches, while processing within a batch can be parallel.
	Although this design is much better than the first one, it is still not perfect. Workers in the topology spend a lot of time waiting for the other parts of the computation to finish. For example, in a topology with multiple computation steps, the earlier steps sit idle until the later steps have finished; only when all steps are done can new data be accepted, which wastes computing resources.
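	A sketch of the per-batch variant (again an illustrative in-memory store, not a Storm API): the transaction id now belongs to the whole batch, and a replayed batch carrying the same id does not add its count again.

		import java.util.HashMap;
		import java.util.Map;

		//--minimal sketch: exactly-once counting keyed by the batch transaction id
		public class BatchCounter {
			//--key -> {total, lastTxId}
			private final Map<String, long[]> store = new HashMap<>();

			//--called once per batch with the number of matching tuples in that batch
			public void add(String key, long batchCount, long txId) {
				long[] state = store.getOrDefault(key, new long[]{0, -1});
				//--a failed batch is re-emitted with the same txId, so it is not added twice
				if (state[1] != txId) {
					store.put(key, new long[]{state[0] + batchCount, txId});
				}
			}
		}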

	Scheme 3: the approach Storm uses
	In Storm's design, since not all of the work needs strong ordering, the whole computation is split into two phases:
		Phase 1: the processing phase, in which operations can execute in parallel
		Phase 2: the commit phase, in which operations execute with strong ordering
		Together these two phases are called a transaction. At any given moment many batches can be in the processing phase, but only one batch can be in the commit phase. If a batch hits any error in either the processing or the commit phase, the whole transaction is redone.
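		A minimal sketch of the two-phase idea (an illustrative helper class, not Storm's actual implementation): batches may finish the processing phase concurrently and in any order, but their results are committed strictly in transaction-id order, one batch at a time, and an already-committed batch is skipped on replay.

			import java.util.TreeMap;

			//--minimal sketch: parallel processing phase, strictly ordered commit phase
			public class TwoPhaseCounter {
				private long total = 0;
				private long lastCommittedTxId = 0;
				//--partial results produced by the processing phase, keyed by txId
				private final TreeMap<Long, Long> pending = new TreeMap<>();

				//--processing phase: may be called concurrently for different batches
				public synchronized void processed(long txId, long batchCount) {
					if (txId <= lastCommittedTxId) {
						return; //--this batch was already committed (a replay), skip it
					}
					pending.put(txId, batchCount);
					commitInOrder();
				}

				//--commit phase: only the batch whose txId is next in line is committed
				private void commitInOrder() {
					while (pending.containsKey(lastCommittedTxId + 1)) {
						total += pending.remove(lastCommittedTxId + 1);
						lastCommittedTxId++;
					}
				}

				public synchronized long total() {
					return total;
				}
			}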
		
============================================================

Trident guarantees both of the above.

Trident overview
	Trident is a high-level abstraction built on top of Storm, aimed at realtime computation.
	Stream is the core data model in Trident; it is processed as a series of batches.
	Across the nodes of a Storm cluster, a stream is divided into many partitions, and operations on the stream run in parallel on each partition.
	A stream is divided into many partitions: a partition is a subset of the stream; it may contain data from multiple batches, and a batch may also be spread across different partitions.

	Trident has five categories of operations
		Partition-local operations: local operations on each partition; no network transfer
		Repartitioning operations: change how the stream is partitioned (only the partitioning changes, not the content); involve network transfer
		Aggregation operations
		Operations on grouped streams
		Merge and join operations

Relationship between partitions and batches: one partition can process multiple batches, and one batch can be processed by multiple partitions.
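
A small sketch of how this looks in the Trident API (reusing the SentenceSpout and PrintFilter classes developed in the cases below): each() is a partition-local operation executed independently on every partition, shuffle() is a repartitioning operation that moves tuples across partitions, and parallelismHint() suggests how many partitions the preceding partition-local section is split into.

package com.liming;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;
import storm.trident.TridentTopology;

//--illustrative sketch only; SentenceSpout and PrintFilter are the classes from the cases below
public class PartitionDemo {
	public static void main(String[] args) {
		//--create the topology
		TridentTopology topology = new TridentTopology();
		topology.newStream("xx", new SentenceSpout())
			.shuffle()                                               //--repartitioning operation: redistributes tuples across partitions (network transfer)
			.each(new Fields("name","sentence"), new PrintFilter())  //--partition-local operation: runs independently on each partition
			.parallelismHint(4);                                     //--hint that the preceding partition-local section should run with parallelism 4

		//--submit to a local cluster and run
		Config conf = new Config();
		LocalCluster cluster = new LocalCluster();
		cluster.submitTopology("PartitionDemo", conf, topology.build());

		//--after running for 10 seconds, kill the Topology and shut down the cluster
		Utils.sleep(1000 * 10);
		cluster.killTopology("PartitionDemo");
		cluster.shutdown();
	}
}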

I. Partition-local operations

1. Filter operations

Filter operations are implemented with filters (Filter).
		Every Filter must directly or indirectly implement the Filter interface; usually we extend the BaseFilter abstract class.
		When a Filter receives an input tuple, it decides whether to keep that tuple.
		=========================
		~Method:
			each(Fields,Filter)
			First parameter: which fields of the current Stream are passed into the filter; note that the order in which fields are declared in new Fields(String ...fields) determines the order of the fields in the tuple seen by the Filter
			Second parameter: the Filter object
		=========================

Case 1: PrintFilter prints, in order, all fields of every tuple it intercepts:

package com.liming;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.utils.Utils;
import storm.trident.Stream;
import storm.trident.TridentTopology;

public class TridentDemo {
	public static void main(String[] args) {
		//--create the topology
		TridentTopology topology = new TridentTopology();
		
		//TODO
		Stream s1 = topology.newStream("xx", new SentenceSpout());
		s1.each(s1.getOutputFields(), new PrintFilter());
		
		
		//--submit to a local cluster and run
		Config conf = new Config();
		LocalCluster cluster = new LocalCluster();
		cluster.submitTopology("MyTopology", conf, topology.build());
		
		//--after running for 10 seconds, kill the Topology and shut down the cluster
		Utils.sleep(1000 * 10);
		cluster.killTopology("MyTopology");
		cluster.shutdown();
	}
}
package com.liming;

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class SentenceSpout extends BaseRichSpout{

	private SpoutOutputCollector collector = null;
	
	private Values [] values = {
			new Values("xiaoming","i am so shuai"),
			new Values("xiaoming","do you like me"),
			new Values("xiaohua","i do not like you"),
			new Values("xiaohua","you look like fengjie"),
			new Values("xiaoming","are you sure you do not like me"),
			new Values("xiaohua","yes i am"),
			new Values("xiaoming","ok i am sure")
	};
	
	private int index = 0;
	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this.collector = collector;
	}

	@Override
	public void nextTuple() {
		collector.emit(values[index]);
		index = index+1 == values.length ? 0 : index+1;
		Utils.sleep(100);
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		Fields fields = new Fields("name","sentence");
		declarer.declare(fields);
	}

}
package com.liming;

import java.util.Iterator;
import java.util.Map;

import backtype.storm.tuple.Fields;
import storm.trident.operation.BaseFilter;
import storm.trident.operation.TridentOperationContext;
import storm.trident.tuple.TridentTuple;

public class PrintFilter extends BaseFilter{

	private TridentOperationContext context = null;
	@Override
	public void prepare(Map conf, TridentOperationContext context) {
		super.prepare(conf, context);
		this.context = context;
	}
	
	@Override
	public boolean isKeep(TridentTuple tuple) {
		StringBuffer buf = new StringBuffer();
		
		Fields fields = tuple.getFields();
		Iterator<String> it = fields.iterator();

		while(it.hasNext()){
			String key = it.next();
			Object value = tuple.getValueByField(key);
			buf.append("---"+key+":"+value+"---");
		}
		System.out.println(buf.toString());
		
		return true;
	}
	
}

Test output:

Case 2: Write a Filter that drops everything said by xiaohua:

Note how the code is written:

package com.liming.xiaohuaFilter;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;
import storm.trident.Stream;
import storm.trident.TridentTopology;

public class TridentDemo {
	public static void main(String[] args) {
		//--create the topology
		TridentTopology topology = new TridentTopology();
		
		//TODO
//		Stream s1 = topology.newStream("xx", new SentenceSpout());
//		Stream s2 = s1.each(new Fields("name"), new XiaohuaFilter());
//		s2.each(s2.getOutputFields(), new PrintFilter());
		Stream s1 = topology.newStream("xx", new SentenceSpout())
		.each(new Fields("name"), new XiaohuaFilter())
		.each(new Fields("name","sentence"), new PrintFilter());
		
		//--submit to a local cluster and run
		Config conf = new Config();
		LocalCluster cluster = new LocalCluster();
		cluster.submitTopology("MyTopology", conf, topology.build());
		
		//--after running for 10 seconds, kill the Topology and shut down the cluster
		Utils.sleep(1000 * 10);
		cluster.killTopology("MyTopology");
		cluster.shutdown();
	}
}
package com.liming.xiaohuaFilter;

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class SentenceSpout extends BaseRichSpout{

	private SpoutOutputCollector collector = null;
	
	private Values [] values = {
			new Values("xiaoming","i am so shuai"),
			new Values("xiaoming","do you like me"),
			new Values("xiaohua","i do not like you"),
			new Values("xiaohua","you look like fengjie"),
			new Values("xiaoming","are you sure you do not like me"),
			new Values("xiaohua","yes i am"),
			new Values("xiaoming","ok i am sure")
	};
	
	private int index = 0;
	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this.collector = collector;
	}

	@Override
	public void nextTuple() {
		collector.emit(values[index]);
		index = index+1 == values.length ? 0 : index+1;
		Utils.sleep(100);
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		Fields fields = new Fields("name","sentence");
		declarer.declare(fields);
	}

}

Note how the code is written:

package com.liming.xiaohuaFilter;

import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

public class XiaohuaFilter extends BaseFilter {

	@Override
	public boolean isKeep(TridentTuple tuple) {
		return !"xiaohua".equals(tuple.getStringByField("name"));
	}

}
package com.liming.xiaohuaFilter;

import java.util.Iterator;
import java.util.Map;

import backtype.storm.tuple.Fields;
import storm.trident.operation.BaseFilter;
import storm.trident.operation.TridentOperationContext;
import storm.trident.tuple.TridentTuple;

public class PrintFilter extends BaseFilter{

	private TridentOperationContext context = null;
	@Override
	public void prepare(Map conf, TridentOperationContext context) {
		super.prepare(conf, context);
		this.context = context;
	}
	
	@Override
	public boolean isKeep(TridentTuple tuple) {
		StringBuffer buf = new StringBuffer();
		
		Fields fields = tuple.getFields();
		Iterator<String> it = fields.iterator();

		while(it.hasNext()){
			String key = it.next();
			Object value = tuple.getValueByField(key);
			buf.append("---"+key+":"+value+"---");
		}
		System.out.println(buf.toString());
		
		return true;
	}
	
}

Test output:

Exercise 1 - Given the following tuples and Filter, what comes out of the Filter?
			Suppose the following filter exists:
				public class MyFilter extends BaseFilter {
				    public boolean isKeep(TridentTuple tuple) {
				        return tuple.getInteger(0) == 1 && tuple.getInteger(1) == 2;
				    }
				}
			Suppose you have the following tuples (with fields ["a", "b", "c"]):
				[1, 2, 3]
				[2, 1, 1]
				[2, 3, 4]
			Run the following code:
				mystream.each(new Fields("b", "a"), new MyFilter())
			The resulting output tuple is:
				[2, 1, 1]

2. Function operations

Function operations are implemented with functions (Function).
		Every Function must directly or indirectly implement the Function interface; usually we extend the BaseFunction abstract class.
		When a function receives an input tuple, it may emit zero or more output tuples.
		The fields of the output tuples are appended to the received input tuple.
		If executing the function on a tuple produces no output tuple, that tuple is filtered out.
		If executing the function on a tuple produces multiple output tuples, the number of tuples increases.
		=========================
		~Method:
			each(Fields,Function,Fields)
			First parameter: which Fields of the stream are passed into the Function; note that the order in which fields are declared in new Fields(String ...fields) determines the order of the fields in the tuple seen by the Function
			Second parameter: the Function object
			Third parameter: the Fields for the extra values appended by the Function after it runs
		=========================

Case 3: Modify the above case to add a gender field:

Note how the code is written:

package com.liming.func;


import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;
import storm.trident.Stream;
import storm.trident.TridentTopology;


public class TridentDemo {
	public static void main(String[] args) {
		//--create the topology
		TridentTopology topology = new TridentTopology();
		
		//TODO
//		Stream s1 = topology.newStream("xx", new SentenceSpout())
//		.each(new Fields("name"), new XiaohuaFilter())
//		.each(new Fields("name"), new GenderFunc(),new Fields("gender"))
//		.each(new Fields("name","sentence","gender"), new PrintFilter());
		
		Stream s = topology.newStream("xx", new SentenceSpout())
		.each(new Fields("name"), new XiaohuaFilter())
		.each(new Fields("name"), new GenderFunc(),new Fields("gender"));
		s.each(s.getOutputFields(), new PrintFilter());
		
		//--submit to a local cluster and run
		Config conf = new Config();
		LocalCluster cluster = new LocalCluster();
		cluster.submitTopology("MyTopology", conf, topology.build());
		
		//--after running for 10 seconds, kill the Topology and shut down the cluster
		Utils.sleep(1000 * 10);
		cluster.killTopology("MyTopology");
		cluster.shutdown();
	}
}
package com.liming.func;

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class SentenceSpout extends BaseRichSpout{

	private SpoutOutputCollector collector = null;
	
	private Values [] values = {
			new Values("xiaoming","i am so shuai"),
			new Values("xiaoming","do you like me"),
			new Values("xiaohua","i do not like you"),
			new Values("xiaohua","you look like fengjie"),
			new Values("xiaoming","are you sure you do not like me"),
			new Values("xiaohua","yes i am"),
			new Values("xiaoming","ok i am sure")
	};
	
	private int index = 0;
	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this.collector = collector;
	}

	@Override
	public void nextTuple() {
		collector.emit(values[index]);
		index = index+1 == values.length ? 0 : index+1;
		Utils.sleep(100);
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		Fields fields = new Fields("name","sentence");
		declarer.declare(fields);
	}

}
package com.liming.func;

import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

public class XiaohuaFilter extends BaseFilter {

	@Override
	public boolean isKeep(TridentTuple tuple) {
//		return !"xiaohua".equals(tuple.getStringByField("name"));
		return true;
	}

}
package com.liming.func;

import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class GenderFunc extends BaseFunction{

	@Override
	public void execute(TridentTuple tuple, TridentCollector collector) {
		String name = tuple.getStringByField("name");
		if("xiaoming".equals(name)){
			collector.emit(new Values("male"));
		}else if("xiaohua".equals(name)){
			collector.emit(new Values("female"));
		}else{
			//--emit nothing for other names; those tuples are filtered out
		}
	}
	
}
package com.liming.func;

import java.util.Iterator;
import java.util.Map;

import backtype.storm.tuple.Fields;
import storm.trident.operation.BaseFilter;
import storm.trident.operation.TridentOperationContext;
import storm.trident.tuple.TridentTuple;

public class PrintFilter extends BaseFilter{

	private TridentOperationContext context = null;
	@Override
	public void prepare(Map conf, TridentOperationContext context) {
		super.prepare(conf, context);
		this.context = context;
	}
	
	@Override
	public boolean isKeep(TridentTuple tuple) {
		StringBuffer buf = new StringBuffer();
		
		Fields fields = tuple.getFields();
		Iterator<String> it = fields.iterator();

		while(it.hasNext()){
			String key = it.next();
			Object value = tuple.getValueByField(key);
			buf.append("---"+key+":"+value+"---");
		}
		System.out.println(buf.toString());
		
		return true;
	}
	
}

Test output:

Exercise 2 - Given the following tuples and Function, what comes out of the Function?
			Suppose there is the following Function:
				public class MyFunction extends BaseFunction {
				    public void execute(TridentTuple tuple, TridentCollector collector) {
				        for(int i=0; i < tuple.getInteger(0); i++) {
				            collector.emit(new Values(i));
				        }
				    }
				}
			Suppose there is a stream called "mystream" containing the following tuples (with fields ["a", "b", "c"]):
				[1, 2, 3]
				[4, 1, 6]
				[3, 0, 8]
			Run the following code:
				mystream.each(new Fields("b"), new MyFunction(), new Fields("d"))
			The output tuples have fields ["a", "b", "c", "d"], as shown below:
				[1, 2, 3, 0]
				[1, 2, 3, 1]
				[4, 1, 6, 0]

 
