Background:
In a Storm test environment, the real log traffic is sometimes unavailable. To make testing easier, simulated data can be sent in its place.
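Main steps:
1. Understand the format of the data files to be sent and how they are stored. For example, if they are stored on HDFS, MapReduce can read them easily;
2. Write the MapReduce logic: read the log files, configure the receiver's IP and port, and send the logs at a controllable rate;
3. In the Spout, write a UDP receiving server that puts incoming logs on a queue, then process the received logs one by one in nextTuple.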
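The send and receive steps above can be tried out without Hadoop or Storm. The following is only a minimal sketch under stated assumptions: the class name UdpLoopbackSketch, the method sendAndReceive, the loopback address, the OS-assigned port, the 500 ms idle timeout, and the UTF-8 encoding are all illustrative choices that do not come from the original code. It shows a throttled UDP sender and a receiver thread that buffers each datagram in a queue.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical standalone demo: throttled UDP sender plus queue-backed receiver. */
final class UdpLoopbackSketch {

    /** Sends count lines over loopback UDP and returns how many were received. */
    static int sendAndReceive(int count, long sleepMillis) throws Exception {
        final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();
        // Port 0 lets the OS pick a free port (assumption: any port will do for the demo)
        final DatagramSocket server = new DatagramSocket(0, InetAddress.getLoopbackAddress());
        // Give up once no packet has arrived for 500 ms
        server.setSoTimeout(500);

        Thread receiver = new Thread(new Runnable() {
            public void run() {
                byte[] buff = new byte[4096];
                try {
                    while (true) {
                        DatagramPacket dp = new DatagramPacket(buff, buff.length);
                        server.receive(dp);
                        queue.add(new String(buff, 0, dp.getLength(), StandardCharsets.UTF_8));
                    }
                } catch (SocketTimeoutException e) {
                    // Sender has gone quiet: treat this as the end of the stream
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        receiver.start();

        DatagramSocket client = new DatagramSocket();
        try {
            for (int i = 0; i < count; i++) {
                byte[] data = ("log line " + i).getBytes(StandardCharsets.UTF_8);
                client.send(new DatagramPacket(data, data.length,
                        InetAddress.getLoopbackAddress(), server.getLocalPort()));
                // Throttle the send rate
                Thread.sleep(sleepMillis);
            }
        } finally {
            client.close();
        }
        receiver.join();
        server.close();
        return queue.size();
    }
}
```

Calling UdpLoopbackSketch.sendAndReceive(20, 1) returns the number of lines that reached the queue. UDP gives no delivery guarantee, so the result can be lower than the number sent.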
Partial code:
1. Sending logs (mainly the Map phase; only the Map logic is shown below):
public static class LogSendMapper extends
        Mapper<Object, Text, Tuple, Text> {
    /** Whether to send logs. */
    private boolean ifSend;
    /** UDP client socket. */
    private DatagramSocket clientSocket;
    /** IP address of the receiver. */
    private InetAddress iPAddress;
    /** Receiver port. */
    private int port;

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Whether to send
        ifSend = true;
        // UDP client
        clientSocket = new DatagramSocket();
        // Storm test machine
        iPAddress = InetAddress.getByName("xxx.xxx.xxx.xxx");
        // Receiver port
        port = 9999;
    }

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // Wrap the log line in a UDP packet
            byte[] data = value.toString().getBytes();
            DatagramPacket sendPacket = new DatagramPacket(
                    data, data.length, iPAddress, port);
            // Send the UDP packet
            if (ifSend) {
                // Throttle the send rate
                Thread.sleep(2);
                clientSocket.send(sendPacket);
                context.getCounter("INFO", "send count").increment(1);
            }
        } catch (IOException e) {
            // Count failed sends; an interrupt is left to propagate
            context.getCounter("INFO", "error count").increment(1);
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Release the UDP socket when the task finishes
        if (clientSocket != null) {
            clientSocket.close();
        }
    }
}
2. Receiving logs (Spout):
public class Spout extends BaseRichSpout {
    /** Logger. */
    private static Log log = LogFactory.getLog(Spout.class);
    /** Collector. */
    SpoutOutputCollector _collector;
    /** Queue that buffers the received UDP logs. */
    private ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();
    /** UDP server. */
    private UdpServer udpServer = null;
    /** Receive count (written by the UDP server thread, hence volatile). */
    private volatile long spoutReceiveCount = 0;
    /** Emit count. */
    private long spoutEmitCount = 0;
    /** Error count. */
    private long spoutErrorCount = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void activate() {
        // Start the receiving service
        udpServer = new UdpServer();
        udpServer.setStartInfo();
        udpServer.start();
    }

    @Override
    public void deactivate() {
        try {
            udpServer.setRunFlag(false);
        } catch (Exception e) {
            // For example, deactivate() called before activate()
            log.warn("failed to stop the UDP server", e);
        }
        log.info(Thread.currentThread()
                + " spoutReceiveCount:" + spoutReceiveCount);
        log.info(Thread.currentThread()
                + " spoutEmitCount:" + spoutEmitCount);
        log.info(Thread.currentThread()
                + " spoutErrorCount:" + spoutErrorCount);
    }

    @Override
    public void nextTuple() {
        String str = queue.poll();
        if (str == null) {
            // Nothing received yet
            return;
        }
        /*
        // Check whether the record is valid
        if (error) {
            spoutErrorCount++;
        }
        */
        _collector.emit(new Values(str));
        spoutEmitCount++;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("log"));
    }
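
    /* UDP receiving server */
    public class UdpServer extends Thread {
        private int port;
        private String ip;
        // volatile so the receive thread sees the change made by setRunFlag()
        private volatile boolean isTrue;
        private DatagramSocket ds;

        public void setStartInfo() {
            try {
                ip = InetAddress.getLocalHost().getHostAddress();
                port = 9999;
                ds = new DatagramSocket(port);
                // Wake up periodically so the run flag gets re-checked
                ds.setSoTimeout(1000);
                // Only start receiving once the socket is bound
                isTrue = true;
            } catch (UnknownHostException e) {
                e.printStackTrace();
            } catch (SocketException e) {
                e.printStackTrace();
            }
        }

        @Override
        public void run() {
            while (isTrue) {
                try {
                    byte[] buff = new byte[4096];
                    DatagramPacket dp = new DatagramPacket(buff, 0, buff.length);
                    ds.receive(dp);
                    // Count only after a packet has actually arrived
                    spoutReceiveCount++;
                    // The sender appends no trailing delimiter, so keep the whole payload
                    String str = new String(buff, 0, dp.getLength());
                    queue.add(str);
                } catch (SocketTimeoutException e) {
                    // No packet within the timeout; loop and re-check the run flag
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            stopReceive();
        }

        public void stopReceive() {
            try {
                ds.close();
            } catch (Exception e) {
                // Socket was never opened or is already closed
            }
        }

        public void setRunFlag(boolean runFlag) {
            this.isTrue = runFlag;
        }
    }
}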
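The receive loop turns each datagram into a string using the packet's reported length, and the length argument decides exactly how much of the buffer ends up in the log line. The following is a small sketch of that decoding step in isolation. The class DatagramDecodeSketch and its helpers decode and filled are hypothetical names used only for illustration, and UTF-8 is an assumption (the code above relies on the platform default charset).

```java
import java.net.DatagramPacket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing how a received datagram maps to a log string. */
final class DatagramDecodeSketch {

    /** Decodes exactly the bytes the packet reports as received. */
    static String decode(DatagramPacket dp) {
        return new String(dp.getData(), dp.getOffset(), dp.getLength(),
                StandardCharsets.UTF_8);
    }

    /** Simulates the state after receive(): payload at the buffer start, length set. */
    static DatagramPacket filled(String payload, int bufferSize) {
        byte[] buff = new byte[bufferSize];
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        // A payload longer than the buffer is cut off, as happens on a real receive
        int n = Math.min(data.length, buff.length);
        System.arraycopy(data, 0, buff, 0, n);
        DatagramPacket dp = new DatagramPacket(buff, buff.length);
        dp.setLength(n);
        return dp;
    }
}
```

With a 4096-byte buffer, decode(filled("log line 1", 4096)) gives back the full line, while a buffer smaller than the payload silently loses the tail. Log lines longer than the receive buffer therefore need either a larger buffer or splitting on the sending side.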
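Notes
1. When testing, submit the Storm job (topology) first, then note the IP address of the machine the Spout runs on; the port number can be set manually;
2. Then configure the receiver IP and port used by the UDP client in the Map, and submit the MapReduce job;
3. After sending finishes, check whether the amount of data sent matches the amount received in the spout, then adjust the send rate until sending and receiving line up reasonably well.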
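The comparison in note 3 can be made concrete with a little arithmetic. The sketch below assumes the sent count is read from the MapReduce "send count" counter and the received count from the spout's log output. The class SendRateTuning, the doubling rule, and the tolerance parameter are hypothetical illustrations, not part of the original setup.

```java
/** Hypothetical helpers for comparing sent and received counts. */
final class SendRateTuning {

    /** Fraction of sent packets that never reached the spout, in [0, 1]. */
    static double lossRatio(long sent, long received) {
        if (sent <= 0) {
            return 0.0;
        }
        long lost = Math.max(0, sent - received);
        return (double) lost / sent;
    }

    /**
     * Illustrative rule only: double the per-packet sleep when the loss
     * exceeds the tolerance, otherwise keep the current value.
     */
    static long nextSleepMillis(long currentSleepMillis, long sent,
                                long received, double tolerance) {
        if (lossRatio(sent, received) > tolerance) {
            return Math.max(1, currentSleepMillis * 2);
        }
        return currentSleepMillis;
    }

    /** Upper bound on packets per second for one sender given its sleep interval. */
    static long maxPacketsPerSecond(long sleepMillis) {
        return sleepMillis <= 0 ? Long.MAX_VALUE : 1000 / sleepMillis;
    }
}
```

For example, with the 2 ms sleep used in the Map code, a single mapper sends at most about 500 packets per second. If 1000 packets were sent and 900 received, the loss ratio is 0.1, and with a tolerance of 0.01 this rule would raise the sleep to 4 ms for the next run.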