A spout's input data can come from a database query, arrive over the network via Netty, and so on; in this helloWorld example I simply hard-code it in the program.
II. helloWorld
1. First, create a Maven project and add the Storm dependency:
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.2-incubating</version>
</dependency>
2. Add the spout class
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class PWSpout extends BaseRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;
    private final Random random = new Random();

    private static final Map<Integer, String> map = new HashMap<Integer, String>();
    static {
        map.put(0, "java");
        map.put(1, "php");
        map.put(2, "groovy");
        map.put(3, "python");
        map.put(4, "ruby");
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // Initialize the spout: keep a reference to the collector for emitting tuples.
        this.collector = collector;
    }

    /**
     * Called repeatedly by Storm to poll for the next tuple.
     * @see backtype.storm.spout.ISpout#nextTuple()
     */
    @Override
    public void nextTuple() {
        // Pick one of the five words at random.
        int num = random.nextInt(5);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        // Emit the word downstream.
        this.collector.emit(new Values(map.get(num)));
    }

    /**
     * Declare the field name of the tuples this spout emits.
     * @see backtype.storm.topology.IComponent#declareOutputFields(backtype.storm.topology.OutputFieldsDeclarer)
     */
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("print"));
    }
}
3. Add a bolt class (PrintBolt)
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class PrintBolt extends BaseBasicBolt {

    private static final Log log = LogFactory.getLog(PrintBolt.class);
    private static final long serialVersionUID = 1L;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Read the field declared by the upstream component (the spout).
        String print = input.getStringByField("print");
        log.info("[print]: " + print);
        // Pass the word on to the next bolt.
        collector.emit(new Values(print));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("write"));
    }
}
4. Add a second bolt class (WriteBolt)
import java.io.FileWriter;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class WriteBolt extends BaseBasicBolt {

    private static final long serialVersionUID = 1L;
    private static final Log log = LogFactory.getLog(WriteBolt.class);
    private FileWriter writer;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Read the field declared by the upstream bolt.
        String text = input.getStringByField("write");
        try {
            if (writer == null) {
                // Lazily open one file per bolt instance; appending "this" to the
                // path gives each task its own file.
                String os = System.getProperty("os.name");
                if (os.startsWith("Windows")) {
                    writer = new FileWriter("D:\\099_test\\" + this);
                } else if (os.equals("Linux")) {
                    System.out.println("----:" + os);
                    writer = new FileWriter("/usr/local/temp/" + this);
                }
            }
            log.info("[write]: writing to file");
            writer.write(text);
            writer.write("\n");
            writer.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Last bolt in the topology; it emits nothing.
    }
}
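Note that the file path concatenates `this`, which invokes the bolt's default `toString()` (class name plus identity hash code), so every bolt instance (task) writes to its own distinct file. A minimal standalone sketch of that naming, with a plain `Object` standing in for the bolt instance:

```java
public class PerInstanceFileName {
    public static void main(String[] args) {
        // Object.toString() is "ClassName@hexHashCode", so two instances of the
        // same class produce two distinct file names.
        Object boltA = new Object();
        Object boltB = new Object();
        String fileA = "/usr/local/temp/" + boltA;
        String fileB = "/usr/local/temp/" + boltB;
        System.out.println(fileA.startsWith("/usr/local/temp/java.lang.Object@")); // true
        System.out.println(fileA.equals(fileB)); // false
    }
}
```

This is why, later on, running several WriteBolt tasks on one machine produces several files in the same directory.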
5. Finally, write the main class (the topology) to submit a job
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import hellow.bolt.PrintBolt;
import hellow.bolt.WriteBolt;
import hellow.spout.PWSpout;

public class PWTopology1 {
    public static void main(String[] args) throws Exception {
        Config cfg = new Config();
        // Number of workers, i.e. how many JVMs the topology runs in.
        cfg.setNumWorkers(2);
        cfg.setDebug(true);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new PWSpout());
        builder.setBolt("print-bolt", new PrintBolt()).shuffleGrouping("spout");
        builder.setBolt("write-bolt", new WriteBolt()).shuffleGrouping("print-bolt");

        // 1. Local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("top1", cfg, builder.createTopology());
        Thread.sleep(10000);
        cluster.killTopology("top1");
        cluster.shutdown();

        // 2. Cluster mode
        // StormSubmitter.submitTopology("top1", cfg, builder.createTopology());
    }
}
Check the generated output file to verify the result.
The example above runs in local mode.
Now let's look at cluster mode and see what is different.
1. First, in the PWTopology1 class, comment out the local-mode section and enable cluster mode. Build the project and copy the resulting storm01.jar to a VM that has ZooKeeper and a Storm cluster installed (for the installation steps, see http://blog.csdn.net/u010634288/article/details/78619747).
Start the topology with the following command:
storm jar storm01.jar hellow.topology.PWTopology1
Log in to a supervisor node (241 or 242) and you can see that the machine is running a worker.
While doing this, I found the following error in the logs: Received invalid messages for unknown tasks. Dropping
This is a hostname-resolution problem: the host mapping is misconfigured. All three machines need the configuration below; only the hostname itself differs per machine.
a. Set the hostname:
hostname storm-master
b. Edit the network configuration:
vim /etc/sysconfig/network
and set
HOSTNAME=storm-master
c. Configure the hosts file (on all three machines):
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain
::1 localhost6 localhost6.localdomain6
192.168.100.240 storm-master
192.168.100.241 storm-supervisor01
192.168.100.242 storm-supervisor02
The hosts file tells the machine which IP address to use for a given hostname, bypassing DNS resolution entirely.
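To verify the mapping took effect, you can resolve a name from Java the same way Storm's workers do. This sketch resolves localhost, which exists on any machine; on the cluster you would substitute storm-master (a name from the hosts file above):

```java
import java.net.InetAddress;

public class HostCheck {
    public static void main(String[] args) throws Exception {
        // InetAddress consults the OS resolver, which reads /etc/hosts before
        // DNS, so a correct hosts entry shows up here immediately.
        InetAddress addr = InetAddress.getByName("localhost");
        System.out.println(addr.getHostAddress());
    }
}
```

If a cluster hostname resolves to the wrong address here, the workers will hit the same "unknown tasks" error described above.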
Once the topology is running, we can tail the worker log on the server:
tail -f /usr/local/storm/apache-storm-0.9.2/logs/worker-6703.log
A file is also generated under /usr/local/temp.
To stop the topology, click kill in the Storm UI.
From the above, it looks as if cluster mode is hardly different from local mode: both just write the data to a single file. That is because our topology's main method did not configure the number of JVMs (workers), executors, or tasks.
There are a few Storm concepts worth understanding here.
Configuration
The worker count configures how many JVMs the application is spread across. In local mode there is only one worker, i.e. a single JVM, and the entire topology runs inside it.
If the spout is declared with a parallelism hint like this:
builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
And if, in a cluster environment, we configure things like this:
Config config = new Config();
config.setNumWorkers(2);
builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
builder.setBolt(SPLIT_BOLT_ID, splitBolt, 2)
       .setNumTasks(4)
       .shuffleGrouping(SENTENCE_SPOUT_ID);
builder.setBolt(COUNT_BOLT_ID, countBolt, 4)
       .fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
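Unlike shuffleGrouping, which distributes tuples randomly, fieldsGrouping guarantees that tuples with the same value of the grouped field always reach the same task. The following is not Storm's actual implementation, just a sketch of that routing contract under the assumption of hash-mod routing:

```java
import java.util.Arrays;

public class FieldsGroupingSketch {
    // Route a tuple to one of numTasks tasks by hashing the grouped field value.
    static int route(String fieldValue, int numTasks) {
        return Math.floorMod(Arrays.hashCode(new Object[] { fieldValue }), numTasks);
    }

    public static void main(String[] args) {
        // The same word always lands on the same task, so e.g. a word-counting
        // bolt keeps a single consistent counter per word.
        System.out.println(route("word", 4) == route("word", 4)); // true
        System.out.println(route("word", 4) >= 0 && route("word", 4) < 4); // true
    }
}
```

This is why a counting bolt downstream of a splitter must use fieldsGrouping on "word": with shuffleGrouping the counts for one word would be scattered across tasks.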
So, for the code below:
We first configure 2 worker processes, i.e. 2 JVMs.
We set the spout's parallelism hint to 2, producing 2 executors and 2 tasks.
The first bolt has a parallelism hint of 2 with setNumTasks(4), producing 2 executors and 4 tasks.
The second bolt has a parallelism hint of 6, producing 6 executors and 6 tasks.
The topology therefore has 2 workers, 2+2+6=10 executors, and 2+4+6=12 tasks, so each worker receives 12/2=6 tasks. By default an executor runs exactly one task, but when a task count is set explicitly, the tasks are distributed evenly across the executors. The 6 WriteBolt tasks are split evenly across the 2 workers, so three files will be generated under /usr/local/temp on each of 192.168.100.241 and 192.168.100.242.
public class PWTopology2 {
    public static void main(String[] args) throws Exception {
        Config cfg = new Config();
        cfg.setNumWorkers(2);  // use two worker processes (JVMs)
        cfg.setDebug(false);

        TopologyBuilder builder = new TopologyBuilder();
        // Spout parallelism: 2 executors and 2 tasks.
        builder.setSpout("spout", new PWSpout(), 2); //.setNumTasks(2);
        // First bolt: 2 executors and 4 tasks.
        builder.setBolt("print-bolt", new PrintBolt(), 2).shuffleGrouping("spout").setNumTasks(4);
        // Second bolt: 6 executors and 6 tasks.
        builder.setBolt("write-bolt", new WriteBolt(), 6).shuffleGrouping("print-bolt");

        // Cluster mode
        StormSubmitter.submitTopology("top2", cfg, builder.createTopology());
    }
}
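The executor and task arithmetic above can be double-checked with a tiny standalone calculation; the numbers mirror PWTopology2 and nothing here touches Storm:

```java
public class ParallelismMath {
    public static void main(String[] args) {
        int workers = 2;
        int[] executors = { 2, 2, 6 };  // spout, print-bolt, write-bolt parallelism hints
        int[] tasks     = { 2, 4, 6 };  // print-bolt's task count raised to 4 via setNumTasks

        int totalExecutors = 0, totalTasks = 0;
        for (int e : executors) totalExecutors += e;
        for (int t : tasks)     totalTasks     += t;

        System.out.println(totalExecutors);        // 10 executors
        System.out.println(totalTasks);            // 12 tasks
        System.out.println(totalTasks / workers);  // 6 tasks per worker
        System.out.println(tasks[2] / workers);    // 3 WriteBolt files per machine
    }
}
```

The last line is why each supervisor machine ends up with three output files under /usr/local/temp: the 6 WriteBolt tasks are divided between the 2 workers.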