Study progress log:
- 2019-08-21: fixed the UI not showing output logs; started reading the starter demo code on GitHub; solved remote-debugging a Storm program from IDEA;
- 2019-08-27: typed out and finished reading Storm's official Topology demo;
- Next up: Storm's integration with other projects, inputs and outputs;
- 2020-05-09: Storm seems to have been superseded by Flink; no longer worth learning or using.
Resources:
Terminology:
- Storm: a real-time computation engine with millisecond-level latency; by comparison, Spark Streaming only offers second-level, near-real-time processing;
- Java is its main development language: package the Storm code and its dependencies into a jar, then launch it with the storm CLI;
- Topologies: Storm's computation graphs, describing the processing logic and data flow, somewhat like Hadoop's MapReduce; but an MR job terminates, while a topology runs until it is manually killed;
- Nodes: divided into a master node and worker nodes;
- Nimbus: assigns tasks and monitors the Storm cluster;
- Supervisor: Nimbus assigns the cluster's tasks to the nodes; each Supervisor listens for its assignments and starts/stops worker processes on its node;
- A Storm cluster is extremely robust and stable, more so than Spark;
- Storm Stream:
  - spout: turns a data source (API / MQ) into a Storm Stream;
  - bolt: applies functions to a Stream, generates new streams, emits data downstream, etc.;
  - the network formed by spouts and bolts is called a Topology;
- DRPC: Distributed Remote Procedure Call; its main use is real-time parallel computation of compute-intensive (CPU-bound) functions on Storm;
- Input:
- Output: Hive,
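The spout / bolt / topology relationship above can be sketched in plain Java (no Storm dependency; `spout`, `bolt`, and the wiring here are illustrative stand-ins, not Storm's API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopologyIdea {
    // "spout": turns an external source (here a fixed list) into a stream of tuples
    static List<String> spout() {
        return Arrays.asList("a", "b", "c");
    }

    // "bolt": consumes a stream, applies a function, and emits a new stream
    static List<String> bolt(List<String> in, Function<String, String> fn) {
        return in.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // wiring spouts and bolts into a graph is what Storm calls a Topology;
        // unlike a MapReduce job, a real topology runs until it is killed
        List<String> out = bolt(bolt(spout(), s -> s + "!"), String::toUpperCase);
        System.out.println(out); // [A!, B!, C!]
    }
}
```

In real Storm the spout never "returns" a finite list: nextTuple() is called forever, and the wiring is declared through TopologyBuilder as in the project template at the end of these notes.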
Tips:
- When running Storm, the storm nimbus process must be running;
- Deleting Storm's data in ZooKeeper (the storm znode) clears all Storm topologies;
- After submitting a jar via the Storm CLI, it usually takes about a minute before anything shows up in the Storm UI;
- When starting Storm on a cluster, run the nimbus and ui processes on the master node, the supervisor process on each worker node, and the logviewer process on every node, so that logs can be viewed in the Storm UI;
- To run the cluster's Storm directly from local IDEA, just copy storm.yaml into the src directory of the Java project;
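A minimal storm.yaml sketch matching the tips above (the hostname is taken from the pitfalls section below; the path and ports are placeholders to adjust for your cluster):

```yaml
# where clients, the UI, and supervisors look for Nimbus
nimbus.seeds: ["bigdata85"]
# ZooKeeper ensemble backing the cluster state
storm.zookeeper.servers:
  - "bigdata85"
# local scratch directory for Storm daemons
storm.local.dir: "/opt/storm/data"
# one worker slot per port on each supervisor node
supervisor.slots.ports:
  - 6700
  - 6701
```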
Storm CLI
# See: http://storm.apache.org/releases/2.0.0/Command-line-client.html
storm jar <topology-jar-path> <class-name> <args> # run a jar on the Storm cluster; <args> are the Java main-method arguments
storm sql <sql-file> <topology-name> # run StormSQL
storm list # list all currently running topologies
storm kill <topology-name> # stop the given topology
storm activate <topology-name> # activate a topology
storm deactivate <topology-name> # deactivate a topology
storm classpath # print the client classpath
storm server_classpath # print the server (daemon) classpath
storm nimbus # start the nimbus process
storm supervisor # start the supervisor process
storm ui # start the Storm web UI process
storm drpc # start the drpc process
storm logviewer # start the logviewer process, so logs can be browsed in the Storm UI
storm version # print the Storm version
storm help # help
Pitfalls
Q: Starting the Storm UI fails with org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts ["bigdata85"]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
A:https://blog.csdn.net/henianyou/article/details/73733637
Q: When running a Storm project on the cluster, the dependency is declared in the Maven pom, yet it fails with Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/storm/starter/spout/RandomSentenceSpout
A: Copy the jar containing RandomSentenceSpout, storm-starter-1.1.10.jar, into $STORM_HOME/lib on the cluster
Q: Running a Storm project locally in IDEA fails with java.lang.ClassNotFoundException: org.apache.storm.topology.IRichSpout
A: Remove <scope>provided</scope> from pom.xml; keep provided when packaging a jar to run on the cluster, and remove it when running locally;
Storm project template
/**
* <dependencies>
* <dependency>
* <groupId>org.apache.storm</groupId>
* <artifactId>storm-core</artifactId>
* <version>1.1.1</version>
* <!--<scope>provided</scope>-->
* <!-- use provided when packaging a jar to run on the cluster; remove it when running locally -->
* </dependency>
*
* </dependencies>
*/
/**
* @ Name : Storm_demo
* @ Description : TODO
* @ Author : yangsong
* @ Date : 2019-8-27 11:35
* @ Version : 1.0
**/
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
public class Storm_demo {
    public static class Split extends BaseRichSpout { // Spout template: override per your business logic
        SpoutOutputCollector _collector;
        int _base = 0;
        int _i = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _base = context.getThisTaskIndex();
        }

        @Override
        public void nextTuple() {
            Values v = new Values(_base, _i);
            _collector.emit(v, "ACK"); // the second argument is the message id used for acking
            _i++;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // the declared fields must match the arity of the emitted Values (two here)
            declarer.declare(new Fields("base", "i"));
        }
        // @Override
        // override other methods (ack, fail, ...) as needed
    }

    public static class Bolt1 extends BaseBasicBolt { // Bolt template: override per your business logic
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // empty: declare fields only if this bolt emits tuples
            // declarer.declare(new Fields("bolt1", "bolt1_info"));
        }

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // implement per your business logic
        }
        // @Override
        // override other methods as needed
    }

    public static class Bolt2 extends BaseBasicBolt { // Bolt template: override per your business logic
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("bolt2", "bolt2_info"));
        }

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // implement per your business logic
        }
        // @Override
        // override other methods as needed
    }

    public static class Bolt3 extends BaseRichBolt { // Bolt template: override per your business logic
        OutputCollector _collector;

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("bolt3", "bolt3_info"));
        }

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            // implement per your business logic; unlike BaseBasicBolt, a BaseRichBolt must ack manually:
            // _collector.ack(tuple);
        }
        // @Override
        // override other methods as needed
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout_name", new Split(), 3); // register the spout with parallelism 3
        builder.setBolt("bolt1_name", new Bolt1(), 3).shuffleGrouping("spout_name"); // bolt subscribed to the spout
        builder.setBolt("bolt2_name", new Bolt2(), 3).shuffleGrouping("bolt1_name"); // bolt subscribed to bolt1
        builder.setBolt("bolt3_name", new Bolt3(), 3).shuffleGrouping("bolt2_name"); // bolt subscribed to bolt2
        Config conf = new Config(); // cluster configuration
        conf.setDebug(true); // log every tuple emitted by spouts and bolts
        if (args != null && args.length > 0) { // a topology name was passed in: submit to the cluster
            conf.setNumWorkers(3); // use 3 worker processes
            StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology()); // submit the topology
        } else { // no args: run in an in-process local cluster
            conf.setMaxTaskParallelism(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("topology_name", conf, builder.createTopology());
            Thread.sleep(10000);
            // cluster.killTopology("topology_name"); // kill the topology by name
            cluster.shutdown();
        }
    }
}