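I have recently been reading the source code of Storm, the distributed stream-processing framework open-sourced by Twitter, and some of the more interesting parts are worth writing down. I organize these notes around Storm's main concepts and only analyze the parts I care about, which is why I call this a brief analysis.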
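1. Introduction
Storm is written mainly in Java and Clojure: Java defines the skeleton, while Clojure implements the core logic. Line counts for the source tree, as reported by cloc: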
     180 text files.
     177 unique files.
       7 files ignored.

http://cloc.sourceforge.net v 1.55  T=1.0 s (171.0 files/s, 46869.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           125           5010           2414          25661
Lisp                            33            732            283           4871
Python                           7            742            433           4675
CSS                              1             12             45           1837
ruby                             2             22              0            104
Bourne Shell                     1              0              0              6
Javascript                       2              1             15              6
-------------------------------------------------------------------------------
SUM:                           171           6519           3190          37160
-------------------------------------------------------------------------------
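The Java code runs to more than 25,000 lines, while the Clojure code (which cloc reports as Lisp) is only 4,871 lines. This is one more piece of evidence that the claim "the language doesn't matter" is nonsense.
2. Topology and Nimbus
Topology is Storm's core concept: spouts and bolts are wired together into a topology, which runs on a Storm cluster to carry out real-time analysis and computation. Here I mainly want to describe, roughly, how a topology gets deployed to a Storm cluster. A topology is submitted to the cluster by calling StormSubmitter.submitTopology: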
StormSubmitter.submitTopology(name, conf, builder.createTopology());
After packaging the topology into a jar, we use bin/storm, which is a Python script, and run the following command:
bin/storm jar xxxx.jar com.taobao.MyTopology args
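This submits the jar to the Storm cluster. The storm script starts a JVM that runs the main method of the topology class, which in turn calls submitTopology. submitTopology uploads the jar file to Nimbus, transferring it over a socket. The jar function in the storm Python script shows how this is set up: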
def jar(jarfile, klass, *args):
    exec_storm_class(
        klass,
        jvmtype="-client",
        extrajars=[jarfile, CONF_DIR, STORM_DIR + "/bin"],
        args=args,
        prefix="export STORM_JAR=" + jarfile + ";")
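Why a string prefix is enough to hand the jar path to the JVM can be sketched as follows. This is a minimal sketch and not Storm's code: it assumes the launcher concatenates the prefix with the rest of the command and runs the result through a POSIX shell. The helper name run_with_prefix is hypothetical, and a small Python child process stands in for the JVM.

```python
import shlex
import subprocess
import sys


def run_with_prefix(prefix, argv):
    """Run argv through the shell with `prefix` prepended and return its stdout."""
    command = prefix + " " + " ".join(shlex.quote(a) for a in argv)
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


# Stand-in for the JVM: a child process that prints the STORM_JAR it inherited.
child = [sys.executable, "-c", 'import os; print(os.environ.get("STORM_JAR"))']

jarfile = "xxxx.jar"
seen = run_with_prefix("export STORM_JAR=" + shlex.quote(jarfile) + ";", child)
print(seen)
```

Because the export runs in the same shell invocation as the command that follows it, the variable is visible to that child process only and does not leak into the environment of the user who ran the script.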
The prefix argument sets the jar file's path as the environment variable STORM_JAR, which is read when submitTopology runs:
// StormSubmitter.java
private static void submitJar(Map conf) {
    if (submittedJar == null) {
        LOG.info("Jar not uploaded to master yet. Submitting jar");
        String localJar = System.getenv("STORM_JAR");
        submittedJar = submitJar(conf, localJar);
    } else {
        LOG.info("Jar already uploaded to master. Not submitting jar.");
    }
}
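The same read-the-variable, upload-once pattern can be mirrored in a few lines of Python for experimentation. This is a sketch under assumptions, not Storm's implementation: submit_jar, fake_upload, and the "uploaded/" return value are hypothetical stand-ins, and no master is contacted.

```python
import os

_submitted_jar = None  # location returned by the first upload, if any


def submit_jar(upload):
    """Upload the jar named by STORM_JAR once and reuse the result afterwards."""
    global _submitted_jar
    if _submitted_jar is None:
        # None if the launcher did not export the variable.
        local_jar = os.environ.get("STORM_JAR")
        _submitted_jar = upload(local_jar)
    return _submitted_jar


# Hypothetical uploader that records its calls instead of talking to a master.
uploads = []


def fake_upload(path):
    uploads.append(path)
    return "uploaded/" + path


os.environ["STORM_JAR"] = "xxxx.jar"
first = submit_jar(fake_upload)
second = submit_jar(fake_upload)  # served from the cached value, no second upload
```

Calling submit_jar twice triggers a single upload, which matches the purpose of the submittedJar null check in the Java method above.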
The jar's path is looked up through the environment variable, and the jar is then uploaded. Using an environment variable