Storm writes its real-time output to HDFS. Running the topology locally mostly works (apart from a few jar conflicts); the notes below cover what to watch for when running on the cluster.
1. The scope of the storm-core dependency must be set to provided, so that it is not bundled into the topology jar and the jar supplied by the Storm cluster is used instead.
2. hadoop-common and hadoop-client must exclude the slf4j-log4j12 dependency; otherwise it conflicts with the jars inside storm-core.
3. Fixing the "No FileSystem for scheme: hdfs" exception
The build originally used the maven-assembly-plugin. Both hadoop-hdfs and hadoop-common ship a file named org.apache.hadoop.fs.FileSystem under META-INF/services, and when the assembly plugin packages the jar it lets same-named files overwrite each other, so the entry from hadoop-hdfs is lost, which triggers this exception.
The fix is to switch to the maven-shade-plugin (see the pom.xml below), which merges same-named service files instead of overwriting them.
After repackaging, the merged org.apache.hadoop.fs.FileSystem file contains the service entries from both jars.
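For Hadoop 2.6.x the merged file typically looks like the following; the exact entries depend on the jar versions, so treat this as a representative sketch rather than the literal output:

org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.ftp.FTPFileSystem
org.apache.hadoop.fs.HarFileSystem
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem
org.apache.hadoop.hdfs.web.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem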
Implementation
1. pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <!-- cluster mode: provided scope, so the Storm cluster's own jar is used -->
        <scope>provided</scope>
        <version>0.9.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka</artifactId>
        <version>0.9.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.4</version>
        <exclusions>
            <exclusion>
                <artifactId>slf4j-log4j12</artifactId>
                <groupId>org.slf4j</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.4</version>
        <exclusions>
            <exclusion>
                <artifactId>slf4j-log4j12</artifactId>
                <groupId>org.slf4j</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-hdfs</artifactId>
        <version>0.9.5</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.8.2</artifactId>
        <version>0.8.1</version>
        <exclusions>
            <exclusion>
                <artifactId>jmxtools</artifactId>
                <groupId>com.sun.jdmk</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jmxri</artifactId>
                <groupId>com.sun.jmx</groupId>
            </exclusion>
            <exclusion>
                <artifactId>jms</artifactId>
                <groupId>javax.jms</groupId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.zookeeper</groupId>
                <artifactId>zookeeper</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.4</version>
            <configuration>
                <createDependencyReducedPom>true</createDependencyReducedPom>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            <transformer
                                implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.shoufubx.StormJobDriver</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
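To confirm the shaded jar really resolves the hdfs scheme, you can look up a FileSystem for an hdfs URI; before the fix this lookup is exactly what fails with "No FileSystem for scheme: hdfs". A minimal sketch, assuming the namenode address hdfs://mini1:9000 used by the driver below (HdfsSchemeCheck is a hypothetical helper class, not part of the original project):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsSchemeCheck {
    public static void main(String[] args) throws Exception {
        // The "hdfs" scheme is resolved via META-INF/services/org.apache.hadoop.fs.FileSystem;
        // this throws "No FileSystem for scheme: hdfs" if the hadoop-hdfs entry was overwritten.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mini1:9000"), new Configuration());
        // Expect org.apache.hadoop.hdfs.DistributedFileSystem when the service file is intact.
        System.out.println(fs.getClass().getName());
    }
}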
2. The topology driver class (StormJobDriver)
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class StormJobDriver {
    public static void main(String[] args) {
        // HDFS bolt configuration
        // field delimiter for the output records
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(" ");
        // sync to HDFS every 5 tuples
        SyncPolicy policy = new CountSyncPolicy(5);
        // rotate to a new file every 10 MB
        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(10.0f, FileSizeRotationPolicy.Units.MB);
        // output directory: taken from the command line in cluster mode
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath(args[0]);
        // local mode alternative:
        // FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/stormJob");
        HdfsBolt hdfsBolt = new HdfsBolt()
                .withFsUrl("hdfs://mini1:9000")
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(policy);

        TopologyBuilder topologyBuilder = new TopologyBuilder();
        topologyBuilder.setSpout("kafkaSpout", new KafkaSpout(new SpoutConfig(
                new ZkHosts("mini1:2181,mini2:2181,mini3:2181", "/kafka/brokers"),
                "stormJob", "/kafka", "stormJob")));
        topologyBuilder.setBolt("bolt1", new StormJobBolt1()).shuffleGrouping("kafkaSpout");
        topologyBuilder.setBolt("bolt2", new StormJobBolt2()).shuffleGrouping("bolt1");
        topologyBuilder.setBolt("hdfsBolt", hdfsBolt).shuffleGrouping("bolt2");

        Config config = new Config();
        StormTopology stormTopology = topologyBuilder.createTopology();
        // local mode:
        // LocalCluster localCluster = new LocalCluster();
        // localCluster.submitTopology("stormJob", config, stormTopology);
        // cluster mode:
        try {
            StormSubmitter.submitTopology("stormJob", config, stormTopology);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The spout is a KafkaSpout; bolt1 and bolt2 contain only simple business logic, so their code is not included here (a hypothetical sketch follows).
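For reference, a pass-through bolt in Storm 0.9.5 has roughly the following shape. This is a hypothetical stand-in for StormJobBolt1, not the original business logic; it assumes the default KafkaSpout scheme (RawMultiScheme), which emits each message payload as a byte[] in a single field:

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class StormJobBolt1 extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // the default KafkaSpout scheme emits the raw message bytes as the first field
        String line = new String((byte[]) input.getValue(0));
        // real business logic would transform the line here before emitting
        collector.emit(new Values(line));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}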
References:
https://blog.csdn.net/u014039577/article/details/49818935
https://blog.csdn.net/u010003835/article/details/80172039