storm-hdfs Integration and Exception Handling

Storm writes its real-time output to HDFS. Running locally mostly works (aside from some jar conflicts); the notes below mainly cover what to watch out for when running on a cluster.

1. The storm-core dependency must be set to scope provided, i.e. this dependency is not bundled into the jar; the jar provided by the Storm cluster is used instead.
2. hadoop-common and hadoop-client must exclude the slf4j-log4j12 dependency, otherwise it conflicts with the jars in storm-core.
3. Fixing the "No FileSystem for scheme: hdfs" exception
The project was originally packaged with the maven-assembly-plugin. Both the hadoop-hdfs and hadoop-common jars contain a file named org.apache.hadoop.fs.FileSystem under META-INF/services. When packaging, that plugin overwrites same-named files, so the entries from the hadoop-hdfs jar get clobbered, which causes this exception.
The fix is to replace it with the maven-shade-plugin (see the code below), which merges same-named service files instead of overwriting them.
After repackaging, the org.apache.hadoop.fs.FileSystem file contains the merged entries:
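For illustration, the merged services file typically looks like the following (the exact entries depend on the Hadoop version; the important part is that DistributedFileSystem from hadoop-hdfs survives alongside the hadoop-common entries):

```
org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.ftp.FTPFileSystem
org.apache.hadoop.fs.HarFileSystem
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
```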

Implementation
1. pom.xml

<dependencies>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <!-- cluster mode needs this scope so the Storm cluster's own jar is used -->
            <scope>provided</scope>
            <version>0.9.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka</artifactId>
            <version>0.9.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.4</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-hdfs</artifactId>
            <version>0.9.5</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-client</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-hdfs</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.8.2</artifactId>
            <version>0.8.1</version>
            <exclusions>
                <exclusion>
                    <artifactId>jmxtools</artifactId>
                    <groupId>com.sun.jdmk</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jmxri</artifactId>
                    <groupId>com.sun.jmx</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jms</artifactId>
                    <groupId>javax.jms</groupId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.4</version>
                <configuration>
                    <createDependencyReducedPom>true</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.shoufubx.StormJobDriver</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
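As a sketch of how the shaded jar might be built and submitted (the jar file name and the HDFS output path are placeholders here; the `storm` command must be available on a machine that can reach the Nimbus):

```shell
# build the shaded jar
mvn clean package

# submit to the cluster; the last argument becomes args[0], the HDFS output directory
storm jar target/stormJob-1.0-SNAPSHOT.jar com.shoufubx.StormJobDriver /stormJob
```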

2. StormDriver class

//HDFS bolt
        //field delimiter for the output records
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(" ");
        //sync to HDFS every 5 tuples
        SyncPolicy policy = new CountSyncPolicy(5);
        //rotate files at 10 MB
        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(10.0f, FileSizeRotationPolicy.Units.MB);
        //output directory
        //cluster mode
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath(args[0]);
        //local mode
//        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/stormJob");
        //create the bolt
        HdfsBolt hdfsBolt = new HdfsBolt()
                .withFsUrl("hdfs://mini1:9000")
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(policy);


        TopologyBuilder topologyBuilder = new TopologyBuilder();
        topologyBuilder.setSpout("kafkaSpout", new KafkaSpout(new SpoutConfig(
                new ZkHosts("mini1:2181,mini2:2181,mini3:2181", "/kafka/brokers"), "stormJob", "/kafka", "stormJob")));
        topologyBuilder.setBolt("bolt1", new StormJobBolt1()).shuffleGrouping("kafkaSpout");
        topologyBuilder.setBolt("bolt2", new StormJobBolt2()).shuffleGrouping("bolt1");
        topologyBuilder.setBolt("hdfsBolt", hdfsBolt).shuffleGrouping("bolt2");

        Config config = new Config();

        StormTopology stormTopology = topologyBuilder.createTopology();

        //local mode
//        LocalCluster localCluster = new LocalCluster();
//        localCluster.submitTopology("stormJob", config, stormTopology);
        //cluster mode
        try {
            StormSubmitter.submitTopology("stormJob", config, stormTopology);
        } catch (Exception e) {
            e.printStackTrace();
        }

The spout is a KafkaSpout; bolt1 and bolt2 are just simple business logic, so they are not pasted here.
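For completeness, a minimal bolt along these lines might look like the following. This is a hypothetical sketch against the Storm 0.9.x API (backtype.storm packages), not the actual business logic:

```java
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class StormJobBolt1 extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // with the default scheme, KafkaSpout emits the raw message bytes;
        // decode them and pass the line downstream
        String line = new String(tuple.getBinary(0));
        collector.emit(new Values(line));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}
```

BaseBasicBolt acks each tuple automatically after execute returns, which keeps a simple pass-through bolt like this short.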

References:
https://blog.csdn.net/u014039577/article/details/49818935
https://blog.csdn.net/u010003835/article/details/80172039
