Flume: receiving a data stream and sinking it to AWS S3

1. Problem Description

Scenario: Flume receives data over HTTP, and the file name is written into the header of the Flume event (headers are stored as <key, value> pairs). Flume needs to take the data stream from the source, parse the relevant fields, and then sink the data to S3.

Note: the filename field parsed from the header is used as the object name, and each event body is saved as a small file.
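
For illustration only, an event whose header carries the file name could be posted to the HTTP source as shown below. This assumes Flume's stock JSONHandler rather than the custom CSUBlobHandler configured later; the host, port, and file name are placeholders:

curl -X POST http://172.xx.xx.xxx:xxxx \
     -H "Content-Type: application/json" \
     -d '[{"headers": {"filename": "sample_0001.pb"}, "body": "payload bytes as a string"}]'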

2. Approach and Implementation

1. Based on the scenario, write the Flume configuration file fang-custom-sink2s3.conf:

csuHttp.sources = r1
csuHttp.channels = c1
csuHttp.sinks = s1


# =============  Configure the source  ==========
csuHttp.sources.r1.type = http
csuHttp.sources.r1.port = xxxx
csuHttp.sources.r1.bind = 172.xx.xx.xxx
csuHttp.sources.r1.handler = com.navinfo.flume.source.http.CSUBlobHandler


# ==========  Configure the sink  ==========
csuHttp.sinks.s1.type = flume2s3.MySinks2S3
csuHttp.sinks.s1.hdfs.rollSize = 0
csuHttp.sinks.s1.hdfs.rollCount = 1

#============= my defined var :for aws s3 ==================
csuHttp.sinks.s1.aws_access_key = my_aws_access_key
csuHttp.sinks.s1.aws_secret_key = my_aws_secret_key
csuHttp.sinks.s1.my_bucket_path = public-sss-cn-ss-sss-cc-s3
csuHttp.sinks.s1.my_endpoint = s3.cn-northwest-1.amazonaws.com.cn
csuHttp.sinks.s1.my_region = cn-northwest-1

#============= my defined var :for local s3 ==================
#csuHttp.sinks.s1.aws_access_key = xxxxxxxxxxxxxxxxxx
#csuHttp.sinks.s1.aws_secret_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#csuHttp.sinks.s1.my_bucket_path = mybucket_path/upload
##csuHttp.sinks.s1.my_region = (may be left empty)
#csuHttp.sinks.s1.my_endpoint = http://192.168.xx.xxx:xxxx



# ==========  Configure the channel between the http source and the s3 sink  ==========
csuHttp.channels.c1.type = memory
# channel store size
csuHttp.channels.c1.capacity = 1000
# transaction size
csuHttp.channels.c1.transactionCapacity = 1000
csuHttp.channels.c1.byteCapacity = 838860800000
csuHttp.channels.c1.byteCapacityBufferPercentage = 20
csuHttp.channels.c1.keep-alive = 60

# ==========  Bind the sources and sinks to the channels  ==========
csuHttp.sources.r1.channels = c1
csuHttp.sinks.s1.channel = c1

2. A custom Flume sink is required here; the implementation is as follows.

Java implementation:

package flume2s3;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.ByteArrayInputStream;
import java.util.Date;
import java.util.Map;


/**
 * Created by user on 2019/7/9.
 */
public class MySinks2S3 extends AbstractSink implements Configurable {

    private static String aws_access_key = null; // your AWS access key
    private static String aws_secret_key = null; // your AWS secret key
    private static String my_bucket_path = null; // your bucket name; the bucket must already exist on S3
    private static String my_region = null;      // the AWS region
    private static String my_endpoint = null;    // the S3 endpoint

    private AmazonS3 s3Client;

    @Override
    public synchronized void start() { // initialize the custom sink's resources
        super.start();
        initResourceConn();
        System.out.println("[fct] start(): initResourceConn finished!! ");
    }

    @Override
    public void configure(Context context) {
        aws_access_key = context.getString("aws_access_key", null);
        aws_secret_key = context.getString("aws_secret_key", null);
        my_bucket_path = context.getString("my_bucket_path", null);
        my_region = context.getString("my_region",null);
        my_endpoint = context.getString("my_endpoint",null);
        System.out.println(" [ read flume configure] ----  aws_access_key: "+aws_access_key +"; aws_secret_key: "+aws_secret_key+ " ;my_bucket_path: "+my_bucket_path +"; my_endpoint:"+my_endpoint+"; my_region:"+my_region);

    }


    public void initResourceConn() {
        // Initialize the S3 client s3Client; three settings are required: access key, secret key, and region/endpoint.
        AWSCredentials awsCredentials = new BasicAWSCredentials(aws_access_key, aws_secret_key);
        // Builder pattern: parameters are chained one after another; the same style is common in Spark and Flink.
        // Note: when accessing a local S3-compatible endpoint, the signing region may be left empty.
        s3Client = AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(my_endpoint,my_region))
                .build();

        // Check the S3 connection by listing all buckets
        System.out.println("[fct] ||| [list all buckets]: " + s3Client.listBuckets()+"\n");

    }


    private static final Logger logger = LoggerFactory.getLogger(MySinks2S3.class);


    @Override
    public Status process() throws EventDeliveryException {
        Status result = Status.READY;
        Channel channel = getChannel();
        Transaction transaction = channel.getTransaction();
        Event event =null;

        try {
            transaction.begin();
            event = channel.take();

            if (event != null) {
                byte[] eventBodyBytes = event.getBody();
                Map<String, String> eventHeader = event.getHeaders();

                String fileName = "defaultName";
                if (eventHeader != null ) {
                    if (eventHeader.containsKey("filename")) {
                        fileName = eventHeader.get("filename");
                    } else {
//                        System.out.println("【Warn】msg not containsKey [filename] !!!");
                        logger.warn("[fct Warn] msg not containsKey [filename] !!!");
                    }
                } else {
                    logger.warn("[fct Warn] eventHeader is null");
//                    System.out.println("【Warn】 eventHeader is null" );
                }

                ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(eventBodyBytes);
                ObjectMetadata objectMetadata = new ObjectMetadata();
                objectMetadata.setContentType("application/x-protobuf");
                objectMetadata.setLastModified(new Date());
                objectMetadata.setContentLength(eventBodyBytes.length);

                PutObjectRequest putObjectRequest = new PutObjectRequest(my_bucket_path, fileName,byteArrayInputStream,objectMetadata  );
                s3Client.putObject(putObjectRequest);
//                s3Client.putObject(bucketName, fileName, eventBody);
//                System.out.println("put byte sucessfully!! ---fileName: " + fileName);
                logger.info("put byte sucessfully!! ---fileName: " + fileName);
            } else {
                // No event found, request back-off semantics from the sink runner
                result = Status.BACKOFF;
            }

            transaction.commit();
//            System.out.println("transaction commit...");

        } catch (Exception ex) {
            transaction.rollback();
            System.out.println("transaction rollback...");
            String errorMsg = "Failed to publish event: " + event;
            logger.error(errorMsg);
            System.out.println("exception msg:" + ex.getMessage());
            throw new EventDeliveryException(errorMsg, ex);
        } finally {
            transaction.close();
//            System.out.println("transaction close...");
        }
        return result;


    }


    @Override
    public synchronized void stop() { // clean up resources on shutdown
        super.stop();
        s3Client.shutdown();
        System.out.println("fct|| stop(): s3Client.shutdown();");
    }
}
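
Before wiring the sink into an agent, it can help to verify the credentials, endpoint, and bucket access with a small standalone program. The sketch below is hypothetical (class name, key values, and bucket are placeholders) and simply mirrors the client construction used in initResourceConn():

package mys3test;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;

import java.io.ByteArrayInputStream;

// Hypothetical standalone connectivity check, not part of the sink:
// upload one small object using the same builder pattern as MySinks2S3.
public class UploadTest {
    public static void main(String[] args) {
        String accessKey = "my_aws_access_key";                 // placeholder
        String secretKey = "my_aws_secret_key";                 // placeholder
        String bucket    = "public-sss-cn-ss-sss-cc-s3";        // placeholder bucket from the config
        String endpoint  = "s3.cn-northwest-1.amazonaws.com.cn";
        String region    = "cn-northwest-1";

        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(accessKey, secretKey)))
                .withEndpointConfiguration(
                        new AwsClientBuilder.EndpointConfiguration(endpoint, region))
                .build();

        // Upload one small test object, then list the visible buckets.
        byte[] body = "hello s3".getBytes();
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(body.length);
        s3.putObject(bucket, "connectivity-check.txt", new ByteArrayInputStream(body), meta);
        System.out.println("uploaded test object; buckets visible: " + s3.listBuckets().size());
        s3.shutdown();
    }
}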

Package this Java program as a jar, FangFlume2S3-1.0-SNAPSHOT.jar, and place the jar in the lib directory on Flume's classpath.
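
If the AWS SDK jars are not already present under Flume's lib directory, one option is to build a shaded (fat) jar so that the sink and its dependencies ship together. A minimal sketch using maven-shade-plugin (the plugin version here is an assumption, not taken from the original project):

        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>3.2.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>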

3. Finally, start the Flume agent with this configuration file.

## From the Flume installation directory, start the agent with the configuration file:
./bin/flume-ng agent --conf ./conf -f  ./conf/fang-custom-sink2s3_aws3.conf -n csuHttp -Dflume.root.logger=INFO,console

4. Common Errors and Warnings

When uploading files to S3 from a local program, mixing AWS SDK artifacts of different versions causes a version mismatch. For example, with the following dependencies:

        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <version>1.11.347</version>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk</artifactId>
            <version>1.7.4</version>
        </dependency>

the following error occurs:

java.lang.NoSuchFieldError: ALLOW_FINAL_FIELDS_AS_MUTATORS
	at com.amazonaws.partitions.PartitionsLoader.<clinit>(PartitionsLoader.java:52)
	at com.amazonaws.regions.RegionMetadataFactory.create(RegionMetadataFactory.java:30)
	at com.amazonaws.regions.RegionUtils.initialize(RegionUtils.java:64)
	at com.amazonaws.regions.RegionUtils.getRegionMetadata(RegionUtils.java:52)
	at com.amazonaws.regions.RegionUtils.getRegion(RegionUtils.java:105)
	at com.amazonaws.client.builder.AwsClientBuilder.getRegionObject(AwsClientBuilder.java:249)
	at com.amazonaws.client.builder.AwsClientBuilder.withRegion(AwsClientBuilder.java:238)
	at mys3test.UploadTest.<clinit>(UploadTest.java:34)
Exception in thread "main" 
Process finished with exit code 1

Analysis:

The corresponding field does not exist, which can be confirmed by inspecting the SDK source code.


See also:

https://www.e-learn.cn/content/wangluowenzhang/488554

Could not initialize class com.amazonaws.partitions.PartitionsLoader

https://stackoverflow.com/questions/46926690/java-lang-nosuchfielderror-allow-final-fields-as-mutators-when-i-download-file

java.lang.NoSuchFieldError: ALLOW_FINAL_FIELDS_AS_MUTATORS when I download file from AmazonS3 [duplicate]

The initial conclusion was a version mismatch, specifically an incompatible version of the Jackson library pulled in by the AWS SDK. Further checking showed that the aws-java-sdk Maven dependency was too old (1.7.4). Either change it to the same version as the aws-java-sdk-s3 artifact (1.11.347), or remove the aws-java-sdk dependency altogether and keep only aws-java-sdk-s3 (1.11.347):

        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <version>1.11.347</version>
        </dependency>
<!--        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk</artifactId>
            <version>1.7.4</version>
        </dependency>-->
