Building a Big Data ETL Pipeline -- Streaming JSON Conversion -- JSON to Parquet (Part 3)

If the generated log data is already in Avro format, you can directly use the approach from the previous post (https://blog.csdn.net/qq_29829081/article/details/80518671) to dump the Avro data to Parquet. In practice, however, log data is usually not Avro; most of it is JSON. This post therefore focuses on how to stream JSON into Parquet via Morphline. The example here is deliberately simple; in a real production environment our JSON data is much more complex, but it can still be converted with Morphline in a generic way, which the next post will cover in detail.

This post mainly shows how to use Flume's Morphline Interceptor to convert JSON data to Avro on the fly, and then use the Kite Dataset Sink to write the data out in Parquet format. The crux of it is the Flume configuration and the combination of Morphline commands.
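
For concreteness, assume each log line is one flat JSON record whose fields match the Morphline extraction paths configured below (the values are made up for illustration):

{"name": "litao", "age": 28}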

1 Flume configuration:

(1) Flume configuration on the nginx side

# Name the components on this agent
a1.sources = r
a1.sinks = k_kafka
a1.channels = c_mem

# Channels info
a1.channels.c_mem.type = memory
a1.channels.c_mem.capacity = 2000
a1.channels.c_mem.transactionCapacity = 300
a1.channels.c_mem.keep-alive = 60

# Sources info
a1.sources.r.type = exec
a1.sources.r.shell = /bin/bash -c
a1.sources.r.command = tail -F /home/litao/litao.json
a1.sources.r.channels = c_mem

# Sinks info
a1.sinks.k_kafka.channel = c_mem
a1.sinks.k_kafka.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k_kafka.kafka.bootstrap.servers = kafka1:9093,kafka2:9093,kafka3:9093,kafka4:9093,kafka5:9093,kafka6:9093
a1.sinks.k_kafka.kafka.topic = test_2018-03-14
a1.sinks.k_kafka.kafka.flumeBatchSize = 5
a1.sinks.k_kafka.kafka.producer.acks = 1
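
This agent simply tails the JSON log and publishes each line to Kafka. It can be launched with the standard flume-ng command (the config file name below is hypothetical; point it at wherever config (1) is saved):

flume-ng agent --conf /etc/flume/conf \
    --conf-file /etc/flume/conf/nginx-agent.conf \
    --name a1 -Dflume.root.logger=INFO,console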

(2) Flume configuration on the Kafka side

# Name the components on this agent
a1.channels = c1
a1.sources = r1
a1.sinks = k1

# Channel config
a1.channels.c1.type = memory
a1.channels.c1.capacity = 500000
a1.channels.c1.transactionCapacity = 100000
a1.channels.c1.keep-alive = 50

# Sources info
a1.sources.r1.type = com.bigo.flume.source.kafka.KafkaSource
a1.sources.r1.channels = c1
a1.sources.r1.kafka.bootstrap.servers = kafka1:9093,kafka2:9093,kafka3:9093,kafka4:9093,kafka5:9093,kafka6:9093
a1.sources.r1.kafka.topics = test_2018-03-14
a1.sources.r1.kafka.consumer.group.id = test_2018-03-14.conf_flume_group
a1.sources.r1.kafka.consumer.timeout.ms = 100
a1.sources.r1.batchSize = 2000

# Config Interceptors
a1.sources.r1.interceptors = i1 morphline

# Inject the Avro schema URL into the event header so the Kite DatasetSink can pick it up
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = flume.avro.schema.url
#a1.sources.r1.interceptors.i1.value = file:/home/litao/litao.avsc
a1.sources.r1.interceptors.i1.value = hdfs://bigocluster/user/litao/litao.avsc

# Morphline interceptor config
a1.sources.r1.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.r1.interceptors.morphline.morphlineFile = /etc/flume/conf/a1/morphline.conf
a1.sources.r1.interceptors.morphline.morphlineId = convertJsonToAvro

# Sink config
a1.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
a1.sinks.k1.channel = c1
a1.sinks.k1.kite.dataset.uri = dataset:hdfs://bigocluster/flume/hellotalk/parquet
a1.sinks.k1.kite.batchSize = 100
a1.sinks.k1.kite.rollInterval = 30
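
The Kite Dataset Sink writes into an existing dataset, so the Parquet dataset must be created before starting this agent. With the Kite CLI that could look roughly as follows (a sketch assuming the kite-dataset tool is installed and the schema file is the same one referenced in the interceptor config):

kite-dataset create dataset:hdfs://bigocluster/flume/hellotalk/parquet \
    --schema /home/litao/litao.avsc \
    --format parquet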

2 Morphline configuration:

morphlines: [
  {
    id: convertJsonToAvro
    importCommands: [ "org.kitesdk.**" ]
    commands: [
      # read the JSON blob
      { readJson: {} }
      # extract JSON objects into fields
      { extractJsonPaths: {
        flatten: true
        paths: {
          name: /name
          age: /age
        }
      } }
      # add a creation timestamp to the record
      #{ addCurrentTime {
      #  field: timestamp
      #  preserveExisting: true
      #} }
      # convert the extracted fields to an avro object
      # described by the schema in this field
      { toAvro {
        schemaFile: /home/litao/litao.avsc
      } }
      # serialize the object as avro
      { writeAvroToByteArray: {
        format: containerlessBinary
      } }
    ]
  }
]
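
writeAvroToByteArray uses containerlessBinary because the schema travels separately: the static interceptor already puts flume.avro.schema.url into the event header, so the event body only needs to carry the raw Avro-encoded record. The litao.avsc schema referenced in both places might look like this minimal sketch, covering just the name/age fields extracted above (the record name is made up):

{
  "type": "record",
  "name": "LitaoRecord",
  "fields": [
    {"name": "name", "type": ["null", "string"], "default": null},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
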
3 Required JARs:
config-1.3.1.jar
metrics-healthchecks-3.0.2.jar
kite-morphlines-core-1.1.0.jar
kite-morphlines-json-1.1.0.jar
kite-morphlines-avro-1.1.0.jar
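
All of these JARs (plus their transitive dependencies) must be on the Flume agent's classpath before the Morphline interceptor can load; the simplest approach is to drop them into Flume's lib directory (the paths below are illustrative):

cp config-1.3.1.jar \
   metrics-healthchecks-3.0.2.jar \
   kite-morphlines-core-1.1.0.jar \
   kite-morphlines-json-1.1.0.jar \
   kite-morphlines-avro-1.1.0.jar \
   $FLUME_HOME/lib/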
