Common Kafka commands, integrating Flume with Kafka, word count with Spark over Kafka data, and storing the processed data in Redis

1. Start Kafka in the foreground:  ./kafka-server-start.sh ../config/server.properties

2. Start Kafka in the background:  ./kafka-server-start.sh ../config/server.properties 1>/dev/null 2>&1 &

3. List all topics on the current server:  ./kafka-topics.sh --list --zookeeper 192.168.35.98:2181

4. Create a topic:  ./kafka-topics.sh --create --zookeeper 192.168.35.98:2181 --replication-factor 1 --partitions 1 --topic goodStudy

                      This creates the topic with a single partition, and that partition is stored on a single broker (replication factor 1).

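5. Delete a topic:  ./kafka-topics.sh --delete --zookeeper 192.168.35.98:2181 --topic test

                      delete.topic.enable=true must be set in server.properties; otherwise the topic is only marked for deletion. Restart the broker after changing the setting.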

6. Send messages from the shell:  ./kafka-console-producer.sh --broker-list 192.168.35.98:9092 --topic goodStudy

7. Consume messages from the shell:  ./kafka-console-consumer.sh --zookeeper 192.168.35.98:2181 --from-beginning --topic goodStudy

8. Check the consumption position (offsets) of a consumer group:  ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 192.168.35.98:2181 --group goodStudy

9. Show the details of a topic:  ./kafka-topics.sh --topic goodStudy --describe --zookeeper 192.168.35.98:2181

Java producer code:

Java consumer code:

package cn.itcast.document;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class CustomerKafka {

    private final static String TOPIC = "goodStudy";

    public static void main(String[] args) {
        // Settings for the ZooKeeper-based high-level consumer.
        Properties props = new Properties();
        props.put("zookeeper.connect", "192.168.35.98:2181,192.168.35.97:2181,192.168.35.96:2181");
        props.put("group.id", "goodStudy");
        props.put("zookeeper.session.timeout.ms", "4000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");

        ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // Request one stream (one consuming thread) for the topic.
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put(TOPIC, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams = consumer.createMessageStreams(topicMap);
        KafkaStream<byte[], byte[]> kafkaStream = streams.get(TOPIC).get(0);
        ConsumerIterator<byte[], byte[]> iterator = kafkaStream.iterator();
        // With the default settings, hasNext() blocks until a message arrives.
        while (iterator.hasNext()) {
            // The payload is a byte[]; decode it, because byte[].toString() prints only the array identity.
            System.out.println(new String(iterator.next().message(), StandardCharsets.UTF_8));
        }
    }
}
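
The consumer receives each message payload as a byte[], and calling toString() on a byte array yields an identity string rather than the text. The following is a minimal, standard-library-only sketch of the decoding step. The class name PayloadDecoder is hypothetical, and UTF-8 is an assumption about how the producer encoded the messages.

```java
import java.nio.charset.StandardCharsets;

final class PayloadDecoder {

    /** Decodes a raw message payload as text, assuming the producer wrote UTF-8. */
    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = "hello kafka".getBytes(StandardCharsets.UTF_8);
        // Prints the array's type and identity hash, not the message text.
        System.out.println(payload.toString());
        // Prints the decoded message text.
        System.out.println(decode(payload));
    }
}
```

If the producer used a different charset, the same charset would have to be passed when decoding.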

 

II. Integrating with Flume

1. The Flume configuration file:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

agent.sources = r1
agent.channels = c1
agent.sinks = s1

# For each one of the sources, the type is defined
#agent.sources.r1.type = spooldir
#agent.sources.r1.command = /home/hadoop/flumedata
#agent.sources.r1.fileHeader = true
#agent.sources.r1.channels = c1

agent.sources.r1.type = spooldir
agent.sources.r1.spoolDir = /home/hadoop/flumedata
agent.sources.r1.fileHeader = true

# Each sink's type must be defined

#agent.sinks.s1.type = logger

agent.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.s1.topic = goodStudy
agent.sinks.s1.brokerList = 192.168.35.98:9092
agent.sinks.s1.requiredAcks = 1
agent.sinks.s1.batchSize = 2


# Each channel's type is defined.
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100
agent.sources.r1.channels = c1
agent.sinks.s1.channel = c1
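
In this configuration the source and the sink exchange events only because both are bound to the same channel, c1. Note that a source uses the plural key `channels` while a sink uses the singular key `channel`. As a rough sanity check, the wiring can be verified with plain java.util.Properties. This is only a sketch: the class FlumeWiringCheck and its methods are hypothetical helpers, not part of Flume.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

final class FlumeWiringCheck {

    /** True when the sink's channel is declared by the agent and is also one of the source's channels. */
    static boolean isWired(Properties p, String agent, String source, String sink) {
        List<String> declared = split(p.getProperty(agent + ".channels"));
        List<String> sourceChannels = split(p.getProperty(agent + ".sources." + source + ".channels"));
        String sinkChannel = p.getProperty(agent + ".sinks." + sink + ".channel", "").trim();
        return !sinkChannel.isEmpty()
                && declared.contains(sinkChannel)
                && sourceChannels.contains(sinkChannel);
    }

    /** Splits a whitespace-separated list of component names. */
    private static List<String> split(String value) {
        if (value == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    /** Parses configuration text held in memory. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String conf = String.join("\n",
                "agent.sources = r1",
                "agent.channels = c1",
                "agent.sinks = s1",
                "agent.sources.r1.channels = c1",
                "agent.sinks.s1.channel = c1");
        System.out.println(isWired(parse(conf), "agent", "r1", "s1")); // true
    }
}
```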

2. Command to start Flume:

 ./flume-ng agent --conf conf -f ../conf/kafka-conf.properties -n agent -Dflume.root.logger=INFO,console 

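Once started, Flume watches the /home/hadoop/flumedata directory. When a new file appears there, its contents are sent to Kafka, and the Java consumer can then read those messages.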

III. Using Spark to consume the file data that Flume loads into Kafka, and implementing word count

The code is as follows:

package cn.itcast.sparkDay05

import cn.itcast.spark.day5.LoggerLevels
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.HashPartitioner

object KafkaWordCount {
  
  val numThreads = "1"
  val topics = "goodStudy"
  val zkQuorum = "192.168.35.98:2181,192.168.35.97:2181,192.168.35.96:2181"
  val group = "goodStudy"
  
  
  // For every word (x), add the counts from the current batch (y) to the running total so far (z).
  val updateFunc = (iter: Iterator[(String, Seq[Int], Option[Int])]) => {
    iter.map { case (x, y, z) => (x, y.sum + z.getOrElse(0)) }
  }
  
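  def main(args: Array[String]): Unit = {
    LoggerLevels.setStreamingLogLevels()
    // val Array(zkQuorum, group, topics, numThreads) = args
    // At least two local threads are needed: one for the receiver, one for processing.
    val conf = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
    // Process the stream in 5-second batches.
    val ssc = new StreamingContext(conf, Seconds(5))
    // updateStateByKey requires a checkpoint directory.
    ssc.checkpoint("F:\\hadoop_doc\\spark\\ck")
    // Map each topic to the number of consumer threads.
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // Receiver-based stream of (key, message) pairs, coordinated through ZooKeeper.
    val data = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap, StorageLevel.MEMORY_AND_DISK_SER)
    val words = data.map(_._2).flatMap(_.split(" "))
    // Keep a running count per word across batches.
    val wordCounts = words.map((_, 1)).updateStateByKey(updateFunc, new HashPartitioner(ssc.sparkContext.defaultParallelism), true)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }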
}
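
The update function in the Spark job adds each batch's counts for a word to the total accumulated so far. The same bookkeeping can be illustrated without Spark in plain Java. This is only a sketch of the idea, not of how Spark implements it. The class RunningWordCount is hypothetical, and skipping empty tokens is an addition that the Spark job does not make.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class RunningWordCount {

    // Running totals kept across batches, playing the role of the per-key state.
    private final Map<String, Integer> totals = new HashMap<>();

    /** Splits each line of one batch on spaces and folds the counts into the running totals. */
    Map<String, Integer> update(Iterable<String> batch) {
        for (String line : batch) {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) { // assumption: ignore empty tokens from repeated spaces
                    totals.merge(word, 1, Integer::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        RunningWordCount counter = new RunningWordCount();
        counter.update(Arrays.asList("hello kafka", "hello spark")); // first batch
        Map<String, Integer> result = counter.update(Arrays.asList("hello flume")); // second batch
        System.out.println(result.get("hello")); // 3
    }
}
```

Because the totals survive between calls to update, the count for a word keeps growing across batches, which is what distinguishes this from counting each batch in isolation.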

IV. Storing the processed data in Redis. Command to start a single-node Redis: redis-server redis.conf &

Utility class for obtaining Redis connections:

package cn.itcast.sparkDay05

import redis.clients.jedis.JedisPool
import redis.clients.jedis.JedisPoolConfig

object RedisClient extends Serializable {
  val redisHost = "192.168.35.98"
  val redisPort = 6379
  val redisTimeout = 30000
  // Created lazily, so each JVM builds its own pool the first time it is used.
  lazy val pool = new JedisPool(new JedisPoolConfig(), redisHost, redisPort, redisTimeout)

  // Destroy the pool when the JVM shuts down.
  lazy val hook = new Thread {
    override def run = {
      println("Execute hook thread: " + this)
      pool.destroy()
    }
  }
  sys.addShutdownHook(hook.run)
}
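
The utility class combines two ideas: a connection pool that is created lazily, once per JVM, and a shutdown hook that releases it. The pattern can be sketched in plain Java as follows. The nested Pool class is a hypothetical stand-in for JedisPool so that the example runs without the Jedis library. Unlike the Scala object, this sketch registers the hook at the same moment the pool is first created.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class LazyPoolHolder {

    /** Hypothetical stand-in for a connection pool such as JedisPool. */
    static final class Pool implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        boolean isClosed() {
            return closed.get();
        }

        @Override
        public void close() {
            closed.set(true);
        }
    }

    // Initialization-on-demand holder: nothing is created until pool() is first called.
    private static final class Holder {
        static final Pool POOL = new Pool();
        static final Thread HOOK = new Thread(POOL::close, "pool-shutdown");

        static {
            // Close the pool when the JVM exits.
            Runtime.getRuntime().addShutdownHook(HOOK);
        }
    }

    static Pool pool() {
        return Holder.POOL;
    }

    static Thread hook() {
        return Holder.HOOK;
    }

    public static void main(String[] args) {
        System.out.println(pool() == pool()); // true: the same instance is reused
    }
}
```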
