1. Start Kafka in the foreground: ./kafka-server-start.sh ../config/server.properties
2. Start Kafka in the background: ./kafka-server-start.sh ../config/server.properties 1>/dev/null 2>&1 &
3. List all topics on the current server: ./kafka-topics.sh --list --zookeeper 192.168.35.98:2181
4. Create a topic: ./kafka-topics.sh --create --zookeeper 192.168.35.98:2181 --replication-factor 1 --partitions 1 --topic goodStudy (this creates one partition, placed on a single broker)
5. Delete a topic: ./kafka-topics.sh --delete --zookeeper 192.168.35.98:2181 --topic test
Note: delete.topic.enable=true must be set in server.properties; otherwise the topic is only marked for deletion (or you can restart the broker directly).
6. Send messages from the shell: ./kafka-console-producer.sh --broker-list 192.168.35.98:9092 --topic goodStudy
7. Consume messages from the shell: ./kafka-console-consumer.sh --zookeeper 192.168.35.98:2181 --from-beginning --topic goodStudy
8. Check the consumer offsets: ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper 192.168.35.98:2181 --group goodStudy
9. Show the details of a topic: ./kafka-topics.sh --topic goodStudy --describe --zookeeper 192.168.35.98:2181
Java producer code:
![](https://i-blog.csdnimg.cn/blog_migrate/0b0d1c8ede3298bd077f79535e134a24.png)
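The producer code above is only available as a screenshot. As a minimal sketch of what such a producer looks like with the same 0.8-era kafka.javaapi style used by the consumer below, assuming the broker at 192.168.35.98:9092 and the goodStudy topic (the class name ProducerKafka and the message text are made up for illustration):

```java
package cn.itcast.document;

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Hypothetical producer sketch; the actual code is in the screenshot above
public class ProducerKafka {

    private final static String TOPIC = "goodStudy";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "192.168.35.98:9092");
        // Send message bodies as plain strings
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 10; i++) {
            producer.send(new KeyedMessage<String, String>(TOPIC, "message " + i));
        }
        producer.close();
    }
}
```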
Java consumer code:
```java
package cn.itcast.document;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

public class CustomerKafka {

    private final static String TOPIC = "goodStudy";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "192.168.35.98:2181,192.168.35.97:2181,192.168.35.96:2181");
        props.put("group.id", "goodStudy");
        props.put("zookeeper.session.timeout.ms", "4000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");

        // Old high-level (0.8.x) consumer, coordinated through ZooKeeper
        kafka.javaapi.consumer.ConsumerConnector consumer =
                kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream (consumer thread) for the topic
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put(TOPIC, 1);

        Map<String, List<KafkaStream<byte[], byte[]>>> streams = consumer.createMessageStreams(topicMap);
        KafkaStream<byte[], byte[]> kafkaStream = streams.get(TOPIC).get(0);

        ConsumerIterator<byte[], byte[]> iterator = kafkaStream.iterator();
        while (iterator.hasNext()) {
            // message() returns byte[]; decode it before printing
            System.out.println(new String(iterator.next().message()));
        }
    }
}
```
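Note that the high-level consumer's iterator blocks in hasNext() until a message arrives, so this program keeps printing everything sent to goodStudy (for example by the shell producer from step 6) until it is killed.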
II. Integrating with Flume
1. Flume configuration file:
```
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
agent.sources = r1
agent.channels = c1
agent.sinks = s1

# For each one of the sources, the type is defined
#agent.sources.r1.type = spooldir
#agent.sources.r1.command = /home/hadoop/flumedata
#agent.sources.r1.fileHeader = true
#agent.sources.r1.channels = c1
agent.sources.r1.type = spooldir
agent.sources.r1.spoolDir = /home/hadoop/flumedata
agent.sources.r1.fileHeader = true

# Each sink's type must be defined
#agent.sinks.s1.type = logger
agent.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.s1.topic = goodStudy
agent.sinks.s1.brokerList = 192.168.35.98:9092
agent.sinks.s1.requiredAcks = 1
agent.sinks.s1.batchSize = 2

# Each channel's type is defined.
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100

# Bind the source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.s1.channel = c1
```
2. Flume startup command:
```
./flume-ng agent --conf conf -f ../conf/kafka-conf.properties -n agent -Dflume.root.logger=INFO,console
```
Once started, Flume watches the /home/hadoop/flumedata directory; whenever a new file lands there, its contents are read into Kafka, and the Java consumer above can then receive the messages.
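To verify the pipeline end to end, drop a test file into the spool directory, for example echo "hello kafka hello flume" > /home/hadoop/flumedata/test.log, and the same words should shortly appear in the console consumer from step 7.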
III. Using Spark to consume the messages that Flume detected and pushed into Kafka, and implement word count
The code is as follows:
![](https://i-blog.csdnimg.cn/blog_migrate/b6c144cd13c4a6563f7704910fe9764a.png)
```scala
package cn.itcast.sparkDay05

import cn.itcast.spark.day5.LoggerLevels
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.HashPartitioner

object KafkaWordCount {

  val numThreads = "1"
  val topics = "goodStudy"
  val zkQuorum = "192.168.35.98:2181,192.168.35.97:2181,192.168.35.96:2181"
  val group = "goodStudy"

  // Merge each key's counts from the current batch with its running total
  val updateFunc = (iter: Iterator[(String, Seq[Int], Option[Int])]) => {
    //iter.flatMap(it=>Some(it._2.sum + it._3.getOrElse(0)).map(x=>(it._1,x)))
    iter.flatMap { case (x, y, z) => Some(y.sum + z.getOrElse(0)).map(i => (x, i)) }
  }

  def main(args: Array[String]): Unit = {
    LoggerLevels.setStreamingLogLevels()
    // val Array(zkQuorum, group, topics, numThreads) = args
    val conf = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // updateStateByKey needs a checkpoint directory to persist its state
    ssc.checkpoint("F:\\hadoop_doc\\spark\\ck")

    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // Receiver-based stream: each record is a (key, message) pair
    val data = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap, StorageLevel.MEMORY_AND_DISK_SER)
    val words = data.map(_._2).flatMap(_.split(" "))
    val wordCounts = words.map((_, 1)).updateStateByKey(
      updateFunc, new HashPartitioner(ssc.sparkContext.defaultParallelism), true)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```
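Because updateStateByKey folds every 5-second batch into a running total per word, the printed counts are cumulative over the life of the job, which is also why the checkpoint directory is required.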
IV. Storing the processed data in Redis. Startup command for a single-node Redis: redis-server redis.conf &
Utility class for obtaining a Redis connection:
```scala
package cn.itcast.sparkDay05

import redis.clients.jedis.JedisPool
import redis.clients.jedis.JedisPoolConfig

object RedisClient extends Serializable {
  val redisHost = "192.168.35.98"
  val redisPort = 6379
  val redisTimeout = 30000

  // Lazily created, so the pool is built once per JVM on first use
  lazy val pool = new JedisPool(new JedisPoolConfig(), redisHost, redisPort, redisTimeout)

  lazy val hook = new Thread {
    override def run = {
      println("Execute hook thread: " + this)
      pool.destroy()
    }
  }
  sys.addShutdownHook(hook.run)
}
```
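Because RedisClient is a Scala singleton object, each executor JVM that touches it builds exactly one connection pool, and the shutdown hook destroys that pool when the JVM exits.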
![](https://i-blog.csdnimg.cn/blog_migrate/29b49c93d6a663290e7da103cefd7a59.png)
![](https://i-blog.csdnimg.cn/blog_migrate/45ce289a84bd722432bdbb1856cd8d49.png)
![](https://i-blog.csdnimg.cn/blog_migrate/dd86937f41a534b0a7ea6b321ad14f03.png)
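The three screenshots above are not transcribed. As a rough sketch of the pattern they presumably illustrate, writing the streaming word counts into Redis, the following fragment could be placed in KafkaWordCount.main after wordCounts is defined and before ssc.start(); the hash key "wordcount" is a hypothetical name chosen for this example:

```scala
// Sketch: persist each batch's running word counts into a Redis hash
wordCounts.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Borrow one Jedis connection per partition rather than per record
    val jedis = RedisClient.pool.getResource
    partition.foreach { case (word, count) =>
      // "wordcount" is a hypothetical hash key for this example
      jedis.hset("wordcount", word, count.toString)
    }
    // close() returns a pooled connection to the pool
    jedis.close()
  }
}
```

Writing per partition keeps the number of Redis connections proportional to the number of partitions rather than the number of records.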