Kafka 0.8 Examples

Latest recommended article published on 2020-10-20 23:03:00

持人持己

Category: hadoop

Article link: https://blog.csdn.net/ice00mouse/article/details/43836683

Adapted from the original articles:
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

Producers Example

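1. Background

The Producer class creates messages and sends them to a specific Topic and, optionally, a specific Partition.
The following classes need to be imported: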

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

First, define the properties that tell the Producer how to find the cluster, how to serialize messages and, if needed, how to direct messages to a specific Partition.

Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "example.producer.SimplePartitioner");
props.put("request.required.acks", "1");

ProducerConfig config = new ProducerConfig(props);

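1.1 What these parameters mean

metadata.broker.list: defines where the Producer can find one or more Brokers in order to determine the Leader for each Topic. You do not need to list every Broker in your cluster, but listing at least two is advisable for availability. There is no need to worry about whether the listed Brokers include the Leader: the Producer queries them for metadata and then connects to the correct Broker.
serializer.class: defines how messages are serialized before being sent to the Broker. This example uses a simple String encoder. Note that the encoder must accept the message type declared in the KeyedMessage object.
key.serializer.class: optionally sets a separate serializer for the key. By default it is the same as serializer.class.
partitioner.class: defines the class that decides which Partition a message is sent to. It is optional but very useful. If it is not set, Kafka applies its default partitioner to the key. Note also that if the KeyedMessage carries no key, the message goes to a random Partition even when partitioner.class is set.
request.required.acks: tells Kafka that the Producer requires an acknowledgement from the Broker that the message was received. Without this setting the Producer works in "fire and forget" mode, which may lead to lost messages.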
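
The settings above can be collected in one small helper. The following is only a sketch using java.util.Properties: the class name ProducerPropsSketch and its build method are made up for illustration, and the property keys and values are the ones discussed above. It has no Kafka dependency, so it can be compiled and inspected on its own.

```java
import java.util.Properties;

// Sketch only: gathers the 0.8 producer settings described above in one place.
// ProducerPropsSketch and build(...) are hypothetical names, not part of Kafka.
final class ProducerPropsSketch {

    // brokerList is a comma-separated "host:port" list.
    // partitionerClass may be null to leave partitioner.class unset.
    static Properties build(String brokerList, String partitionerClass) {
        Properties props = new Properties();
        // Trim stray whitespace around the broker list before storing it.
        props.put("metadata.broker.list", brokerList.trim());
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Optional: only needed when the key uses a different encoder than the message.
        props.put("key.serializer.class", "kafka.serializer.StringEncoder");
        if (partitionerClass != null) {
            props.put("partitioner.class", partitionerClass);
        }
        // Ask the Broker to acknowledge each message instead of fire-and-forget.
        props.put("request.required.acks", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("broker1:9092,broker2:9092", "example.producer.SimplePartitioner");
        props.list(System.out);
    }
}
```

The resulting Properties object is what would be passed to new ProducerConfig(props) in the snippet above.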

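1.2 Code snippet explained

/**
*   When defining the Producer we give it two type parameters. The first is the type of the Partition key; the second is the type of the message. In this example both are String.
*/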
Producer<String, String> producer = new Producer<String, String>(config);

KeyedMessage<String, String> data = new KeyedMessage<String, String>("myTopic", "partition_key", message);

2. Complete example

import java.util.*;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class TestProducer {
    public static void main(String[] args) {
        long events = Long.parseLong(args[0]);
        Random rnd = new Random();

        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "example.producer.SimplePartitioner");
        props.put("request.required.acks", "1");

        ProducerConfig config = new ProducerConfig(props);

        Producer<String, String> producer = new Producer<String, String>(config);

        for (long nEvents = 0; nEvents < events; nEvents++) {
            long runtime = new Date().getTime();
            // Fake visitor IP; it is also the partition key
            String ip = "192.168.2." + rnd.nextInt(255);
            String msg = runtime + ",www.example.com," + ip;
            KeyedMessage<String, String> data = new KeyedMessage<String, String>("page_visits", ip, msg);
            producer.send(data);
        }
        producer.close();
    }
}

Partitioning Code:

package example.producer;

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

public class SimplePartitioner implements Partitioner {
    public SimplePartitioner (VerifiableProperties props) {}
    // Pick the partition from the last octet of the IP, modulo the partition count
    public int partition(Object key, int a_numPartitions) {
        int partition = 0;
        String stringKey = (String) key;
        int offset = stringKey.lastIndexOf('.');
        if (offset > 0) {
           partition = Integer.parseInt( stringKey.substring(offset+1)) % a_numPartitions;
        }
        return partition;
    }
}
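
To see which partition a given key ends up in, the rule from SimplePartitioner can be tried without a Kafka cluster. The sketch below copies the same arithmetic into a standalone class; PartitionRuleSketch and partitionFor are hypothetical names used only for this illustration.

```java
// Sketch only: the same "last octet modulo partition count" rule as SimplePartitioner,
// extracted so it can be run without Kafka on the classpath.
final class PartitionRuleSketch {

    static int partitionFor(String key, int numPartitions) {
        int partition = 0;
        int offset = key.lastIndexOf('.');
        if (offset > 0) {
            // Text after the last '.' is parsed as a number, e.g. "77" in "192.168.2.77"
            partition = Integer.parseInt(key.substring(offset + 1)) % numPartitions;
        }
        // Keys without a '.' fall through to partition 0
        return partition;
    }

    public static void main(String[] args) {
        String[] keys = {"192.168.2.77", "192.168.2.254", "no-dots-here"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

With the two partitions created for page_visits in the next step, odd last octets land in partition 1 and even ones in partition 0.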

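3. Running the example

Create the Topic:

bin/kafka-create-topic.sh --topic page_visits --replica 1 --zookeeper localhost:2181 --partition 2

Run the Java program we wrote:

java TestProducer 100

Check what was written:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic page_visits --from-beginning

One point deserves special mention: if you use Kafka's default configuration file and the Producer does not run on the same machine as Kafka, you need to change the host.name parameter in config/server.properties, because by default only local requests are accepted. For example, set host.name=192.168.8.240, the Kafka machine's own IP.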

SimpleConsumer Example

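With the producer covered, let's turn to the consumer. The example above used the console consumer that ships with Kafka to view the messages we sent; now we write our own so that it can implement specific business logic.

1. Why use SimpleConsumer

Besides SimpleConsumer, the Consumer class can also be used as a consumer. We choose SimpleConsumer because it gives finer control over how partitions are consumed.

1.1 What it makes possible

Reading the same message multiple times
Consuming only a subset of the partitions in a Topic
Managing transactions so that a message is processed once and only once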

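1.2 Drawbacks of SimpleConsumer

SimpleConsumer requires a significant amount of work that is not needed with Consumer Groups:

You must track offsets in your application to know where consumption left off.
You must figure out which Broker is the lead Broker for the topic and partition.
You must handle Broker leader changes.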
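
The first drawback, tracking offsets yourself, amounts to keeping one "next offset to read" per partition. A minimal in-memory sketch follows; OffsetTrackerSketch is a hypothetical helper, and it assumes the next offset is simply the last consumed offset plus one. It also skips messages older than expected, which a fetch may return in some cases (for example with compressed batches).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: per-partition offset bookkeeping an application needs with SimpleConsumer.
// Offsets live in memory here; a real application would persist them somewhere durable.
final class OffsetTrackerSketch {
    private final Map<Integer, Long> nextOffsets = new HashMap<Integer, Long>();

    // Offset to request next for this partition; 0 if nothing has been consumed yet.
    long nextOffset(int partition) {
        Long next = nextOffsets.get(partition);
        return next == null ? 0L : next;
    }

    // Returns true if the message should be processed, false if it is older than expected.
    boolean accept(int partition, long messageOffset) {
        if (messageOffset < nextOffset(partition)) {
            return false; // already consumed, skip it
        }
        // Assumption: the following message has offset messageOffset + 1
        nextOffsets.put(partition, messageOffset + 1);
        return true;
    }

    public static void main(String[] args) {
        OffsetTrackerSketch tracker = new OffsetTrackerSketch();
        System.out.println(tracker.accept(0, 5));   // new message on partition 0
        System.out.println(tracker.accept(0, 3));   // older than expected
        System.out.println(tracker.nextOffset(0));  // where to resume partition 0
    }
}
```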

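1.3 Steps

Find an active Broker and find out which Broker is the leader for your Topic and partition
Determine which Brokers are the replicas for your Topic and partition
Build the request that defines the data you are interested in
Fetch the data
Identify and recover from leader changes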
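
The last step, recovering from a leader change, is essentially a bounded retry loop around the leader lookup. The sketch below shows that loop in isolation under stated assumptions: LeaderRetrySketch is a made-up class, and the lookup argument stands in for whatever metadata query returns the current leader host (or null when none is known). It gives up after three attempts and, on the first attempt only, treats an unchanged leader as "not recovered yet".

```java
import java.util.function.Supplier;

// Sketch only: the retry logic for finding a new leader, with the actual
// metadata lookup replaced by a hypothetical Supplier.
final class LeaderRetrySketch {

    static String findNewLeader(String oldLeader, Supplier<String> lookup, long sleepMillis) {
        for (int attempt = 0; attempt < 3; attempt++) {
            String leader = lookup.get();
            // On the first attempt an unchanged leader probably means failover has not happened yet
            boolean sameAsOldOnFirstTry =
                    attempt == 0 && leader != null && leader.equalsIgnoreCase(oldLeader);
            if (leader != null && !sameAsOldOnFirstTry) {
                return leader;
            }
            try {
                Thread.sleep(sleepMillis); // give the cluster time to elect a leader
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a new leader", ie);
            }
        }
        throw new IllegalStateException("Unable to find new leader after Broker failure");
    }

    public static void main(String[] args) {
        // Pretend the cluster always reports "broker2" as the leader
        String leader = findNewLeader("broker1", () -> "broker2", 1000L);
        System.out.println("New leader: " + leader);
    }
}
```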

2. Code

package com.test.simple;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.*;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

import java.nio.ByteBuffer;
import java.util.*;

public class SimpleConsumerClientDemo {
    public static void main(String args[]) {
        SimpleConsumerClientDemo example = new SimpleConsumerClientDemo();
        // Maximum number of messages to read in this test
        long maxReads = 1000000;
        // Topic to consume
        String topic = "test_topic";
        // Partitions to read
        int[] partitions = {0, 1};
        // Seed broker list
        List<String> seeds = new ArrayList<String>();
        seeds.add("192.168.8.240");
        // Broker port
        int port = 9092;
        try {
            example.run(maxReads, topic, partitions, seeds, port);
        } catch (Exception e) {
            System.out.println("Oops:" + e);
            e.printStackTrace();
        }
    }

    private List<String> m_replicaBrokers = new ArrayList<String>();

    public SimpleConsumerClientDemo() {
        m_replicaBrokers = new ArrayList<String>();
    }

    public void run(long a_maxReads, String topic, int[] partitions, List<String> a_seedBrokers, int a_port) throws Exception {
        /**
         * Find the leader broker (looked up for the first partition only)
         */
        PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, topic, partitions[0]);
        if (metadata == null) {
            System.out.println("Can't find metadata for Topic and Partition. Exiting");
            return;
        }
        if (metadata.leader() == null) {
            System.out.println("Can't find Leader for Topic and Partition. Exiting");
            return;
        }
        String leadBrokerHost = metadata.leader().host();
        int leadBrokerPort = metadata.leader().port();
        String clientName = "client-"+topic;
        System.out.println("leader broker host: " + leadBrokerHost +", port: " + leadBrokerPort);
        /**
         * Create the SimpleConsumer
         */
        SimpleConsumer consumer = new SimpleConsumer(leadBrokerHost, leadBrokerPort, 100000, 64 * 1024, clientName);
        /**
         * Read the starting offset of each partition; here we start from the earliest available offset.
         */
        long readOffsets[] = new long[partitions.length];
        for(int i=0; i<readOffsets.length; i++) {
            readOffsets[i] = getLastOffset(consumer, topic, partitions[i], kafka.api.OffsetRequest.EarliestTime(), clientName);
        }

        int numErrors = 0;
        while (a_maxReads > 0) {
            if (consumer == null) {
                consumer = new SimpleConsumer(leadBrokerHost, leadBrokerPort, 100000, 64 * 1024, clientName);
            }
            /**
             * Fetch messages from each partition in turn
             */
            long numRead = 0;
            for(int i=0; i<partitions.length; i++) {
                FetchRequest req = new FetchRequestBuilder()
                        .clientId(clientName)
                        .addFetch(topic, partitions[i], readOffsets[i], 100000) // Note: this fetchSize of 100000 might need to be increased if large batches are written to Kafka
                        .build();
                FetchResponse fetchResponse = consumer.fetch(req);

                if (fetchResponse.hasError()) {
                    numErrors++;
                    // Something went wrong!
                    short code = fetchResponse.errorCode(topic, partitions[i]);
                    System.out.println("Error fetching data from the Broker:" + leadBrokerHost + " Reason: " + code);
                    if (numErrors > 5) {
                        // Give up: end the outer while loop as well, not just this for loop
                        a_maxReads = 0;
                        break;
                    }
                    if (code == ErrorMapping.OffsetOutOfRangeCode()) {
                        // We asked for an invalid offset. For simple case ask for the last element to reset
                        readOffsets[i] = getLastOffset(consumer, topic, partitions[i], kafka.api.OffsetRequest.LatestTime(), clientName);
                        continue;
                    }
                    consumer.close();
                    leadBrokerHost = findNewLeader(leadBrokerHost, topic, partitions[i], leadBrokerPort);
                    // Reconnect right away: the for loop moves on to the next partition and must not use a null consumer
                    consumer = new SimpleConsumer(leadBrokerHost, leadBrokerPort, 100000, 64 * 1024, clientName);
                    continue;
                }
                numErrors = 0;

                for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(topic, partitions[i])) {
                    long currentOffset = messageAndOffset.offset();
                    if (currentOffset < readOffsets[i]) {
                        System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffsets[i]);
                        continue;
                    }
                    readOffsets[i] = messageAndOffset.nextOffset();
                    ByteBuffer payload = messageAndOffset.message().payload();

                    byte[] bytes = new byte[payload.limit()];
                    payload.get(bytes);
                    System.out.println(partitions[i]+":"+String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
                    numRead++;
                    a_maxReads--;
                }
            }

            if (numRead == 0) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        if (consumer != null) consumer.close();
    }

    public static long getLastOffset(SimpleConsumer consumer, String topic, int partition,
                                     long whichTime, String clientName) {
        TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> requestInfo = new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 2));
        kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(
                requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName);
        OffsetResponse response = consumer.getOffsetsBefore(request);

        if (response.hasError()) {
            System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition));
            return 0;
        }
        long[] offsets = response.offsets(topic, partition);
        return offsets[0];
    }

    private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception {
        for (int i = 0; i < 3; i++) {
            boolean goToSleep = false;
            PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition);
            if (metadata == null) {
                goToSleep = true;
            } else if (metadata.leader() == null) {
                goToSleep = true;
            } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) {
                // first time through if the leader hasn't changed give ZooKeeper a second to recover
                // second time, assume the broker did recover before failover, or it was a non-Broker issue
                //
                goToSleep = true;
            } else {
                return metadata.leader().host();
            }
            if (goToSleep) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }
            }
        }
        System.out.println("Unable to find new leader after Broker failure. Exiting");
        throw new Exception("Unable to find new leader after Broker failure. Exiting");
    }

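    /**
     * Open question: can different partitions have different leader brokers?
     * (They can; in that case run() would need one leader lookup per partition.)
     */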
    private PartitionMetadata findLeader(List<String> a_seedBrokers, int a_port, String a_topic, int a_partition) {
        PartitionMetadata returnMetaData = null;
        loop:
        for (String seed : a_seedBrokers) {
            SimpleConsumer consumer = null;
            try {
                consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup");
                List<String> topics = Collections.singletonList(a_topic);
                TopicMetadataRequest req = new TopicMetadataRequest(topics);
                kafka.javaapi.TopicMetadataResponse resp = consumer.send(req);

                List<TopicMetadata> metaData = resp.topicsMetadata();
                for (TopicMetadata item : metaData) {
                    System.out.println(item);
                    for (PartitionMetadata part : item.partitionsMetadata()) {
                        if (part.partitionId() == a_partition) {
                            returnMetaData = part;
                            break loop;
                        }
                    }
                }
            } catch (Exception e) {
                System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic
                        + ", " + a_partition + "] Reason: " + e);
            } finally {
                if (consumer != null) consumer.close();
            }
        }
        if (returnMetaData != null) {
            m_replicaBrokers.clear();
            for (kafka.cluster.Broker replica : returnMetaData.replicas()) {
                m_replicaBrokers.add(replica.host());
            }
        }
        return returnMetaData;
    }
}
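
The payload handling inside the fetch loop can be tried in isolation. The code above sizes the byte array with payload.limit(), which matches the readable bytes only when the buffer's position is 0; the sketch below uses remaining() instead, which is correct for any position. PayloadDecodeSketch and decode are hypothetical names used only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: turning a message payload ByteBuffer into a UTF-8 String.
final class PayloadDecodeSketch {

    static String decode(ByteBuffer payload) {
        // Work on a duplicate so the caller's position and limit are left untouched
        ByteBuffer view = payload.duplicate();
        // remaining() = limit - position, i.e. exactly the bytes still readable
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "1424000000000,www.example.com,192.168.2.77".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(ByteBuffer.wrap(raw)));
    }
}
```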

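3. Kafka Consumer interfaces

Kafka provides two versions of the consumer interface.

high_level

The first is the high-level version. It is simpler because you do not need to manage offsets: it automatically reads the Consumer group's last offset from ZooKeeper.
1. Having more consumers than partitions is wasteful, because Kafka's design does not allow concurrent consumption of a single partition within a consumer group, so the number of consumers should not exceed the number of partitions.
2. With fewer consumers than partitions, one consumer is assigned several partitions. The main concern is to balance the consumer and partition counts; otherwise data is read unevenly from the partitions. Ideally the partition count is an integer multiple of the consumer count, so the partition count matters: with 24 partitions, for example, it is easy to pick a consumer count.
3. A consumer reading from several partitions gets no ordering guarantee across them. Kafka only guarantees order within a single partition; across partitions the order depends on the order in which you read.
4. Adding or removing consumers, brokers or partitions triggers a rebalance, after which the partitions assigned to each consumer may change.
5. With the high-level interface, a read blocks when no data is available.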
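
Points 1 and 2 are easier to see with numbers. The sketch below splits partitions across consumers in contiguous ranges; it is a simplified illustration of a range-style assignment and not Kafka's actual rebalancing code, and RangeAssignmentSketch is a made-up class name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified range-style split of partitions among consumers,
// to illustrate idle consumers and uneven load. Not Kafka's real implementation.
final class RangeAssignmentSketch {

    // result.get(c) holds the partition ids owned by consumer c
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<List<Integer>>();
        int perConsumer = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers; // the first 'extra' consumers get one more
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = perConsumer + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<Integer>();
            for (int k = 0; k < count; k++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(2, 3));   // more consumers than partitions: one stays idle
        System.out.println(assign(24, 5));  // 24 is not a multiple of 5: uneven load
        System.out.println(assign(24, 6));  // 24 is a multiple of 6: even load
    }
}
```

With 24 partitions, consumer counts of 1, 2, 3, 4, 6, 8, 12 or 24 all divide evenly, which is why a number like 24 is convenient.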

SimpleConsumer

The other version is SimpleConsumer, described in the section above.
