2021SC@SDUSC
storm-kafka Source Code Analysis (Part 1)
storm-kafka is the connector Storm uses to read messages from Kafka.
1. org.apache.storm.kafka
The org.apache.storm.kafka package contains the common modules as well as the spout implementation for storm-core.
We focus here on the classes that read and write ZooKeeper.
(1) ZkState
ZkState records the consumption progress of each partition by reading and writing ZooKeeper. A state node in ZooKeeper looks like this:
{"topology":{"id":"2e3226e2-ef45-4c53-b03f-aacd94068bc9","name":"ljhtest"},"offset":8066973,"partition":0,"broker":{"host":"gdc-kafka08-log.i.nease.net","port":9092},"topic":"ma30"}
The fields are, in order: the topology id, the topology name, the offset this partition has been processed up to, the partition number, the host and port of the Kafka broker serving this partition, and the topic name.
ZkState mainly provides read and write operations on this ZooKeeper data, such as readJSON and writeJSON.
The location of this data in ZooKeeper is specified by the 3rd and 4th arguments when constructing the KafkaConfig object. With the configuration below, the data is written under /kafka2/ljhtest. The 4th argument must therefore be unique, otherwise different topologies will conflict:
SpoutConfig kafkaConfig = new SpoutConfig(brokerHosts, "ma30", "/kafka2", "ljhtest");
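As a rough illustration of the layout above, the per-partition node path can be reconstructed from the zkRoot (3rd argument) and id (4th argument). This is only a sketch: the helper name and the exact "partition_" node-name prefix are assumptions for illustration, not the actual ZkState API.

```java
// Hypothetical sketch: derive the ZooKeeper node path for a partition's
// offset state from the SpoutConfig's zkRoot and id arguments.
// The "partition_" prefix is an assumption based on the node layout above.
public class ZkStatePathSketch {

    static String committedPath(String zkRoot, String id, int partition) {
        return zkRoot + "/" + id + "/partition_" + partition;
    }

    public static void main(String[] args) {
        // With new SpoutConfig(brokerHosts, "ma30", "/kafka2", "ljhtest"),
        // partition 0's state lands under /kafka2/ljhtest.
        System.out.println(committedPath("/kafka2", "ljhtest", 0));
    }
}
```

This also makes it clear why the 4th argument must be unique: two topologies sharing the same zkRoot and id would read and overwrite each other's offsets.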
(2) trident.ZkBrokerReader
trident.ZkBrokerReader delegates most of its work to DynamicBrokersReader; all interaction with ZooKeeper goes through the latter. On top of that, it adds the following methods:
1. getCurrentBrokers(): returns the partition information for one topic as a GlobalPartitionInformation object, since a topic may have multiple partitions being read at once.
2. getAllBrokers(): returns the partitions of all topics, without specifying one. Because regex topic subscriptions are supported, there may be more than one matching topic.
3. refresh(): a private method that refreshes the cached partition information once a configured interval has elapsed; it is called from the two methods above.
public class ZkBrokerReader implements IBrokerReader {

    public static final Logger LOG = LoggerFactory.getLogger(ZkBrokerReader.class);

    GlobalPartitionInformation cachedBrokers;
    DynamicBrokersReader reader;
    long lastRefreshTimeMs;
    long refreshMillis;

    /**
     * A ZkBrokerReader for the given topic.
     *
     * @param conf
     * @param topic
     * @param hosts
     */
    public ZkBrokerReader(Map conf, String topic, ZkHosts hosts) {
        reader = new DynamicBrokersReader(conf, hosts.brokerZkStr, hosts.brokerZkPath, topic);
        cachedBrokers = reader.getBrokerInfo();
        lastRefreshTimeMs = System.currentTimeMillis();
        refreshMillis = hosts.refreshFreqSecs * 1000L;
    }

    @Override
    public GlobalPartitionInformation getCurrentBrokers() {
        long currTime = System.currentTimeMillis();
        // Re-fetch the broker info only once the refresh interval has elapsed.
        if (currTime > lastRefreshTimeMs + refreshMillis) {
            LOG.info("brokers need refreshing because " + refreshMillis + "ms have expired");
            cachedBrokers = reader.getBrokerInfo();
            lastRefreshTimeMs = currTime;
        }
        return cachedBrokers;
    }

    @Override
    public void close() {
        reader.close();
    }
}
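The refresh-on-read caching in getCurrentBrokers() can be isolated into a small, self-contained sketch. All names here are hypothetical, and the current time is passed in as a parameter instead of calling System.currentTimeMillis(), so the behavior is easy to verify:

```java
// Minimal sketch of the refresh-on-read pattern used by getCurrentBrokers():
// serve the cached value, and re-fetch only after refreshMillis have elapsed.
public class RefreshingCache {

    interface Fetcher { String fetch(); }

    private final Fetcher fetcher;
    private final long refreshMillis;
    private long lastRefreshTimeMs;
    private String cached;
    int fetchCount = 0; // exposed only for the demonstration below

    RefreshingCache(Fetcher fetcher, long refreshMillis, long nowMs) {
        this.fetcher = fetcher;
        this.refreshMillis = refreshMillis;
        this.lastRefreshTimeMs = nowMs;
        this.cached = doFetch(); // eager initial fetch, like the constructor above
    }

    private String doFetch() {
        fetchCount++;
        return fetcher.fetch();
    }

    String get(long nowMs) {
        if (nowMs > lastRefreshTimeMs + refreshMillis) {
            cached = doFetch();
            lastRefreshTimeMs = nowMs;
        }
        return cached;
    }

    public static void main(String[] args) {
        RefreshingCache cache = new RefreshingCache(() -> "brokers", 60_000L, 0L);
        cache.get(1_000L);  // within the interval: served from cache, no fetch
        cache.get(61_000L); // past the interval: triggers a re-fetch
        System.out.println(cache.fetchCount); // 2: initial fetch plus one refresh
    }
}
```

The trade-off of this design is that staleness is bounded by refreshFreqSecs: between refreshes, a spout may briefly see outdated broker/partition data, in exchange for not hitting ZooKeeper on every call.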
(3) ZkCoordinator
This class implements the PartitionCoordinator interface, which has only three methods:
- The main one, getMyManagedPartitions(), computes which partitions this spout task should handle:
List<PartitionManager> getMyManagedPartitions();
- Look up the PartitionManager for a given partition:
PartitionManager getManager(Partition partition);
- Periodically refresh the partition information:
void refresh();
(4) StaticPartitionCoordinator
The IPartitionCoordinator interface has two methods:
List<IPartitionManager> getMyPartitionManagers();
IPartitionManager getPartitionManager(String partitionId);
StaticPartitionCoordinator implements this interface as follows:
public class StaticPartitionCoordinator implements IPartitionCoordinator {

    private static final Logger logger = LoggerFactory.getLogger(StaticPartitionCoordinator.class);

    protected final EventHubSpoutConfig config;
    protected final int taskIndex;
    protected final int totalTasks;
    protected final List<IPartitionManager> partitionManagers;
    protected final Map<String, IPartitionManager> partitionManagerMap;
    protected final IStateStore stateStore;

    public StaticPartitionCoordinator(
            EventHubSpoutConfig spoutConfig,
            int taskIndex,
            int totalTasks,
            IStateStore stateStore,
            IPartitionManagerFactory pmFactory,
            IEventHubReceiverFactory recvFactory) {
        this.config = spoutConfig;
        this.taskIndex = taskIndex;
        this.totalTasks = totalTasks;
        this.stateStore = stateStore;
        // The partition set is fixed at construction time: build one receiver
        // and one partition manager per owned partition.
        List<String> partitionIds = calculateParititionIdsToOwn();
        partitionManagerMap = new HashMap<String, IPartitionManager>();
        partitionManagers = new ArrayList<IPartitionManager>();
        for (String partitionId : partitionIds) {
            IEventHubReceiver receiver = recvFactory.create(config, partitionId);
            IPartitionManager partitionManager = pmFactory.create(config, partitionId, stateStore, receiver);
            partitionManagerMap.put(partitionId, partitionManager);
            partitionManagers.add(partitionManager);
        }
    }

    @Override
    public List<IPartitionManager> getMyPartitionManagers() {
        return partitionManagers;
    }

    @Override
    public IPartitionManager getPartitionManager(String partitionId) {
        return partitionManagerMap.get(partitionId);
    }

    // Stride assignment: task i owns partitions i, i + totalTasks, i + 2 * totalTasks, ...
    protected List<String> calculateParititionIdsToOwn() {
        List<String> taskPartitions = new ArrayList<String>();
        for (int i = this.taskIndex; i < config.getPartitionCount(); i += this.totalTasks) {
            taskPartitions.add(Integer.toString(i));
            logger.info(String.format("taskIndex %d owns partitionId %d.", this.taskIndex, i));
        }
        return taskPartitions;
    }
}
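To make the assignment in calculateParititionIdsToOwn() concrete, the same stride logic can be extracted into a standalone sketch (the class and method names here are made up for illustration): each task starts at its own taskIndex and steps by totalTasks.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the stride-based assignment in
// calculateParititionIdsToOwn(): task i owns partitions i, i+totalTasks, ...
public class PartitionAssignmentSketch {

    static List<String> partitionsFor(int taskIndex, int totalTasks, int partitionCount) {
        List<String> owned = new ArrayList<>();
        for (int i = taskIndex; i < partitionCount; i += totalTasks) {
            owned.add(Integer.toString(i));
        }
        return owned;
    }

    public static void main(String[] args) {
        // With 8 partitions and 3 tasks, task 1 owns partitions 1, 4 and 7.
        System.out.println(partitionsFor(1, 3, 8)); // [1, 4, 7]
    }
}
```

Every partition is owned by exactly one task, and the load between any two tasks differs by at most one partition, which is why no coordination between tasks is needed at runtime.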
Reference:
https://blog.csdn.net/jediael_lu/article/details/77149540