Apache Druid Cluster Deployment


This post gives a brief walkthrough of deploying Apache Druid as a cluster.

This simple cluster has the following layout:

  • 1 server running the Coordinator and Overlord processes (the Master role)
  • 2 scalable, fault-tolerant servers running the Historical and MiddleManager processes (the Data role)
  • 1 server running the Broker and Router processes (the Query role)

In production, multiple Master and multiple Query servers are recommended for fault tolerance. For now, though, you can get a cluster up quickly with a single Master and a single Query server, and add more Master and Query servers later.

Machine roles

Assume we have four machines and assign component roles in the simplest layout recommended by the official docs. Each role can be scaled out horizontally later.

Server name     Role type    Deployed components
serverMaster    Master       Coordinator, Overlord
serverQuery     Query        Router, Broker
serverData1     Data         MiddleManager, Historical
serverData2     Data         MiddleManager, Historical

Server role diagram (image omitted)

Dependencies

  • apache-druid-0.16.0-incubating-bin.tar.gz

  • Java

    Version requirement: Java 8 (8u92+)

  • MySQL

    Used to store Druid's metadata
    

Installation steps
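
Before the steps below, unpack the Druid tarball on every server. A minimal sketch, assuming the tarball has already been downloaded and using /opt/druid as the install path (the same path the later commands in this post use; adjust paths to your environment):

# Unpack the release and create a stable /opt/druid path
tar -xzf apache-druid-0.16.0-incubating-bin.tar.gz -C /opt
ln -s /opt/apache-druid-0.16.0-incubating /opt/druid
# Verify the Java requirement (Java 8, 8u92 or later)
java -version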

Install MySQL

Install MySQL with Docker

# Pull the MySQL Docker image
docker pull mysql:5.7.22
# Start the container, mapping host port 3309 to container port 3306
docker run --name mysql-docker -v /data/apps/mysqldata:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=xxx -p 3309:3306 -d mysql:5.7.22

Copy the MySQL JDBC driver into the Druid extensions directory:
cp mysql-connector-java-5.1.38.jar /opt/druid/extensions/mysql-metadata-storage

Run the following to initialize the database
# Open a shell inside the MySQL container
docker exec -it mysql-docker bash
# Log in as root
mysql -uroot -p
# Create the database and grant privileges
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
CREATE USER 'druid'@'%' IDENTIFIED BY 'druid';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';

Initialize the Druid directories on HDFS

To use HDFS as deep storage, the Druid directories need to exist on HDFS first.

# Create the Druid directories on HDFS (run as the hdfs user)
hadoop fs -mkdir -p /druid/segments
hadoop fs  -mkdir -p /druid/indexing-logs
hadoop fs  -mkdir -p /tmp/druid-indexing
hadoop fs -chmod -R 777 /druid
hadoop fs -chmod -R 777 /tmp/druid-indexing

Create symlinks for the Hadoop configuration

Symlink the Hadoop configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) into druidHome/conf/druid/cluster/_common/, as sketched below.
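
A minimal sketch of creating those symlinks, assuming the Hadoop client configuration lives in /etc/hadoop/conf and Druid is installed at /opt/druid (both paths are assumptions; adjust them to your environment):

# Symlink the Hadoop client configs into Druid's _common config directory
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
  ln -s /etc/hadoop/conf/$f /opt/druid/conf/druid/cluster/_common/$f
done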

Start the services

  • Master server: starts the Coordinator and Overlord
nohup /opt/druid/bin/start-cluster-master-no-zk-server > /data/logs/druid/master.log 2>&1 &
  • Query server: starts the Router and Broker
nohup /opt/druid/bin/start-cluster-query-server > /data/logs/druid/query.log 2>&1 &
  • Data server: starts the MiddleManager and Historical
nohup /opt/druid/bin/start-cluster-data-server > /data/logs/druid/data.log 2>&1 &
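
Once the three scripts are up, a quick sanity check against the Query server (the hostname follows the role table above; every Druid service exposes a /status endpoint, and the Router serves the unified web console on its port, 8888):

# Should return JSON with the Druid version and loaded extensions
curl http://serverQuery:8888/status
# The unified console is then reachable at http://serverQuery:8888 in a browser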

Notes

Comment out druid.host=localhost in common.runtime.properties. Each Druid service then resolves its own hostname at startup via InetAddress.getLocalHost().getCanonicalHostName(), so there is no need to hard-code a hostname per server. This makes it easy to distribute the same configuration to every machine with tools such as Ansible.

Stop the services

ps -ef | grep druid
# kill the child (service) PIDs first, then the parent PID
kill <child PIDs> ... <parent PID>
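
A slightly more concrete sketch of the same shutdown, assuming the services were started with the nohup commands above; the process patterns below are what a default 0.16 install shows in ps output, so verify them against your own listing first:

# List the supervising script and the Java services it launched
ps -ef | grep -v grep | grep druid
# Kill the Java service (child) processes first, then the supervisor (parent)
pkill -f 'org.apache.druid.cli.Main'
pkill -f 'bin/supervise'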

Default resource usage

Role             Default JVM settings
Coordinator      -Xms15g -Xmx15g
Overlord         -Xms15g -Xmx15g
MiddleManager    -Xms128m -Xmx128m
Historical       -Xms8g -Xmx8g -XX:MaxDirectMemorySize=13g
Broker           -Xms12g -Xmx12g -XX:MaxDirectMemorySize=6g
Router           -Xms1g -Xmx1g -XX:MaxDirectMemorySize=128m
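
These defaults come from each role's jvm.config file in the clustered example configuration. The paths below assume the stock 0.16.0 layout under /opt/druid; tuning heap or direct memory means editing these files:

# jvm.config files for each role in the stock clustered configuration
ls /opt/druid/conf/druid/cluster/master/coordinator-overlord/jvm.config \
   /opt/druid/conf/druid/cluster/query/broker/jvm.config \
   /opt/druid/conf/druid/cluster/query/router/jvm.config \
   /opt/druid/conf/druid/cluster/data/historical/jvm.config \
   /opt/druid/conf/druid/cluster/data/middleManager/jvm.config
# Edit -Xms / -Xmx / -XX:MaxDirectMemorySize in these files to fit your machines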

Configuration reference

This cluster is only meant to validate Druid's features; the parameters have not been tuned.

Key points:

 1. In common.runtime.properties, comment out druid.host=localhost:

       #druid.host=localhost

 2. Configure MySQL as the metadata store, point druid.zk.service.host at the (independently deployed) ZooKeeper ensemble, and set druid.extensions.loadList accordingly.
 3. Configure HDFS directories for deep storage and indexing logs.

References

  • MySQL Metadata Store

https://druid.apache.org/docs/latest/development/extensions-core/mysql.html

  • Clustered deployment

https://druid.apache.org/docs/latest/tutorials/cluster.html

Full configuration files

common.runtime.properties

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Extensions specified in the load list will be loaded by Druid
# We are using local fs for deep storage - not recommended for production - use S3, HDFS, or NFS instead
# We are using local derby for the metadata store - not recommended for production - use MySQL or Postgres instead

# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches","mysql-metadata-storage"]

# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies


#
# Hostname
#
#druid.host=localhost

#
# Logging
#

# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#

druid.zk.service.host=server-1:2181,server-2:2181,server-3:2181,server-4:2181,server-5:2181
druid.zk.paths.base=/druid

#
# Metadata storage
#

# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://server-9:3309/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=pass

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments

# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# omitting this will lead to doubles being indexed as floats at the storage layer

druid.indexing.doubleStorage=double

#
# Security
#
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]


#
# SQL
#
druid.sql.enable=true

#
# Lookups
#
druid.lookup.enableLookupSyncOnStartup=false

router runtime.properties

druid.service=druid/router
druid.plaintextPort=8888

# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100

# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator

# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true

broker runtime.properties

druid.service=druid/broker
druid.plaintextPort=8082

# HTTP server settings
druid.server.http.numThreads=60

# HTTP client settings
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10000000

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=var/druid/processing

# Query cache disabled -- push down caching and merging instead
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

master runtime.properties


druid.service=druid/coordinator
druid.plaintextPort=8081

druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S

# Run the overlord service in the coordinator process
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord

druid.indexer.queue.startDelay=PT5S

druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

historical runtime.properties


druid.service=druid/historical
druid.plaintextPort=8083

# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=var/druid/processing

# Segment storage
druid.segmentCache.locations=[{"path":"/data/apps/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=300000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000

middleManager runtime.properties

druid.service=druid/middleManager
druid.plaintextPort=8091


# Number of tasks per middleManager
druid.worker.capacity=4

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task

# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1

# Hadoop indexing
#druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
