Apache Druid Cluster Deployment


This post gives a brief walkthrough of deploying Apache Druid as a cluster.

This simple cluster has the following layout:

  • 1 server running the Coordinator and Overlord processes (the Master role)
  • 2 scalable, fault-tolerant servers running the Historical and MiddleManager processes (the Data role)
  • 1 server running the Broker and Router processes (the Query role)

In production, multiple Master and multiple Query servers are recommended for fault tolerance. For now, though, you can get a cluster up quickly with a single Master and a single Query server, and add more Master and Query servers later.

Machine roles

Assume we have four machines and assign component roles in the simplest layout recommended by the official docs. Each role can be scaled out horizontally later.

Server name     Role type    Deployed components
serverMaster    Master       Coordinator, Overlord
serverQuery     Query        Router, Broker
serverData1     Data         MiddleManager, Historical
serverData2     Data         MiddleManager, Historical

Server role diagram (image omitted)

Dependencies

  • apache-druid-0.16.0-incubating-bin.tar.gz

  • Java

    Version requirement: Java 8 (8u92+)

  • MySQL

    Used to store Druid's metadata
    

Installation steps
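
Before the steps below, unpack the Druid tarball on every server. A minimal sketch, assuming the tarball has already been downloaded and using /opt/druid as the install path (the same path the later commands in this post use; adjust paths to your environment):

# Unpack the release and create a stable /opt/druid path
tar -xzf apache-druid-0.16.0-incubating-bin.tar.gz -C /opt
ln -s /opt/apache-druid-0.16.0-incubating /opt/druid
# Verify the Java requirement (Java 8, 8u92 or later)
java -version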

Install MySQL

Install MySQL with Docker

# Pull the MySQL Docker image
docker pull mysql:5.7.22
# Start the container, mapping host port 3309 to container port 3306
docker run --name mysql-docker -v /data/apps/mysqldata:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=xxx -p 3309:3306 -d mysql:5.7.22

Copy the MySQL JDBC driver into the Druid extensions directory:
cp mysql-connector-java-5.1.38.jar /opt/druid/extensions/mysql-metadata-storage

Run the following to initialize the database
# Open a shell inside the MySQL container
docker exec -it mysql-docker bash
# Log in as root
mysql -uroot -p
# Create the database and grant privileges
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
CREATE USER 'druid'@'%' IDENTIFIED BY 'druid';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';

Initialize the Druid directories on HDFS

To use HDFS as deep storage, the Druid directories need to exist on HDFS first.

# Create the Druid directories on HDFS (run as the hdfs user)
hadoop fs -mkdir -p /druid/segments
hadoop fs  -mkdir -p /druid/indexing-logs
hadoop fs  -mkdir -p /tmp/druid-indexing
hadoop fs -chmod -R 777 /druid
hadoop fs -chmod -R 777 /tmp/druid-indexing

Create symlinks for the Hadoop configuration

Symlink the Hadoop configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) into druidHome/conf/druid/cluster/_common/, as sketched below.
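
A minimal sketch of creating those symlinks, assuming the Hadoop client configuration lives in /etc/hadoop/conf and Druid is installed at /opt/druid (both paths are assumptions; adjust them to your environment):

# Symlink the Hadoop client configs into Druid's _common config directory
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
  ln -s /etc/hadoop/conf/$f /opt/druid/conf/druid/cluster/_common/$f
done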

Start the services

  • Master server: starts the Coordinator and Overlord
nohup /opt/druid/bin/start-cluster-master-no-zk-server > /data/logs/druid/master.log 2>&1 &
  • Query server: starts the Router and Broker
nohup /opt/druid/bin/start-cluster-query-server > /data/logs/druid/query.log 2>&1 &
  • Data server: starts the MiddleManager and Historical
nohup /opt/druid/bin/start-cluster-data-server > /data/logs/druid/data.log 2>&1 &
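
Once the three scripts are up, a quick sanity check against the Query server (the hostname follows the role table above; every Druid service exposes a /status endpoint, and the Router serves the unified web console on its port, 8888):

# Should return JSON with the Druid version and loaded extensions
curl http://serverQuery:8888/status
# The unified console is then reachable at http://serverQuery:8888 in a browser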

Notes

Comment out druid.host=localhost in common.runtime.properties. Each Druid service then resolves its own hostname at startup via InetAddress.getLocalHost().getCanonicalHostName(), so there is no need to hard-code a hostname per server. This makes it easy to distribute the same configuration to every machine with tools such as Ansible.

Stop the services

ps -ef | grep druid
# kill the child (service) PIDs first, then the parent PID
kill <child PIDs> ... <parent PID>
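
A slightly more concrete sketch of the same shutdown, assuming the services were started with the nohup commands above; the process patterns below are what a default 0.16 install shows in ps output, so verify them against your own listing first:

# List the supervising script and the Java services it launched
ps -ef | grep -v grep | grep druid
# Kill the Java service (child) processes first, then the supervisor (parent)
pkill -f 'org.apache.druid.cli.Main'
pkill -f 'bin/supervise'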

Default resource usage

Role             Default JVM settings
Coordinator      -Xms15g -Xmx15g
Overlord         -Xms15g -Xmx15g
MiddleManager    -Xms128m -Xmx128m
Historical       -Xms8g -Xmx8g -XX:MaxDirectMemorySize=13g
Broker           -Xms12g -Xmx12g -XX:MaxDirectMemorySize=6g
Router           -Xms1g -Xmx1g -XX:MaxDirectMemorySize=128m
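
These defaults come from each role's jvm.config file in the clustered example configuration. The paths below assume the stock 0.16.0 layout under /opt/druid; tuning heap or direct memory means editing these files:

# jvm.config files for each role in the stock clustered configuration
ls /opt/druid/conf/druid/cluster/master/coordinator-overlord/jvm.config \
   /opt/druid/conf/druid/cluster/query/broker/jvm.config \
   /opt/druid/conf/druid/cluster/query/router/jvm.config \
   /opt/druid/conf/druid/cluster/data/historical/jvm.config \
   /opt/druid/conf/druid/cluster/data/middleManager/jvm.config
# Edit -Xms / -Xmx / -XX:MaxDirectMemorySize in these files to fit your machines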

Configuration reference

This cluster is only meant to validate Druid's features; the parameters have not been tuned.

Key points:

 1. In common.runtime.properties, comment out druid.host=localhost:

       #druid.host=localhost

 2. Configure MySQL as the metadata store, point druid.zk.service.host at the (independently deployed) ZooKeeper ensemble, and set druid.extensions.loadList accordingly.
 3. Configure HDFS directories for deep storage and indexing logs.

References

  • MySQL Metadata Store

https://druid.apache.org/docs/latest/development/extensions-core/mysql.html

  • Clustered deployment

https://druid.apache.org/docs/latest/tutorials/cluster.html

Full configuration files

common.runtime.properties

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Extensions specified in the load list will be loaded by Druid
# We are using local fs for deep storage - not recommended for production - use S3, HDFS, or NFS instead
# We are using local derby for the metadata store - not recommended for production - use MySQL or Postgres instead

# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches","mysql-metadata-storage"]

# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies


#
# Hostname
#
#druid.host=localhost

#
# Logging
#

# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#

druid.zk.service.host=server-1:2181,server-2:2181,server-3:2181,server-4:2181,server-5:2181
druid.zk.paths.base=/druid

#
# Metadata storage
#

# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://server-9:3309/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=pass

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments

# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# omitting this will lead to doubles being indexed as floats at the storage layer

druid.indexing.doubleStorage=double

#
# Security
#
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]


#
# SQL
#
druid.sql.enable=true

#
# Lookups
#
druid.lookup.enableLookupSyncOnStartup=false

router runtime.properties

druid.service=druid/router
druid.plaintextPort=8888

# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100

# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator

# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true

broker runtime.properties

druid.service=druid/broker
druid.plaintextPort=8082

# HTTP server settings
druid.server.http.numThreads=60

# HTTP client settings
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10000000

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=var/druid/processing

# Query cache disabled -- push down caching and merging instead
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

master runtime.properties


druid.service=druid/coordinator
druid.plaintextPort=8081

druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S

# Run the overlord service in the coordinator process
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord

druid.indexer.queue.startDelay=PT5S

druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

historical runtime.properties


druid.service=druid/historical
druid.plaintextPort=8083

# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=var/druid/processing

# Segment storage
druid.segmentCache.locations=[{"path":"/data/apps/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=300000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000

middleManager runtime.properties

druid.service=druid/middleManager
druid.plaintextPort=8091


# Number of tasks per middleManager
druid.worker.capacity=4

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task

# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1

# Hadoop indexing
#druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
