This post briefly describes how to deploy an Apache Druid cluster.
The simple cluster has the following layout:
- 1 server running the Coordinator and Overlord processes, as the Master role
- 2 scalable, fault-tolerant servers running the Historical and MiddleManager processes, as the Data role
- 1 server running the Broker and Router processes, as the Query role
In production, multiple Master and multiple Query servers are recommended for fault tolerance. For now you can quickly stand up the cluster with a single Master and a single Query server, and add more Master and Query servers later.
Machine role assignment
Assume we have four machines and assign component roles in the simplest layout recommended by the official docs. Each role can be scaled out horizontally later.
Server name | Role | Deployed components |
---|---|---|
serverMaster | Master | Coordinator, Overlord |
serverQuery | Query | Router, Broker |
serverData1 | Data | MiddleManager, Historical |
serverData2 | Data | MiddleManager, Historical |
(Figure: server role diagram)
Dependencies
- apache-druid-0.16.0-incubating-bin.tar.gz
- Java 8 (8u92+)
- MySQL, used to store the metadata
Installation steps
Install MySQL
Install MySQL with Docker
# Pull the MySQL Docker image
docker pull mysql:5.7.22
# Start the container, mapping host port 3309 to container port 3306
docker run --name mysql-docker -v /data/apps/mysqldata:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=xxx -p 3309:3306 -d mysql:5.7.22
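To verify the container came up, check that it is running and connect through the mapped port (this assumes a mysql client on the host; the password is the MYSQL_ROOT_PASSWORD value from the run command above):
# Confirm the container is running, then connect via the mapped host port
docker ps --filter name=mysql-docker
mysql -h 127.0.0.1 -P 3309 -uroot -p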
Copy the MySQL JDBC driver into Druid's extension directory
cp mysql-connector-java-5.1.38.jar /opt/druid/extensions/mysql-metadata-storage
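If the connector jar is not already at hand, it can be downloaded from Maven Central (standard repository layout for the mysql:mysql-connector-java:5.1.38 coordinate) before copying it in:
# Fetch the MySQL JDBC driver from Maven Central
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar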
Run the following commands to initialize the database:
# Open a shell in the MySQL container
docker exec -it mysql-docker bash
# Log in as root
mysql -uroot -p
# Create the database and grant privileges
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
CREATE USER 'druid'@'%' IDENTIFIED BY 'druid';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';
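A quick sanity check from the same mysql session that the database and grants took effect:
-- Verify the database exists and the druid user has the expected privileges
SHOW DATABASES LIKE 'druid';
SHOW GRANTS FOR 'druid'@'%';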
Initialize the Druid directories on HDFS
To use HDFS as deep storage, the Druid directories must first be created on HDFS.
# Create the directories (run as the hdfs user)
hadoop fs -mkdir -p /druid/segments
hadoop fs -mkdir -p /druid/indexing-logs
hadoop fs -mkdir -p /tmp/druid-indexing
hadoop fs -chmod -R 777 /druid
hadoop fs -chmod -R 777 /tmp/druid-indexing
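A listing confirms the directories exist with the expected permissions (run in the same hadoop client environment):
# Verify the Druid directories and their permissions
hadoop fs -ls /druid
hadoop fs -ls -d /tmp/druid-indexing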
Create symlinks
Symlink the Hadoop configuration files into druidHome/conf/druid/cluster/_common/, as sketched below:
core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml
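A minimal sketch of the symlink step, assuming the Hadoop client configs live in /etc/hadoop/conf and Druid is installed in /opt/druid (adjust both paths to your layout):
# Symlink each Hadoop config file into Druid's _common directory
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
  ln -s /etc/hadoop/conf/$f /opt/druid/conf/druid/cluster/_common/$f
done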
Start the services
- Master server: start the Coordinator and Overlord
nohup /opt/druid/bin/start-cluster-master-no-zk-server > /data/logs/druid/master.log 2>&1 &
- Query server: start the Router and Broker
nohup /opt/druid/bin/start-cluster-query-server > /data/logs/druid/query.log 2>&1 &
- Data servers: start the MiddleManager and Historical
nohup /opt/druid/bin/start-cluster-data-server > /data/logs/druid/data.log 2>&1 &
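Once the processes are up, every Druid service exposes a /status health endpoint. A quick check against the ports configured below (hostnames follow the role table above):
# Coordinator/Overlord, Broker, and Router health checks
curl http://serverMaster:8081/status
curl http://serverQuery:8082/status
curl http://serverQuery:8888/status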
Notes
Leave druid.host=localhost commented out in common.runtime.properties. Each Druid service then resolves its own hostname via InetAddress.getLocalHost().getCanonicalHostName(), so the hostname does not need to be hard-coded per server. This makes the same configuration tree easy to distribute with Ansible.
Stop the services
ps -ef | grep druid
# Kill the child processes first, then the parent (supervise) process
kill <child PIDs...> <parent PID>
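As a shortcut, a pkill sketch that takes down all Druid JVMs at once, assuming nothing else on the machine matches the pattern:
# Kill every process whose command line contains the Druid main class
pkill -f 'org.apache.druid.cli.Main'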
Default resource usage
Role | Default JVM settings |
---|---|
Coordinator | -Xms15g -Xmx15g |
Overlord | -Xms15g -Xmx15g |
MiddleManager | -Xms128m -Xmx128m |
Historical | -Xms8g -Xmx8g -XX:MaxDirectMemorySize=13g |
Broker | -Xms12g -Xmx12g -XX:MaxDirectMemorySize=6g |
Router | -Xms1g -Xmx1g -XX:MaxDirectMemorySize=128m |
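These flags live in each service's jvm.config under conf/druid/cluster/ in the Druid home directory. As a reference point, an excerpt of the Master defaults (path per the 0.16 clustered example configs; verify against your distribution):
# conf/druid/cluster/master/coordinator-overlord/jvm.config (excerpt)
-server
-Xms15g
-Xmx15g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC
-Dfile.encoding=UTF-8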
Configuration reference
This cluster is for validating Druid's functionality; the parameters have not been tuned.
Key points:
1. Comment out druid.host=localhost in common.runtime.properties:
#druid.host=localhost
2. Configure MySQL as the metadata store; configure ZooKeeper (a separately deployed ensemble); configure druid.extensions.loadList.
3. Configure HDFS directories as the storage for segments and indexing logs.
References
- MySQL Metadata Store
https://druid.apache.org/docs/latest/development/extensions-core/mysql.html
- Clustered deployment
https://druid.apache.org/docs/latest/tutorials/cluster.html
Full configuration files
common.runtime.properties
# Extensions specified in the load list will be loaded by Druid
# This cluster uses HDFS for deep storage and MySQL for the metadata store (see the settings below)
# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches","mysql-metadata-storage"]
# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies
#
# Hostname
#
#druid.host=localhost
#
# Logging
#
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true
#
# Zookeeper
#
druid.zk.service.host=server-1:2181,server-2:2181,server-3:2181,server-4:2181,server-5:2181
druid.zk.paths.base=/druid
#
# Metadata storage
#
# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527
# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://server-9:3309/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid
# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...
#
# Deep storage
#
# For local disk (only viable in a cluster if this is a network mount):
#druid.storage.type=local
#druid.storage.storageDirectory=var/druid/segments
# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...
#
# Indexing service logs
#
# For local disk (only viable in a cluster if this is a network mount):
#druid.indexer.logs.type=file
#druid.indexer.logs.directory=var/druid/indexing-logs
# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs
#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info
# Storage type of double columns
# omitting this will lead to indexing doubles as floats at the storage layer
druid.indexing.doubleStorage=double
#
# Security
#
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]
#
# SQL
#
druid.sql.enable=true
#
# Lookups
#
druid.lookup.enableLookupSyncOnStartup=false
router runtime.properties
druid.service=druid/router
druid.plaintextPort=8888
# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true
broker runtime.properties
druid.service=druid/broker
druid.plaintextPort=8082
# HTTP server settings
druid.server.http.numThreads=60
# HTTP client settings
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10000000
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=var/druid/processing
# Query cache disabled -- push down caching and merging instead
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
master runtime.properties
druid.service=druid/coordinator
druid.plaintextPort=8081
druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S
# Run the overlord service in the coordinator process
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord
druid.indexer.queue.startDelay=PT5S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
historical runtime.properties
druid.service=druid/historical
druid.plaintextPort=8083
# HTTP server threads
druid.server.http.numThreads=60
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=15
druid.processing.tmpDir=var/druid/processing
# Segment storage
druid.segmentCache.locations=[{"path":"/data/apps/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=300000000000
# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
middleManager runtime.properties
druid.service=druid/middleManager
druid.plaintextPort=8091
# Number of tasks per middleManager
druid.worker.capacity=4
# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=60
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=1
# Hadoop indexing
#druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
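As a closing smoke test, since druid.sql.enable=true is set in common.runtime.properties, a trivial SQL query sent through the Router (hostname from the role table above) should return a row once the cluster is healthy:
# Send a trivial SQL query through the Router to confirm the query path works
curl -X POST http://serverQuery:8888/druid/v2/sql -H 'Content-Type: application/json' -d '{"query":"SELECT 1"}'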