Dameng Database | Helm (K8s) Deployment
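🗯️ Previously: the last article introduced the Dameng database and its Docker deployment.
👉 Goal of this article: building on the Docker deployment, deploy Dameng in K8s.
Docker deployment recap
The Docker deployment command from the previous article: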
docker run -d -p 30236:5236 \
--restart=always --name=dm8_test --privileged=true \
-e LD_LIBRARY_PATH=/opt/dmdbms/bin \
-e PAGE_SIZE=16 \
-e EXTENT_SIZE=32 \
-e LOG_SIZE=1024 \
-e UNICODE_FLAG=1 \
-e LENGTH_IN_CHAR=1 \
-e INSTANCE_NAME=dm8_test \
-v /opt/data:/opt/dmdbms/data \
dm8:dm8_20240613_rev229704_x86_rh6_64
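Key points, and how each one maps to K8s:
- Port mapping: implemented with a Service
- privileged: implemented with securityContext.privileged
- Environment variables: K8s can set environment variables just as easily
- Mounted storage: K8s also supports persistent storage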
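Helm deployment
Next we deploy the Dameng database to K8s with a Helm chart: custom-component-0.0.4.zip
Prepare the values.yaml configuration file
The values.yaml settings are:
1. Set the internal port to 5236: port: 5236
2. Expose the service outside the cluster: accessFromOutsideCluster: true
3. Set environment variables: env: xxx
4. Volume mount: volume -> /opt/dmdbms/data
5. Use privileged mode: securityContext -> privileged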
image:
  repository: harbor.xxx.space/cicd/dm8_single
  pullPolicy: IfNotPresent
  tag: "dm8_20230808_rev197096_x86_rh6_64"
port: 5236
# Whether the service must be reachable from outside the cluster. If true, a random port is assigned for external access, and the port parameter is used only for access from inside the cluster.
accessFromOutsideCluster: true
# Optional: environment variables
env:
  - name: PAGE_SIZE
    value: "8"
  - name: LD_LIBRARY_PATH
    value: "/opt/dmdbms/bin"
  - name: EXTENT_SIZE
    value: "16"
  - name: BLANK_PAD_MODE
    value: "1"
  - name: LOG_SIZE
    value: "256"
  - name: UNICODE_FLAG
    value: "1"
  - name: LENGTH_IN_CHAR
    value: "1"
  - name: INSTANCE_NAME
    value: "dm8"
  - name: CASE_SENSITIVE
    value: "0"
replicaCount: 1
volume:
  enabled: true
  size: 8Gi
  mountPath: /opt/dmdbms/data
  storageClassName: local-path
securityContext:
  privileged: true
resources:
  # limits: the upper bound on resources the service may use
  limits:
    # 1 = 1000m, i.e. one core
    cpu: 2
    memory: 8Gi
  # requests: the minimum resources the service needs to run
  requests:
    cpu: 1500m
    memory: 3Gi
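Problems encountered with the Helm deployment
DMServer never becomes ready
With the configuration above, the database never managed to start. 😖
The startup log: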
Script start.
file dm.key not found, use default license!
License will expire in 25 day(s) on 2024-07-26
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
log file path: /opt/dmdbms/data/DAMENG/DAMENG01.log
log file path: /opt/dmdbms/data/DAMENG/DAMENG02.log
write to dir [/opt/dmdbms/data/DAMENG].
create dm database success. 2024-07-01 20:13:41
initdb V8
db version: 0x7000c
Init DM success!
Start DmAPService...
Starting DmAPService: [ OK ]
/opt/dmdbms/conf/dm.ini does not exist, use default dm.ini
Start DMSERVER success!
Dmserver is running.
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
DM Database is not OK, please wait...
* Starting periodic command scheduler cron
...done.
2023-07-27 19:57:51.680 [INFO] database P0000023431 T0000000000000023435 rfil_close_low set main rfil[../datatest/DAMENG/DAMENG01.log]'s sta to inactive, l_next_seq = 4788, g_next_seq = 4788, clsn = 38842, handle = 7, free=8720896, len=268435456
2023-07-27 19:57:51.681 [INFO] database P0000023431 T0000000000000023435 os_sema2_free, sema_id:3309591, sema_value:1!
2023-07-27 19:57:51.691 [INFO] database P0000023431 T0000000000000023435 shutdown MAL subsystem...
2023-07-27 19:57:51.797 [INFO] database P0000023431 T0000000000000023435 global inject hint deinit
2023-07-27 19:57:51.797 [INFO] database P0000023431 T0000000000000023435 global stat cache deinit
2023-07-27 19:57:51.803 [INFO] database P0000023431 T0000000000000023435 fil_sys_destroy
2023-07-27 19:57:52.812 [INFO] database P0000023431 T0000000000000023435 close lsnr socket
2023-07-27 19:57:52.815 [INFO] database P0000023431 T0000000000000023435 [for dem]SYSTEM SHUTDOWN SUCCESS.
2023-07-27 19:57:52.815 [INFO] database P0000023431 T0000000000000023435 DM Database Server shutdown successfully.
2023-07-27 19:57:52.815 [INFO] database P0000023431 T0000000000000023435 nsvr_notify_exit wakeup main thread to exit
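‼️ Awkward: nothing in the configuration looks wrong.
‼️ Strange: it is 2024, so why does the log show entries from July 27, 2023?
🚀 Let's go through the startup script step by step and find out what is really going on.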
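Analyzing the startup script
The image metadata shows the Entrypoint and Cmd. [Odd: why does Cmd contain ENTRYPOINT? And what is #(nop)?]
- "Entrypoint": ["/opt/startup.sh"],
- "Cmd": ["/bin/sh","-c","#(nop) ","ENTRYPOINT [\"/opt/startup.sh\"]"]
⁉️ Odd: the startup command (Entrypoint) is already /opt/startup.sh, yet the arguments (Cmd) appear to run /opt/startup.sh a second time. (A likely explanation: #(nop) is the marker that Docker's legacy builder records for metadata-only build steps, so this value looks like a build-history record rather than a second invocation.)
Inside the container, look at the script directory:
Although the entrypoint is startup.sh, what it actually calls is /opt/singlestartup.sh.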
root@dm8-0:/opt# cat singlestartup.sh
#!/bin/bash
PAGE_SIZE=${PAGE_SIZE}
CASE_SENSITIVE=${CASE_SENSITIVE}
UNICODE_FLAG=${UNICODE_FLAG}
LENGTH_IN_CHAR=${LENGTH_IN_CHAR}
INSTANCE_BUFFER=${BUFFER}
ADMIN_PWD=${SYSDBA_PWD}
CONN_PWD=\"${ADMIN_PWD}\"
EXTENT_SIZE=${EXTENT_SIZE}
BLANK_PAD_MODE=${BLANK_PAD_MODE}
LOG_SIZE=${LOG_SIZE}
export LANG=en_US.UTF-8
function wait_dm_running() {
    for i in `seq 1 10`
    do
        pid=`ps -eo pid,args | grep -F "./dmserver /opt/dmdbms/conf/dm.ini" | grep -v "grep" | tail -1 | awk '{print $1}'`
        if [ ! -f "/opt/dmdbms/conf/dm.ini" ]; then
            pid=`ps -eo pid,args | grep -F "./dmserver /opt/dmdbms/data/DAMENG/dm.ini" | grep -v "grep" | tail -1 | awk '{print $1}'`
        fi
        if [ "$pid" != "" ]; then
            echo "Dmserver is running."
            break
        else
            echo "Dmserver is not running yet..."
            sleep 10
        fi
    done
}
function wait_dm_ready() {
    for i in `seq 1 10`
    do
        echo `./disql /nolog <<EOF
CONN SYSDBA/${CONN_PWD}@localhost
exit
EOF` | grep "connection failure" > /dev/null 2>&1
        if [ $? -eq 0 ]
        then
            echo "DM Database is not OK, please wait..."
            sleep 10
        else
            echo "DM Database is OK"
            break
        fi
    done
}
if [ ! -d "/opt/dmdbms/data/DAMENG" ]; then
    cd /opt/dmdbms/bin
    ./dminit PATH=/opt/dmdbms/data PAGE_SIZE=${PAGE_SIZE} CASE_SENSITIVE=${CASE_SENSITIVE} UNICODE_FLAG=${UNICODE_FLAG} LENGTH_IN_CHAR=${LENGTH_IN_CHAR} SYSDBA_PWD=${ADMIN_PWD} EXTENT_SIZE=${EXTENT_SIZE} BLANK_PAD_MODE=${BLANK_PAD_MODE} LOG_SIZE=${LOG_SIZE} BUFFER=${INSTANCE_BUFFER}
    echo "Init DM success!"
fi
cd /opt/dmdbms/bin
echo "Start DmAPService..."
./DmAPService start
if [ ! -f "/opt/dmdbms/conf/dm.ini" ]; then
    echo "/opt/dmdbms/conf/dm.ini does not exist, use default dm.ini"
    ./dmserver /opt/dmdbms/data/DAMENG/dm.ini -noconsole > /opt/dmdbms/log/DmServiceDMSERVER.log 2>&1 &
else
    ./dmserver /opt/dmdbms/conf/dm.ini -noconsole > /opt/dmdbms/log/DmServiceDMSERVER.log 2>&1 &
fi
echo "Start DMSERVER success!"
wait_dm_running
wait_dm_ready
if [ ! -f "/opt/dmdbms/log/dm_DMSERVER.log" ]; then
    current_year_month=`date +%Y%m`
    DM_LOG=dm_DMSERVER_${current_year_month}.log
    ln -s /opt/dmdbms/log/${DM_LOG} /opt/dmdbms/log/dm_DMSERVER.log
    echo "Finished soft link DM current ${DM_LOG} to dm_DMSERVER.log"
fi
echo "5 0 1 * * root /opt/switchDmLog.sh" >> /etc/crontab
#systemctl restart crond.service
/etc/init.d/cron start
#echo "Start Cron Service"
tail -F /opt/dmdbms/log/dm_DMSERVER.log
tail -f /dev/null
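The startup script works as follows:
- Step 1: set up the environment variables
- Step 2: check whether the /opt/dmdbms/data/DAMENG directory exists; if not, initialize the database
- Step 3: start DmAPService
- Step 4: check whether /opt/dmdbms/conf/dm.ini exists; if not, start DMSERVER with the default dm.ini
- Step 5: check whether DMServer is running by looking for the dmserver process with ps; checks every 10 seconds, up to 10 times
- Step 6: check whether DMServer is ready by connecting with disql; checks every 10 seconds, up to 10 times
- Step 7: if /opt/dmdbms/log/dm_DMSERVER.log does not exist, create a symlink
  - DM_LOG=dm_DMSERVER_${current_year_month}.log
  - ln -s /opt/dmdbms/log/${DM_LOG} /opt/dmdbms/log/dm_DMSERVER.log
- Step 8: register a crontab entry (5 0 1 * *, i.e. 00:05 on the 1st of each month) that switches the log, repeating the check from step 7 on a schedule
- Step 9: tail the /opt/dmdbms/log/dm_DMSERVER.log log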
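The crontab line from step 8 uses the system crontab layout: five time fields, then a user, then the command. As a small sketch of how to read it, the snippet below splits that line into labelled fields. The helper name describe_cron_entry is made up for illustration and is not part of the image.

```shell
#!/bin/sh
# Split a system-crontab line (/etc/crontab layout) into its fields.
# describe_cron_entry is a hypothetical helper, not something shipped in the DM image.
describe_cron_entry() {
    # read splits on whitespace; the last variable receives the rest of the line.
    read -r minute hour dom month dow user cmd <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s user=%s command=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow" "$user" "$cmd"
}

# The exact line the startup script appends to /etc/crontab.
describe_cron_entry "5 0 1 * * root /opt/switchDmLog.sh"
```

Read this way, the entry runs /opt/switchDmLog.sh as root at 00:05 on the first day of every month.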
🆒 From step 7 we know that the real log file is /opt/dmdbms/log/dm_DMSERVER_${current_year_month}.log. With a log file to read, things get much easier.
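Analyzing the startup log
Log file: tail -200f /opt/dmdbms/log/dm_DMSERVER_202407.log
Log of a successful startup: contains the line SYSTEM IS READY
Log of a failed startup: [this is what it looks like after a restart following the first failed startup]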
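A quick way to tell the two cases apart is to build the monthly log path and search it for the readiness marker. This is only a sketch: the helper names dm_log_path and dm_log_is_ready are invented here, and the default directory is the /opt/dmdbms/log path used by the startup script.

```shell
#!/bin/sh
# Build the monthly log path the same way the startup script does (date +%Y%m).
# dm_log_path and dm_log_is_ready are hypothetical helper names.
dm_log_path() {
    log_dir=${1:-/opt/dmdbms/log}
    printf '%s/dm_DMSERVER_%s.log\n' "$log_dir" "$(date +%Y%m)"
}

# Succeeds (exit status 0) if the given log file contains the readiness marker.
dm_log_is_ready() {
    grep -q "SYSTEM IS READY" "$1" 2>/dev/null
}

log_file=$(dm_log_path)
if dm_log_is_ready "$log_file"; then
    echo "ready: $log_file"
else
    echo "not ready yet (or log missing): $log_file"
fi
```

Run inside the container, this reports whether the current month's log has reached SYSTEM IS READY; outside the container it simply reports that the log is missing.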
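One attempt that worked: inside the container, delete the DAMENG directory under the mount (/opt/dmdbms/data/DAMENG), then run /opt/singlestartup.sh by hand; the database starts successfully. However, it may not become ready within 10 × 10 seconds, so a longer timeout is needed.
👉 Small goal: raise the ready-check count to 100. With that change the database starts normally.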
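The waiting logic can be sketched as a retry loop whose attempt count and interval are parameters rather than hard-coded values. The function name wait_until and its argument order are assumptions made for this illustration; the image's own script hard-codes 10 attempts with a 10-second sleep.

```shell
#!/bin/sh
# wait_until: retry a probe command up to max_tries times, sleeping between tries.
# Hypothetical helper; usage: wait_until <max_tries> <interval_seconds> <probe command...>
wait_until() {
    max_tries=$1
    interval=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            echo "ready after $try attempt(s)"
            return 0
        fi
        # Unlike the image's loop, do not sleep after the final failed attempt.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$interval"
        fi
        try=$((try + 1))
    done
    echo "gave up after $max_tries attempt(s)"
    return 1
}

# Time budget of the image's loop: 10 tries x 10 s versus 100 tries x 10 s.
echo "default budget: $((10 * 10)) s, enlarged budget: $((100 * 10)) s"
```

With 100 attempts the budget grows from 100 seconds to 1000 seconds, roughly 16.7 minutes, which leaves room for a slow first initialization.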
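Overriding the startup script
Idea: use the K8s command and args fields to override the image's default ENTRYPOINT and CMD.
- "Entrypoint": ["/opt/startup.sh"],
- "Cmd": ["/bin/sh","-c","#(nop) ","ENTRYPOINT [\"/opt/startup.sh\"]"],
⁉️ The same question again: with ENTRYPOINT and CMD written like this in the image, does the script really run twice?
Attempt 1: use only command → the container fails to start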
command: [ "/bin/sh", "-c", "#(nop) ", "/opt/startup.sh" ]
Attempt 2: remove "#(nop) " → the container starts successfully
command: [ "/bin/sh", "-c", "/opt/startup.sh" ]
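A likely reason for the difference between the two attempts: with sh -c, only the argument right after -c is the command string, and anything after it is passed as $0, $1 and so on. In attempt 1 the command string is just a comment, so the shell has nothing to run and exits at once. The sketch below reproduces this locally, using echo as a stand-in for the real startup script.

```shell
#!/bin/sh
# With `sh -c`, the argument right after -c is the command string;
# any further arguments become $0, $1, ... and are NOT executed.

# 1) The command string is only a comment, so nothing runs and sh exits with status 0.
first=$(/bin/sh -c "#(nop) " "echo startup-ran")
echo "first attempt output: '${first}' (exit status $?)"

# 2) The trailing argument is merely exposed as $0 inside the command string.
/bin/sh -c 'echo "value of \$0: $0"' "/opt/startup.sh"

# 3) Without the comment, the command string itself is executed.
/bin/sh -c "echo startup-ran"
```

In a pod, a main process that exits immediately means the container terminates right after starting, which would match the failure seen in attempt 1.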
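‼️ Change 3: the dm8_20230808_rev197096_x86_rh6_64 image already ships a /opt/dmdbms/log/dm_DMSERVER.log file, so the startup script skips creating the symlink and the tail at the end never shows the real log. This would also explain the 2023 entries seen earlier.
‼️ Change 4: raise the wait counts of the running and ready checks. The original value is 10 checks at 10 seconds each, at most 100 seconds; it is now 100 checks, at most 1000 seconds.
✅ Final container startup command:
- Change 1: raise the wait counts of the running and ready checks
- Change 2: before starting, delete the /opt/dmdbms/log/dm_DMSERVER.log file that should not be there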
command: [ "/bin/sh", "-c", "sed -i 's%seq 1 10%seq 1 100%g' /opt/singlestartup.sh; rm /opt/dmdbms/log/dm_DMSERVER.log; /opt/startup.sh" ]
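The effect of both changes can be checked in a sandbox before touching a real container. The sketch below works on temporary stand-in files (nothing under /opt is modified) and assumes GNU sed, as used by the command above. The loop bodies and the link_log function are simplified stand-ins for the corresponding parts of /opt/singlestartup.sh.

```shell
#!/bin/sh
# Sandbox demo of the two fixes; all paths are temporary stand-ins.
work=$(mktemp -d)
script="$work/singlestartup.sh"
log_dir="$work/log"
mkdir -p "$log_dir"

# Stand-in for the two wait loops in /opt/singlestartup.sh.
cat > "$script" <<'EOF'
for i in `seq 1 10`
do
echo running-check
done
for i in `seq 1 10`
do
echo ready-check
done
EOF

# Fix 1: the same sed expression as in the final command, applied to the copy.
sed -i 's%seq 1 10%seq 1 100%g' "$script"
patched=$(grep -c 'seq 1 100`' "$script")
echo "loops patched: $patched"

# Fix 2: a stale regular file makes the `[ ! -f ... ]` guard skip the symlink.
link_log() {
    if [ ! -f "$log_dir/dm_DMSERVER.log" ]; then
        ln -s "$log_dir/dm_DMSERVER_$(date +%Y%m).log" "$log_dir/dm_DMSERVER.log"
        echo "symlink created"
    else
        echo "guard skipped: dm_DMSERVER.log already exists"
    fi
}
echo "old 2023 entries" > "$log_dir/dm_DMSERVER.log"   # stale file baked into the image
link_log                                               # guard skips the symlink
rm "$log_dir/dm_DMSERVER.log"
link_log                                               # symlink is created now
```

Note that the sed expression is not idempotent: applied twice, it would turn seq 1 100 into seq 1 1000. That should not matter here if, as is normally the case in K8s, a restarted container starts again from the image's original filesystem.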
Final values.yaml configuration file
image:
  repository: harbor.xxx.space/ht-registry/dm8
  pullPolicy: IfNotPresent
  # !!! Developer edition; the license is valid for 1 year !!!
  tag: "dm8_20240613_rev229704_x86_rh6_64"
port: 5236
# Whether the service must be reachable from outside the cluster. If true, a random port is assigned for external access, and the port parameter is used only for access from inside the cluster.
accessFromOutsideCluster: true
# Optional: environment variables; see the official documentation (some settings cannot be changed once applied)
env:
  - name: PAGE_SIZE
    value: "8"
  - name: LD_LIBRARY_PATH
    value: "/opt/dmdbms/bin"
  - name: EXTENT_SIZE
    value: "16"
  - name: BLANK_PAD_MODE
    value: "1"
  - name: LOG_SIZE
    value: "256"
  - name: UNICODE_FLAG
    value: "1"
  - name: LENGTH_IN_CHAR
    value: "1"
  - name: INSTANCE_NAME
    value: "dm8"
  - name: CASE_SENSITIVE
    value: "0"
replicaCount: 1
# Persistent storage: set the mount path and let the storageClass create the PV
volume:
  enabled: true
  size: 8Gi
  mountPath: /opt/dmdbms/data
  storageClassName: local-path
# Startup command
## Raise the number of running and ready checks
## Delete the old log file; the developer edition may ship one, which keeps the startup script from showing the real runtime log
## Call the actual startup script
command: [ "/bin/sh", "-c", "sed -i 's%seq 1 10%seq 1 100%g' /opt/singlestartup.sh; rm /opt/dmdbms/log/dm_DMSERVER.log; /opt/startup.sh" ]
# Same setting as in the Docker deployment
securityContext:
  privileged: true
# Resource limits; at least 2 GB of memory
resources:
  limits:
    cpu: 2
    memory: 8Gi
  requests:
    cpu: 2000m
    memory: 4Gi
Log of a successful startup:
Script start.
file dm.key not found, use default license!
License will expire in 24 day(s) on 2024-07-26
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
log file path: /opt/dmdbms/data/DAMENG/DAMENG01.log
log file path: /opt/dmdbms/data/DAMENG/DAMENG02.log
write to dir [/opt/dmdbms/data/DAMENG].
create dm database success. 2024-07-02 12:18:43
initdb V8
db version: 0x7000c
Init DM success!
Start DmAPService...
Starting DmAPService: [ OK ]
/opt/dmdbms/conf/dm.ini does not exist, use default dm.ini
Start DMSERVER success!
Dmserver is running.
DM Database is not OK, please wait...
... many more identical lines omitted ...
DM Database is not OK, please wait...
DM Database is OK
Finished soft link DM current dm_DMSERVER_202407.log to dm_DMSERVER.log
* Starting periodic command scheduler cron
...done.
2024-07-02 12:33:45.970 [INFO] database P0000000053 T0000000000000000053 SYSTEM IS READY.
2024-07-02 12:33:45.970 [INFO] database P0000000053 T0000000000000000053 [for dem]SYSTEM IS READY.
2024-07-02 12:33:45.970 [INFO] database P0000000053 T0000000000000000053 set g_dw_stat from UNDEFINED to NONE success, g_dw_recover_stop is 0
2024-07-02 12:33:46.398 [INFO] database P0000000053 T0000000000000000688 checkpoint requested by CKPT_INTERVAL, rlog free space[521363456], used space[15499264]
2024-07-02 12:33:46.398 [INFO] database P0000000053 T0000000000000000688 checkpoint generate by ckpt_interval
2024-07-02 12:33:46.399 [INFO] database P0000000053 T0000000000000000099 checkpoint begin, used_space[15507456], free_space[521355264]...
2024-07-02 12:33:46.399 [INFO] database P0000000053 T0000000000000000109 trx4_min_tid_collect set min_active_id_opt, min_active_id: 6681, first_tid: 6006
2024-07-02 12:33:46.815 [INFO] database P0000000053 T0000000000000000099 ckpt2_log_adjust: full_status: 160, ptx_reserved: 0
2024-07-02 12:33:46.815 [INFO] database P0000000053 T0000000000000000099 ckpt2_log_adjust: ckpt_lsn(40026), ckpt_fil(0), ckpt_off(15503360), cur_lsn(40121), l_next_seq(4770), g_next_seq(4770), cur_free(15515648), total_space(536862720), used_space(12288), free_space(536850432), n_ep(1)
2024-07-02 12:33:46.816 [INFO] database P0000000053 T0000000000000000099 checkpoint end, 0 pages flushed, used_space[12288], free_space[536850432].
The machine's disk is on the slow side, so startup takes about 14 minutes.
That completes the deployment of the Dameng database in K8s! 🚀🚀🚀