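Environment: Linux 7 with Grid Infrastructure (GI) installed and a database created on ASM. Let's look at how the database gets started automatically.
Linux 7 manages services with systemd, which is the first process started after boot (PID 1):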
$ ps --pid 1
PID TTY TIME CMD
1 ? 00:00:01 systemd
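A script can check the same thing without reading ps output by eye. The sketch below uses the test documented for sd_booted(3): the directory /run/systemd/system exists only when systemd is the init system. The function names are hypothetical, and the optional directory arguments are illustration-only additions so the checks can be run against a scratch tree.

```shell
#!/usr/bin/env bash

# Succeeds if the system was booted with systemd.
# sd_booted(3) documents this check: /run/systemd/system exists only
# when systemd is the init system.
# ROOT (optional, illustration-only) points the check at another tree.
is_systemd_booted() {
    local root=${1:-}
    [ -d "${root}/run/systemd/system" ]
}

# Prints the command name of PID 1 (what "ps --pid 1" shows under CMD).
# The file argument is optional and defaults to /proc/1/comm.
pid1_name() {
    cat "${1:-/proc/1/comm}"
}
```

On the host above, is_systemd_booted should succeed and pid1_name should print systemd.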
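Conceptually, Oracle Restart works in layers: the operating system starts OHAS (Oracle High Availability Services), and OHAS in turn starts and manages the other resources, such as ASM and the database.
Among the services we find two OHAS units (--type service lists only service units, --all includes units in any state):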
$ systemctl list-units --type service --all|grep ohas
UNIT LOAD ACTIVE SUB DESCRIPTION
ohasd.service loaded active exited LSB: Start and Stop Oracle High Availability Service
oracle-ohasd.service loaded active running Oracle High Availability Services
The runlevel 3 start link also points to the ohasd script:
$ ll /etc/rc3.d/S96ohasd
lrwxrwxrwx. 1 root root 17 Sep 9 01:37 /etc/rc3.d/S96ohasd -> /etc/init.d/ohasd
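The name S96ohasd follows the SysV runlevel convention. S means the script is called with start when the runlevel is entered, and K would mean stop. The two digits order it among the other links in the same directory. The decoder below is a hypothetical helper written only to illustrate that convention:

```shell
#!/usr/bin/env bash

# Decodes a SysV runlevel link name such as "S96ohasd".
# S = start, K = kill (stop); the two digits give the order within the
# runlevel (lower runs earlier); the rest names the script in init.d.
# Prints "<action> <order> <script>", or fails for names outside the pattern.
parse_rc_link() {
    local name=${1##*/}            # also accept a full path like /etc/rc3.d/S96ohasd
    if [[ $name =~ ^([SK])([0-9]{2})(.+)$ ]]; then
        local action=start
        [ "${BASH_REMATCH[1]}" = K ] && action=stop
        printf '%s %s %s\n' "$action" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}
```

For example, looping over /etc/rc.d/rc?.d/*ohasd and passing each path to parse_rc_link shows in which runlevels ohasd is started or stopped, and in what order.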
Both services are active, but one of them is in the sub-state exited:
$ systemctl is-active oracle-ohasd.service
active
$ systemctl is-active ohasd.service
active
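systemctl is-active only reports the ActiveState, which is why both units answer active. The SubState is what separates a finished start script (exited) from a live daemon (running). The sketch below prints both. It parses the Key=value lines of systemctl show directly instead of relying on --value, which older systemd releases do not offer. unit_states is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Prints "ActiveState/SubState" for a unit, e.g. "active/exited".
# is-active alone answers "active" for both ohasd units; SubState differs.
# Parses "Key=value" lines (property order in the output does not matter).
unit_states() {
    local unit=$1 active="" sub="" key value
    while IFS='=' read -r key value; do
        case $key in
            ActiveState) active=$value ;;
            SubState)    sub=$value ;;
        esac
    done < <(systemctl show -p ActiveState -p SubState "$unit")
    printf '%s/%s\n' "$active" "$sub"
}
```

On the system above, unit_states ohasd.service should print active/exited and unit_states oracle-ohasd.service active/running, matching the SUB column of systemctl list-units.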
Let's look at the status of the two services:
$ systemctl status ohasd.service
● ohasd.service - LSB: Start and Stop Oracle High Availability Service
Loaded: loaded (/etc/rc.d/init.d/ohasd; bad; vendor preset: disabled)
Active: active (exited) since Tue 2019-09-10 00:41:44 UTC; 58min ago
Docs: man:systemd-sysv-generator(8)
Process: 1265 ExecStart=/etc/rc.d/init.d/ohasd start (code=exited, status=0/SUCCESS)
$ systemctl status oracle-ohasd.service
● oracle-ohasd.service - Oracle High Availability Services
Loaded: loaded (/etc/systemd/system/oracle-ohasd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2019-09-10 00:41:43 UTC; 58min ago
Main PID: 1262 (init.ohasd)
CGroup: /system.slice/oracle-ohasd.service
├─1262 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
├─1334 /u01/app/19.0.0/grid/bin/ohasd.bin reboot
├─2540 /u01/app/19.0.0/grid/bin/oraagent.bin
├─2646 /u01/app/19.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
├─2666 /u01/app/19.0.0/grid/bin/evmd.bin
├─3022 /u01/app/19.0.0/grid/bin/evmlogger.bin -o /u01/app/19.0.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/19.0.0/grid/log/[HOSTNAME]/...
├─3083 /u01/app/19.0.0/grid/bin/cssdagent
├─3199 /u01/app/19.0.0/grid/bin/ocssd.bin
├─4599 asm_pmon_+ASM
├─4601 asm_clmn_+ASM
├─4603 asm_psp0_+ASM
├─4606 asm_vktm_+ASM
├─4610 asm_gen0_+ASM
├─4612 asm_mman_+ASM
├─4616 asm_gen1_+ASM
├─4619 asm_diag_+ASM
├─4621 asm_pman_+ASM
├─4623 asm_dia0_+ASM
├─4625 asm_dbw0_+ASM
├─4627 asm_lgwr_+ASM
├─4629 asm_ckpt_+ASM
├─4631 asm_smon_+ASM
├─4633 asm_lreg_+ASM
├─4635 asm_pxmn_+ASM
├─4637 asm_rbal_+ASM
├─4639 asm_gmon_+ASM
├─4641 asm_mmon_+ASM
├─4643 asm_mmnl_+ASM
├─4705 oracle+ASM (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
└─4706 oracle+ASM (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
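ohasd.service is not a native unit. Its description starts with "LSB:" and its Docs line points to systemd-sysv-generator(8), so systemd generated it from the SysV init script /etc/rc.d/init.d/ohasd. It runs "ohasd start" to launch OHAS and then exits, hence active (exited).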
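oracle-ohasd.service is still running and is the actual service daemon: its CGroup holds init.ohasd, ohasd.bin, the agents, the listener and the ASM instance processes.
Let's see which commands the two services run: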
$ systemctl show oracle-ohasd.service|grep Exec
ExecMainStartTimestamp=Tue 2019-09-10 00:41:43 UTC
ExecMainStartTimestampMonotonic=11457980
ExecMainExitTimestampMonotonic=0
ExecMainPID=1262
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/etc/init.d/init.ohasd ; argv[]=/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null ; ignore_errors=no ; start_time=[Tue 2019-09-10 00:41:43 UTC] ; stop_time=[n/a] ; pid=1262 ; code=(null) ; status=0/0 }
ExecStop={ path=/etc/init.d/init.ohasd ; argv[]=/etc/init.d/init.ohasd stop >/dev/null 2>&1 </dev/null ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
$ systemctl show ohasd.service|grep Exec
ExecMainStartTimestampMonotonic=0
ExecMainExitTimestampMonotonic=0
ExecMainPID=0
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/etc/rc.d/init.d/ohasd ; argv[]=/etc/rc.d/init.d/ohasd start ; ignore_errors=no ; start_time=[Tue 2019-09-10 00:41:43 UTC] ; stop_time=[Tue 2019-09-10 00:41:44 UTC] ; pid=1265 ; code=exited ; status=0 }
ExecStop={ path=/etc/rc.d/init.d/ohasd ; argv[]=/etc/rc.d/init.d/ohasd stop ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
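Each Exec* line is a single { key=value ; ... } record. A small awk filter is enough to pull out one field, such as the path and argv[] a unit runs, or the pid. exec_field is a hypothetical helper, and it assumes the " ; " separator seen in the output above:

```shell
#!/usr/bin/env bash

# Reads one Exec*= line of "systemctl show" output on stdin and prints the
# value of FIELD (path, argv[], pid, status, ...).
# Assumes fields are separated by " ; " and that argv itself contains no " ; ".
exec_field() {
    awk -v f="$1" -F ' ; ' '
        {
            sub(/^[A-Za-z]+=[{] /, "")     # drop the leading "ExecStart={ "
            sub(/ [}]$/, "")               # drop the trailing " }"
            for (i = 1; i <= NF; i++)
                if (index($i, f "=") == 1) {
                    print substr($i, length(f) + 2)
                    exit
                }
        }'
}
```

On the system above, systemctl show -p ExecStart oracle-ohasd.service | exec_field pid would give 1262, the same value as ExecMainPID.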
Look up the ExecMainPID (1262) from the output above:
$ ps p 1262
PID TTY STAT TIME COMMAND
1262 ? Ss 0:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
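Note the two key scripts, ohasd and init.ohasd. They can be found under both /etc/init.d and /etc/rc.d/init.d, and both paths have the same inode. They are not hard links, however: the link count (second column of ls -l) is 1, so this is one and the same file reached through two paths, because on Linux 7 /etc/init.d is normally a symbolic link to rc.d/init.d: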
$ ls -l /etc/init.d/*ohas*
-rwxr-x---. 1 root root 15119 Sep 9 01:37 /etc/init.d/init.ohasd
-rwxr-x---. 1 root root 7791 Sep 9 01:37 /etc/init.d/ohasd
$ ls -l /etc/rc.d/init.d/*ohas*
-rwxr-x---. 1 root root 15119 Sep 9 01:37 /etc/rc.d/init.d/init.ohasd
-rwxr-x---. 1 root root 7791 Sep 9 01:37 /etc/rc.d/init.d/ohasd
$ ls -i /etc/init.d/ohasd /etc/rc.d/init.d/ohasd
83447865 /etc/init.d/ohasd 83447865 /etc/rc.d/init.d/ohasd
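The difference between a hard link and one file reached through a symlinked directory can be checked mechanically. test -ef compares device and inode, and comparing the fully resolved paths (readlink -f) separates the two cases. A sketch, where link_relation is a hypothetical name:

```shell
#!/usr/bin/env bash

# Classifies how two paths relate:
#   same-file : same inode and same canonical path (e.g. via a symlinked dir)
#   hardlink  : same inode but distinct directory entries
#   different : different files
link_relation() {
    local a=$1 b=$2
    if ! [ "$a" -ef "$b" ]; then          # -ef: same device and inode
        echo different
    elif [ "$(readlink -f -- "$a")" = "$(readlink -f -- "$b")" ]; then
        echo same-file
    else
        echo hardlink
    fi
}
```

With /etc/init.d being a symlink to rc.d/init.d, link_relation /etc/init.d/ohasd /etc/rc.d/init.d/ohasd would be expected to print same-file. stat -c %h /etc/init.d/ohasd shows the link count directly.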
ohasd is over 300 lines long, but its core is a single command that starts OHAS:
$ grep "crsctl start" ohasd
log_console "Fix the problem and issue command 'crsctl start has' as $HAS_USER user to start Oracle Grid Infrastructure."
my_crsctl start has -nowait
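To make the role of ohasd concrete, here is a deliberately stripped-down, hypothetical stand-in that keeps only that one command. It is not Oracle's script. CRSCTL is an illustration-only variable, and its default path assumes the grid home seen in the process list above. Because of -nowait, the crsctl call presumably does not wait for the stack to come up. That fits the ExecStart record of ohasd.service, which started at 00:41:43 and stopped at 00:41:44.

```shell
#!/usr/bin/env bash

# Hypothetical, heavily simplified stand-in for /etc/init.d/ohasd.
# Only the core idea is kept: "start" asks crsctl to start HAS with -nowait
# and returns. The real script does far more (checks, logging, my_crsctl).
# CRSCTL is an illustration-only override.
CRSCTL=${CRSCTL:-/u01/app/19.0.0/grid/bin/crsctl}

ohasd_sketch() {
    case $1 in
        start) "$CRSCTL" start has -nowait ;;
        *)     echo "usage: ohasd_sketch start" >&2; return 2 ;;
    esac
}
```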
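Next comes init.ohasd, which is over 500 lines long. One of its jobs is to monitor OHAS and restart it when it fails.
Looking at the HAS processes: init.ohasd runs as root, while ohasd.bin runs as the GI owner (grid) and is most likely the actual HAS daemon: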
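The general "monitor and restart" idea can be illustrated with a bounded respawn loop. This is a hypothetical sketch of the pattern only. It does not reproduce what init.ohasd actually checks or how it restarts ohasd.bin.

```shell
#!/usr/bin/env bash

# Hypothetical respawn loop illustrating "monitor and restart on failure".
# Runs COMMAND; while it exits non-zero, waits DELAY seconds and restarts it,
# giving up after MAX_RESTARTS restarts.
# Usage: supervise MAX_RESTARTS DELAY COMMAND [ARGS...]
supervise() {
    local max_restarts=$1 delay=$2
    shift 2
    local restarts=0
    until "$@"; do
        if (( restarts >= max_restarts )); then
            echo "giving up after $restarts restarts" >&2
            return 1
        fi
        restarts=$((restarts + 1))
        echo "restart #$restarts: $*" >&2
        sleep "$delay"
    done
}
```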
$ ps -ef|grep ohas
root 1262 1 0 00:41 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
grid 1334 1 0 00:41 ? 00:00:40 /u01/app/19.0.0/grid/bin/ohasd.bin reboot
grid 10586 4880 0 02:25 pts/0 00:00:00 grep --color=auto ohas
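To list which user owns each HAS process without eyeballing the columns, the ps -ef output can be filtered on the command field (column 8 onward), which also drops the grep process itself. ps_owners is a hypothetical helper:

```shell
#!/usr/bin/env bash

# Reads "ps -ef" output on stdin and prints "USER PID COMMAND" for every
# process whose command line contains PATTERN. The header and the grep
# process itself are skipped. Columns: UID PID PPID C STIME TTY TIME CMD,
# so the command line starts at field 8 (STIME is assumed to be one token).
ps_owners() {
    awk -v p="$1" '
        NR == 1 && $1 == "UID" { next }        # header line, if present
        {
            cmd = $8
            for (i = 9; i <= NF; i++) cmd = cmd " " $i
            if (index(cmd, p) > 0 && $8 != "grep") print $1, $2, cmd
        }'
}
```

Usage: ps -ef | ps_owners ohas. For PIDs that are already known, ps -o user=,pid=,args= -p 1262,1334 gives the same information directly.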
Finally, dbstart and dbshut can also be used to start and stop databases:
$ which dbstart dbshut
/u01/app/19.0.0/grid/bin/dbstart
/u01/app/19.0.0/grid/bin/dbshut
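dbstart and dbshut decide which databases to handle from /etc/oratab, whose entries have the form SID:ORACLE_HOME:Y|N. Only the entries flagged Y are started by dbstart (and stopped by dbshut). A sketch that lists those SIDs, assuming that standard layout; oratab_autostart is a hypothetical helper:

```shell
#!/usr/bin/env bash

# Reads oratab-format text (SID:ORACLE_HOME:FLAG) on stdin and prints the SIDs
# whose flag is Y, i.e. the entries dbstart is meant to start.
# Comment lines (#...) and blank lines are ignored.
oratab_autostart() {
    awk -F: '
        /^[ \t]*#/ { next }                    # comment line
        NF >= 3 && $3 == "Y" { print $1 }
    '
}
```

Usage: oratab_autostart < /etc/oratab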
Other
References
https://oracle-base.com/articles/linux/automating-database-startup-and-shutdown-on-linux
https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units
https://unix.stackexchange.com/questions/396630/the-proper-way-to-test-if-a-service-is-running-in-a-script
https://orainternals.wordpress.com/2013/06/05/clusterware-startup/
https://zhuanlan.zhihu.com/p/54221584