1. Install supervisor
Reference: https://cloud.tencent.com/developer/article/2055326
Install it on every Doris node.
yum install -y epel-release
yum install -y supervisor
systemctl enable supervisord # start automatically at boot
systemctl start supervisord # start the supervisord service
systemctl restart supervisord # restart the supervisord service
systemctl status supervisord # check the supervisord service status
ps -ef | grep supervisord # check that a supervisord process exists
2. Configure the supervisor child processes
- FE
/etc/supervisord.d/doris-fe.ini
Write the following, adjusting command and directory to match your own Doris path:
[program:doris_fe_alive_checker]
command=sh /opt/doris/fe/bin/start_fe.sh
process_name=%(program_name)s
numprocs=1
directory=/opt/doris/fe/bin/
autostart=true
startsecs=2
startretries=5
autorestart=true
stopwaitsecs=10
stopasgroup=true
exitcodes=0,1
killasgroup=true
user=root
redirect_stderr=true
stdout_logfile=/var/log/supervisor/doris_fe_alive_checker.log
stdout_logfile_maxbytes=20MB
stdout_logfile_backups=10
- BE
/etc/supervisord.d/doris-be.ini
[program:doris_be_alive_checker]
command=sh /opt/doris/be/bin/start_be.sh
process_name=%(program_name)s
numprocs=1
directory=/opt/doris/be/bin/
autostart=true
startsecs=2
startretries=5
autorestart=true
stopwaitsecs=10
stopasgroup=true
exitcodes=0,1
killasgroup=true
user=root
redirect_stderr=true
stdout_logfile=/var/log/supervisor/doris_be_alive_checker.log
stdout_logfile_maxbytes=20MB
stdout_logfile_backups=10
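The FE and BE files differ only in the role name, so they could be generated from a single template. The following is a minimal sketch and not part of the original setup: the function name render_doris_program is hypothetical, and the default path /opt/doris is simply the one used above.

```shell
# Print a supervisor program section for one Doris role ("fe" or "be").
# The second argument is the Doris install path (assumed default: /opt/doris).
render_doris_program() {
    role=$1
    doris_home=${2:-/opt/doris}
    cat <<EOF
[program:doris_${role}_alive_checker]
command=sh ${doris_home}/${role}/bin/start_${role}.sh
process_name=%(program_name)s
numprocs=1
directory=${doris_home}/${role}/bin/
autostart=true
startsecs=2
startretries=5
autorestart=true
stopwaitsecs=10
stopasgroup=true
exitcodes=0,1
killasgroup=true
user=root
redirect_stderr=true
stdout_logfile=/var/log/supervisor/doris_${role}_alive_checker.log
stdout_logfile_maxbytes=20MB
stdout_logfile_backups=10
EOF
}
```

For example, render_doris_program fe > /etc/supervisord.d/doris-fe.ini would write the FE file; pass a second argument if Doris is installed somewhere other than /opt/doris.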
3. Start the monitored child processes
supervisorctl reload
supervisorctl update
supervisorctl start all
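After starting, it can help to confirm that every program actually reached the RUNNING state rather than FATAL or BACKOFF. Here is a minimal sketch, assuming the usual supervisorctl status layout (program name in the first column, state in the second); the helper name all_running is hypothetical.

```shell
# Succeeds only if every non-empty line on stdin has RUNNING in column 2.
all_running() {
    awk 'NF && $2 != "RUNNING" { bad = 1 } END { exit bad + 0 }'
}

# Intended use (requires a running supervisord):
#   supervisorctl status | all_running && echo "all programs running"
```

The helper only reads stdin, so it can be tried against saved status output before it is wired into a script.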
4. Troubleshooting
Error: doris_be_alive_checker: ERROR (spawn error)
Check the corresponding logs:
tail -200f /var/log/supervisor/doris_be_alive_checker.log
tail -200f /opt/doris/be/log/be.WARNING
[Screenshot of the error]
This shows that the error is caused by the open file limit (nofile) being too low.
/etc/security/limits.conf
* soft fsize unlimited
* hard fsize unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft as unlimited
* hard as unlimited
* soft nofile 1048576
* hard nofile 1048576
* soft nproc unlimited
* hard nproc unlimited
Add ulimit -n 1048576 as the last line of /etc/profile:
ulimit -n 1048576
The changes to /etc/security/limits.conf take effect after a reboot.
After the reboot, start the processes: supervisorctl start all
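A process started by supervisord inherits supervisord's own limits, not those of an interactive shell, so it is worth checking what the running daemon actually received. Limits in /etc/security/limits.conf are applied through PAM at login, and a service started by systemd usually does not pass through that path; treat this as a likely explanation rather than a confirmed diagnosis. The sketch below reads the soft open-file limit from a /proc/<pid>/limits listing; the helper name soft_nofile is hypothetical.

```shell
# Print the soft "Max open files" value from a /proc/<pid>/limits listing on stdin.
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# Intended use on a Linux host where supervisord is running:
#   soft_nofile < "/proc/$(pgrep -o supervisord)/limits"
```

If the printed value is well below 1048576, the limits.conf change has not reached supervisord or the processes it starts.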
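The error still persisted at this point, so modify the BE start script, start_be.sh (the script referenced by command in doris-be.ini).
Run vi start_be.sh and add ulimit -n 1008611 near the top, before the script's own logic.
For example: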
.............
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
ulimit -n 1008611 # must be placed before the rest of the script
curdir=$(dirname "$0")
curdir=$(
cd "$curdir"
pwd
)
OPTS=$(getopt \
-n $0 \
-o '' \
-l 'daemon' \
-- "$@")
eval set -- "$OPTS"
...
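Start again: supervisorctl start all
Note: BE must not already be running at this point; otherwise the start fails and the log reports: be is running
Check the log: tail -100f /var/log/supervisor/doris_be_alive_checker.log
Open the web UI and confirm that BE has started successfully.
Thanks to the Doris community | 857 community members for their suggestions.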