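1. Operations before the failure
Old GitLab 15.7.2 --> new GitLab 15.7.2 --> upgraded to GitLab 16.3.6. Only the data was restored; gitlab-secrets.json and the other configuration files were not.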
2. Symptoms
The Runners page returned a 500 error, as did every CI/CD-related feature in projects. All other code features worked normally.
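3. Troubleshooting
3.1 Attempt 1
First, restore the original encryption secrets file /etc/gitlab/gitlab-secrets.json, then run the following:
# According to the official documentation, this may reset the settings to their defaults; back them up first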
~] gitlab-rails console
> ApplicationSetting.first.delete
> ApplicationSetting.first
=> nil
# Then run reconfigure to reload the configuration
~] gitlab-ctl reconfigure
The Runners page now works, but CI/CD variables still cannot be read or added (500 error), and committing CI/CD-related files also returns 500.
3.2 Attempt 2
Run the following command and wait for the Rails console to start:
gitlab-rails console
Then run:
ApplicationSetting.current.reset_runners_registration_token!
This fails with OpenSSL::Cipher::CipherError.
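A CipherError like this is the expected symptom when a value was encrypted under one key and is decrypted with another. The sketch below illustrates only that mechanism using Ruby's standard OpenSSL library. It assumes AES-256-GCM and a SHA-256 stand-in for key derivation; both are simplifications, not GitLab's exact scheme (GitLab derives its keys from secrets such as db_key_base in gitlab-secrets.json, and the cipher mode varies by column). The secret strings are hypothetical.

```ruby
require "openssl"

# Stand-in key derivation (assumption): hash the secret to 32 bytes for AES-256.
# GitLab's real derivation from db_key_base is different and not reproduced here.
def derive_key(secret)
  OpenSSL::Digest::SHA256.digest(secret)
end

# Encrypt a value with AES-256-GCM; returns the pieces a database column would store.
def encrypt_value(plaintext, secret)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = derive_key(secret)
  iv = cipher.random_iv
  data = cipher.update(plaintext) + cipher.final
  { iv: iv, tag: cipher.auth_tag, data: data }
end

# Decrypt with the given secret; the GCM authentication check in #final
# raises OpenSSL::Cipher::CipherError when the key does not match.
def decrypt_value(record, secret)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = derive_key(secret)
  cipher.iv = record[:iv]
  cipher.auth_tag = record[:tag]
  cipher.update(record[:data]) + cipher.final
end

old_secret = "old-server-db-key-base" # hypothetical secret from the old server
new_secret = "new-server-db-key-base" # hypothetical secret generated on the new server

record = encrypt_value("example-runner-token", old_secret)
puts decrypt_value(record, old_secret) # same secret: decrypts normally

begin
  decrypt_value(record, new_secret)
rescue OpenSSL::Cipher::CipherError => e
  puts "#{e.class} raised: the value was encrypted with a different secret"
end
```

Running it prints the token once (matching secret) and then reports the CipherError (different secret), which is why restoring the correct gitlab-secrets.json, or clearing the affected values, is the only way out.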
3.3 Attempt 3
Check which encrypted values in the database can no longer be decrypted:
gitlab-rake gitlab:doctor:secrets VERBOSE=1
root@github:~# gitlab-rake gitlab:doctor:secrets VERBOSE=1
I, [2023-11-02T01:44:39.003624 #75277] INFO -- : Checking encrypted values in the database
I, [2023-11-02T01:44:47.708737 #75277] INFO -- : - Ci::InstanceVariable failures: 7
D, [2023-11-02T01:44:47.708838 #75277] DEBUG -- : - Ci::InstanceVariable[15]: value
D, [2023-11-02T01:44:47.708878 #75277] DEBUG -- : - Ci::InstanceVariable[16]: value
D, [2023-11-02T01:44:47.708912 #75277] DEBUG -- : - Ci::InstanceVariable[17]: value
D, [2023-11-02T01:44:47.708944 #75277] DEBUG -- : - Ci::InstanceVariable[18]: value
D, [2023-11-02T01:44:47.708976 #75277] DEBUG -- : - Ci::InstanceVariable[19]: value
D, [2023-11-02T01:44:47.709008 #75277] DEBUG -- : - Ci::InstanceVariable[20]: value
D, [2023-11-02T01:44:47.709038 #75277] DEBUG -- : - Ci::InstanceVariable[21]: value
I, [2023-11-02T01:44:47.840788 #75277] INFO -- : - Ci::Variable failures: 1
D, [2023-11-02T01:44:47.840887 #75277] DEBUG -- : - Ci::Variable[43]: value
I, [2023-11-02T01:44:48.518034 #75277] INFO -- : - ApplicationSetting failures: 1
D, [2023-11-02T01:44:48.518131 #75277] DEBUG -- : - ApplicationSetting[1]: runners_registration_token, error_tracking_access_token
I, [2023-11-02T01:44:53.861818 #75277] INFO -- : - Project failures: 2
D, [2023-11-02T01:44:53.861916 #75277] DEBUG -- : - Project[5]: runners_token
D, [2023-11-02T01:44:53.861949 #75277] DEBUG -- : - Project[32]: runners_token
I, [2023-11-02T01:44:53.912120 #75277] INFO -- : Total: 11 row(s) affected
I, [2023-11-02T01:44:53.912141 #75277] INFO -- : Done!
This shows 11 affected rows.
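When the output is long, it can help to group the DEBUG lines by model and record ID, since that is what the fixes in section 4 work through. The helper below is a hypothetical sketch (parse_doctor_output is an illustrative name, not part of GitLab); it assumes the log line format shown above and uses only Ruby's core String and Regexp.

```ruby
# Hypothetical helper: parse `gitlab-rake gitlab:doctor:secrets VERBOSE=1` output.
# Assumes the log format shown above, e.g.
#   "... DEBUG -- : - Ci::Variable[43]: value"
#   "... INFO -- : Total: 11 row(s) affected"
ROW_RE   = /DEBUG -- : - (?<model>[\w:]+)\[(?<id>\d+)\]: (?<attrs>.+)\z/
TOTAL_RE = /Total: (?<total>\d+) row\(s\) affected/

# Returns [failures, total], where failures is { "Model" => { id => [attribute, ...] } }
# and total is the reported row count (nil if the Total line is missing).
def parse_doctor_output(text)
  failures = {}
  total = nil
  text.each_line do |raw|
    line = raw.chomp
    if (m = ROW_RE.match(line))
      rows = (failures[m[:model]] ||= {})
      rows[m[:id].to_i] = m[:attrs].split(",").map(&:strip)
    elsif (m = TOTAL_RE.match(line))
      total = m[:total].to_i
    end
  end
  [failures, total]
end

sample = <<~'LOG'
  I, [2023-11-02T01:44:47.840788 #75277] INFO -- : - Ci::Variable failures: 1
  D, [2023-11-02T01:44:47.840887 #75277] DEBUG -- : - Ci::Variable[43]: value
  D, [2023-11-02T01:44:48.518131 #75277] DEBUG -- : - ApplicationSetting[1]: runners_registration_token, error_tracking_access_token
  D, [2023-11-02T01:44:53.861916 #75277] DEBUG -- : - Project[5]: runners_token
  I, [2023-11-02T01:44:53.912120 #75277] INFO -- : Total: 11 row(s) affected
LOG

failures, total = parse_doctor_output(sample)
failures.each { |model, rows| puts "#{model}: #{rows.keys.join(', ')}" }
puts "Reported total: #{total}"
```

The total comes from the rake task's own summary line; the sample keeps only a few of the DEBUG lines, so the parsed rows do not add up to 11.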
4. Remediation
4.1 Ci::InstanceVariable failures
gitlab-rails console
Ci::InstanceVariable.find(15)   # confirm the problem record; repeat for IDs 16-21
Ci::InstanceVariable.delete(15) # delete the record; repeat for IDs 16-21
4.2 Ci::Variable failures
Ci::Variable.find(43)   # confirm the problem record
Ci::Variable.delete(43) # delete the record
exit
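Because 4.1 and 4.2 repeat the same delete for each failing ID, the console commands can be generated rather than typed one by one. The sketch below is hypothetical (console_delete_commands is an illustrative name); it only builds strings to paste into gitlab-rails console, relying on ActiveRecord's class-level delete, which accepts a single ID or an array of IDs and removes rows without running callbacks. Only the CI variable models belong in its input; the ApplicationSetting and Project rows are repaired by clearing columns in SQL instead.

```ruby
# Hypothetical helper: build Rails console commands that delete the
# undecryptable CI variable rows found by gitlab:doctor:secrets.
# Input: { "Model" => [id, ...] }; output: one command string per model.
def console_delete_commands(failures)
  failures.map do |model, ids|
    # ActiveRecord's Model.delete accepts an array of primary keys.
    "#{model}.delete(#{ids.sort.inspect})"
  end
end

failures = {
  "Ci::InstanceVariable" => (15..21).to_a, # IDs reported in section 3.3
  "Ci::Variable" => [43]
}
puts console_delete_commands(failures)
```

As in 4.1, run a find on the IDs first to confirm they are the rows you intend to delete.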
4.3 ApplicationSetting failures
gitlab-rails dbconsole --database main
SELECT * FROM application_settings;
UPDATE application_settings SET runners_registration_token = null;
UPDATE application_settings SET error_tracking_access_token_encrypted = null; -- identify this column name from the SELECT output above
4.4 Project failures
UPDATE projects SET runners_token = null, runners_token_encrypted = null;
4.5 Other operations
-- Delete all CI/CD variables
DELETE FROM ci_group_variables;
DELETE FROM ci_variables;
-- Clear group tokens
UPDATE namespaces SET runners_token = null, runners_token_encrypted = null;
-- Clear instance tokens
UPDATE application_settings SET runners_registration_token_encrypted = null;
-- Clear key used for JWT authentication
-- This may break the $CI_JWT_TOKEN job variable:
-- https://gitlab.com/gitlab-org/gitlab/-/issues/325965
UPDATE application_settings SET encrypted_ci_jwt_signing_key = null;
-- Clear runner tokens
UPDATE ci_runners SET token = null, token_encrypted = null;
-- Clear build tokens
UPDATE ci_builds SET token = null, token_encrypted = null;
-- truncate web_hooks table
TRUNCATE web_hooks CASCADE;
4.6 Re-verification
Check again for any remaining failures caused by the secrets:
root@github:~# gitlab-rake gitlab:doctor:secrets VERBOSE=1
I, [2023-11-02T02:14:40.773310 #76309] INFO -- : Checking encrypted values in the database
I, [2023-11-02T02:14:50.677954 #76309] INFO -- : Total: 0 row(s) affected
I, [2023-11-02T02:14:50.677977 #76309] INFO -- : Done!
No errors remain. The deleted CI/CD variables must be added again; all other failures have been resolved.
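If this final check is scripted, the Total line is a convenient success signal. A minimal sketch, assuming the output format shown above; secrets_clean? is a hypothetical name, and capturing the output of the rake task itself is left out.

```ruby
# Hypothetical check: true only if the doctor output reports zero affected rows.
# Assumes the "Total: N row(s) affected" line format shown above.
def secrets_clean?(output)
  m = output.match(/Total: (\d+) row\(s\) affected/)
  !m.nil? && m[1].to_i.zero?
end

after_fix = <<~'LOG'
  I, [2023-11-02T02:14:40.773310 #76309] INFO -- : Checking encrypted values in the database
  I, [2023-11-02T02:14:50.677954 #76309] INFO -- : Total: 0 row(s) affected
  I, [2023-11-02T02:14:50.677977 #76309] INFO -- : Done!
LOG

puts secrets_clean?(after_fix) ? "all encrypted values decrypt" : "decryption failures remain"
```

A missing Total line is treated as a failure, so a truncated or errored run is not mistaken for a clean one.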