Fixing Slurm's sacct on CentOS when job history cannot be viewed after MariaDB was removed and MySQL installed

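Problem: I wanted to look at past jobs with sacct. Jobs are usually inspected with scontrol, but that only shows jobs that are currently active, not historical ones. Running sacct, however, failed with: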

sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to master:6819: Unable to connect to databas
sacct: error: slurmdbd: Sending PersistInit msg: No error
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to master:6819: Unable to connect to databas
sacct: error: slurmdbd: Sending PersistInit msg: No error
sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: Unspecified error

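The cause: installing MySQL on CentOS meant removing the bundled MariaDB. MariaDB started out as a fork of MySQL, but after many years the two have diverged and are incompatible in many places, so MariaDB had to go. However, sacct (through slurmdbd) looks up job information in the slurm_acct_db database that used to live in MariaDB. The database name is set in /etc/slurm/slurmdbd.conf, which I could not change, so with that database gone the command fails.
/etc/slurm/slurmdbd.conf contains: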

  DbdAddr=localhost
  DbdHost=localhost
  SlurmUser=slurm
  DebugLevel=4
  LogFile=/var/log/slurm/slurmdbd.log
  StorageType=accounting_storage/mysql
  StorageHost=localhost
  StoragePass=some_pass
  StorageUser=slurm
  StorageLoc=slurm_acct_db
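
To double-check which database, user, and host slurmdbd will use, the Storage* values can be read straight out of that file. This is a minimal sketch, assuming plain Key=Value lines as in the listing above; get_conf_value is a made-up helper name, not a Slurm tool.

```shell
# Print the value of one key from a slurmdbd.conf-style file (Key=Value lines).
# get_conf_value is a hypothetical helper, not part of Slurm.
get_conf_value() {
    local key="$1" file="$2"
    awk -F= -v key="$key" '
        { sub(/^[ \t]+/, "", $0) }        # drop leading indentation
        /^#/ { next }                      # skip comment lines
        $1 == key { sub(/^[^=]*=/, ""); print; exit }
    ' "$file"
}
```

For example, `get_conf_value StorageLoc /etc/slurm/slurmdbd.conf` should print the database name that has to exist on the MySQL side.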

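At this point there were two possible fixes:
1. Force MySQL and MariaDB to coexist
I tried this on a freshly provisioned server; unsurprisingly, it failed.
2. Create slurm_acct_db in MySQL so that sacct finds what it expects (this worked)
One thing to watch when creating the database: the account Slurm connects as (slurm, the StorageUser in slurmdbd.conf) needs privileges on it: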

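-- Create the database first so the grant below has a target.
create database slurm_acct_db;
create user 'slurm'@'localhost' identified by 'some_pass';
grant all on slurm_acct_db.* TO 'slurm'@'localhost';
-- List the available storage engines (Slurm's accounting tables use InnoDB).
SHOW ENGINES;
-- MySQL 8 defaults to caching_sha2_password, which older client libraries may not support.
ALTER USER 'slurm'@'localhost' IDENTIFIED WITH mysql_native_password BY 'some_pass';
flush privileges;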
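
If this setup has to be repeated on more than one machine, the statements can be generated from the database name, user, and password instead of typed by hand. This is only a sketch: make_acct_db_sql is a hypothetical helper, it assumes a MySQL version where CREATE USER IF NOT EXISTS and the mysql_native_password plugin are available, and it does not escape single quotes in its arguments.

```shell
# Emit re-runnable bootstrap SQL for the accounting database.
# make_acct_db_sql is a hypothetical helper; arguments containing single
# quotes are NOT escaped, so only pass trusted values.
make_acct_db_sql() {
    local db="$1" user="$2" pass="$3"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER IF NOT EXISTS '${user}'@'localhost' IDENTIFIED WITH mysql_native_password BY '${pass}';
GRANT ALL ON ${db}.* TO '${user}'@'localhost';
EOF
}
```

The output can be reviewed first and then piped into the client, for example `make_acct_db_sql slurm_acct_db slurm some_pass | mysql -u root -p`.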

But what about the tables? Creating the database alone creates no tables, so the tables Slurm expects cannot be found:

[2023-03-31T19:29:55.051] error: We should have gotten a new id: Table 'slurm_acct_db.hpccloudam_job_table' doesn't exist
[2023-03-31T19:29:55.051] error: It looks like the storage has gone away trying to reconnect
[2023-03-31T19:29:55.051] error: We should have gotten a new id: Table 'slurm_acct_db.hpccloudam_job_table' doesn't exist
[2023-03-31T19:29:55.051] error: couldn't add job 3 at step completion
[2023-03-31T19:29:56.613] error: Processing last message from connection 7(10.0.184.112) uid(1002)
[2023-03-31T19:29:59.153] error: mysql_query failed: 2006 MySQL server has gone away
insert into "hpccloudam_job_table" (id_job, mod_time, id_array_job, id_array_task, het_job_id, het_job_offset, id_assoc, id_qos, id_user, id_group, nodelist, id_resv, timelimit, time_eligible, time_submit, time_start, job_name, track_steps, state, priority, cpus_req, nodes_alloc, mem_req, flags, state_reason_prev, `partition`, node_inx, gres_req, gres_alloc, array_task_str, array_task_pending, tres_alloc, tres_req, work_dir) values (3, UNIX_TIMESTAMP(), 0, 4294967294, 0, 4294967294, 0, 1, 1003, 1003, 'g-v100-1-worker0001', 0, 60, 1680261895, 1680261895, 1680261986, 'struct_pred_esmfold.sh', 0, 3, 4294901758, 10, 1, 0, 8, 0, 'g-v100-1', '54000', 'gpu:1', 'gpu:1', NULL, 0, '1=10,3=18446744073709551614,4=1,5=10', '1=10,4=1,5=10', '/var/lib/mysql') on duplicate key update job_db_inx=LAST_INSERT_ID(job_db_inx), id_assoc=0, id_user=1003, id_group=1003, nodelist='g-v100-1-worker0001', id_resv=0, timelimit=60, time_submit=1680261895, time_eligible=1680261895, time_start=1680261986, mod_time=UNIX_TIMESTAMP(), job_name='struct_pred_esmfold.sh', track_steps=0, id_qos=1, state=greatest(state, 3), priority=4294901758, cpus_req=10, nodes_alloc=1, mem_req=0, id_array_job=0, id_array_task=4294967294, het_job_id=0, het_job_offset=4294967294, flags=8, state_reason_prev=0, `partition`='g-v100-1', node_inx='54000', gres_req='gpu:1', gres_alloc='gpu:1', array_task_str=NULL, array_task_pending=0, tres_alloc='1=10,3=18446744073709551614,4=1,5=10', tres_req='1=10,4=1,5=10', work_dir='/var/lib/mysql'
[2023-03-31T19:29:59.153] error: It looks like the storage has gone away trying to reconnect
[2023-03-31T19:29:59.153] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[2023-03-31T19:29:59.153] error: unable to re-connect to as_mysql database
[2023-03-31T19:29:59.153] fatal: You haven't inited this storage yet.
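
A log like this gets long quickly, so it helps to pull out just the names of the tables reported as missing. This is a small sketch, assuming the MySQL message keeps the "Table '...' doesn't exist" wording seen above; missing_tables is a made-up helper name. In these names the part before _job_table appears to be the cluster name (here hpccloudam).

```shell
# List the distinct tables that the log reports as missing.
# missing_tables is a hypothetical helper; it relies on the
# "Table 'db.table' doesn't exist" wording shown in the log above.
missing_tables() {
    sed -n "s/.*Table '\([^']*\)' doesn't exist.*/\1/p" "$1" | sort -u
}
```

Running `missing_tables /var/log/slurm/slurmdbd.log` (the LogFile from slurmdbd.conf) gives one line per missing table.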

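So the tables had to come from somewhere. Where is the schema?
I provisioned a new server and looked directly in the slurm_acct_db database of its MariaDB. To inspect it remotely with Navicat, MariaDB first has to accept remote connections: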

[cloudam@master ~]$ mysql -u root -p
Enter password: (no password)
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 63
Server version: 5.5.68-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| slurm_acct_db      |
| test               |
+--------------------+
5 rows in set (0.00 sec)

MariaDB [(none)]> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [mysql]> select host, user from user;
+-----------------+-------+
| host            | user  |
+-----------------+-------+
| 127.0.0.1       | root  |
| ::1             | root  |
| localhost       |       |
| localhost       | root  |
| localhost       | slurm |
| packer-632eeabe |       |
| packer-632eeabe | root  |
+-----------------+-------+
7 rows in set (0.00 sec)

MariaDB [mysql]> update user set host='%' where host='127.0.0.1';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [mysql]> select host, user from user;
+-----------------+-------+
| host            | user  |
+-----------------+-------+
| %               | root  |
| ::1             | root  |
| localhost       |       |
| localhost       | root  |
| localhost       | slurm |
| packer-632eeabe |       |
| packer-632eeabe | root  |
+-----------------+-------+
7 rows in set (0.00 sec)

MariaDB [mysql]> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Connect to MariaDB with Navicat and inspect the tables.
Then sync the table structure and data from MariaDB over to the MySQL machine, and everything works!
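
I did the sync through Navicat, but the same copy can probably be done from the command line with mysqldump. This is a sketch under assumptions, not what was run here: copy_acct_db and the host names are hypothetical, credentials are expected to come from something like ~/.my.cnf, the target database must already exist, and a dump from MariaDB 5.5 may still need small adjustments before MySQL accepts it.

```shell
# Copy one database from a source server to a destination server.
# copy_acct_db is a hypothetical helper. By default (DRY_RUN=1) it only
# prints the pipeline so it can be reviewed before anything is changed.
copy_acct_db() {
    local src="$1" dst="$2" db="$3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "mysqldump -h $src --single-transaction $db | mysql -h $dst $db"
    else
        # Dump schema and data from the source and load them into the destination.
        mysqldump -h "$src" --single-transaction "$db" | mysql -h "$dst" "$db"
    fi
}
```

Call it once as `copy_acct_db mariadb-host mysql-host slurm_acct_db` to see the command, then again with `DRY_RUN=0` to actually run it.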