刚开始统计数据库中有多少行数据的时候,使用information_schema.TABLES去统计发现误差很大,官网的解释如下:
The number of rows. Some storage engines, such as MyISAM
, store the exact count. For other storage engines, such as InnoDB
, this value is an approximation, and may vary from the actual value by as much as 40% to 50%. In such cases, use SELECT COUNT(*)
to obtain an accurate count.
TABLE_ROWS
is NULL
for INFORMATION_SCHEMA
tables.
For InnoDB tables, the row count is only a rough estimate used in SQL optimization. (This is also true if the InnoDB table is partitioned.)
翻译过来的大概意思就是
数据表中的行数。一些存储引擎,如MyISAM,会存储准确的计数。对于其他存储引擎,例如InnoDB,这个值只是一个近似值,可能与实际值相差40%到50%。在这种情况下,使用SELECT COUNT(*)来获取准确的计数。对于INFORMATION_SCHEMA表,TABLE_ROWS为空。对于InnoDB表,行数只是SQL优化中使用的一个粗略估计。(如果InnoDB表是分区的,这也是正确的。)
我们一般都是用的InnoDB的存储引擎,这里关于引擎的选择暂时不表。
然后就想着通过count函数去统计每个表的数据量。一个个去写太麻烦了,就想着整个存储过程简化一下工作量。
首先创建个保存数据表行数的表,建表语句如下:
DROP TABLE IF EXISTS `statistics_lg`;
CREATE TABLE `statistics_lg` (
`id` int(0) NOT NULL AUTO_INCREMENT,
`schema_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`table_name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`total_num` bigint(0) NULL DEFAULT NULL,
`create_time` date NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 16 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
SET FOREIGN_KEY_CHECKS = 1;
其中有定义了两个IN参数,schema_name_lg 表示需要统计的数据库的名称,rows_limit表示超过多少行的表才去统计,因为有些垃圾表,或者数据量少的表就没必要每天去统计数据量了,这个可以动态调整的,我这边先设置了100.
CREATE DEFINER=`root`@`localhost` PROCEDURE `table_statistics`(in schema_name_lg VARCHAR(100),in rows_limit int)
BEGIN
-- 定义变量
DECLARE isFinished INT DEFAULT FALSE;
DECLARE total VARCHAR(1000) DEFAULT '' ;
DECLARE tablenamelg VARCHAR(1000);
-- 定义游标
DECLARE table_cursor CURSOR FOR SELECT TABLE_NAME FROM information_schema.`TABLES` WHERE TABLE_SCHEMA= schema_name_lg;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET isFinished=true;
-- 开启游标
OPEN table_cursor;
REPEAT
-- 游标中取到表名赋值给变量tablenamelg
FETCH table_cursor INTO tablenamelg;
IF isFinished = false THEN
SET @sqltext=concat('select count(1) into @total from ',schema_name_lg ,'.',tablenamelg,';');
PREPARE c_tab_stat from @sqltext;
EXECUTE c_tab_stat ;
SET total=@total;
END IF;
IF total>rows_limit THEN
insert into isc.statistics_lg (schema_name,table_name,total_num,create_time) VALUES (schema_name_lg,tablenamelg,total,CURRENT_DATE);
END IF;
UNTIL isFinished
END REPEAT;
CLOSE table_cursor;
END
这个是mysql8的语法,如果mysql5 执行报错的话,就需要先改一下结束符用//,之后再改过来用;
不然就会认为分号就是结束标记,下面的代码执行就报错了。
delimiter //
CREATE DEFINER=`root`@`localhost` PROCEDURE `table_statistics`(in schema_name_lg VARCHAR(100),in rows_limit int)
BEGIN
-- 定义变量
DECLARE isFinished INT DEFAULT FALSE;
DECLARE total VARCHAR(1000) DEFAULT '' ;
DECLARE tablenamelg VARCHAR(1000);
-- 定义游标
DECLARE table_cursor CURSOR FOR SELECT TABLE_NAME FROM information_schema.`TABLES` WHERE TABLE_SCHEMA= schema_name_lg;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET isFinished=true;
-- 开启游标
OPEN table_cursor;
REPEAT
-- 游标中取到表名赋值给变量tablenamelg
FETCH table_cursor INTO tablenamelg;
IF isFinished = false THEN
SET @sqltext=concat('select count(1) into @total from ',schema_name_lg ,'.',tablenamelg,';');
PREPARE c_tab_stat from @sqltext;
EXECUTE c_tab_stat ;
SET total=@total;
END IF;
IF total>rows_limit THEN
insert into isc.statistics_lg (schema_name,table_name,total_num,create_time) VALUES (schema_name_lg,tablenamelg,total,CURRENT_DATE);
END IF;
UNTIL isFinished
END REPEAT;
CLOSE table_cursor;
END //
delimiter ;
调用的话,就直接call table_statistics('test',100); 就把统计的数据insert到statistics_lg表了