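Requirement and approach: given a table whose row count is known, compute the fill rate of each column (the share of non-NULL values, and in particular whether a column is entirely NULL). On a large table a full scan is slow, so the straightforward approach is to check a sample of the rows instead.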
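1. Sampling in Oracle
Oracle's SAMPLE clause samples a fixed percentage of rows. The query below samples 0.333% of the rows (note that the value is a percentage, not a fraction; it must be at least 0.000001 and less than 100).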
SELECT COUNT(XH),count(TC),count(JG),count(1)
FROM "WANGXINODS"."T_YJS_GY_XJ_STUDENT" SAMPLE (0.333)
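The counts returned by a sampled query like the one above still need to be turned into fill rates. A minimal sketch of that step in Python, assuming the result row has already been fetched into a dictionary; the function name `fill_rates` and the numbers are hypothetical, not measured results:

```python
def fill_rates(column_counts, total_rows):
    """Return the fraction of non-NULL values per column.

    column_counts maps column name -> COUNT(column) from the sampled query;
    total_rows is COUNT(1) from the same query.
    """
    if total_rows == 0:
        # An empty sample carries no information; avoid dividing by zero.
        return {name: None for name in column_counts}
    return {name: count / total_rows for name, count in column_counts.items()}


# Hypothetical counts for illustration only.
sample_row = {"XH": 3330, "TC": 1665, "JG": 0}
rates = fill_rates(sample_row, total_rows=3330)
print(rates)
```

A rate of 0.0 here only says that the sample contained no non-NULL value for that column; it suggests, but does not prove, that the whole column is NULL.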
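2. Sampling in MySQL
Reference, a question on Zhihu:
"In MySQL, randomly pick 3,000 rows out of 100,000 rows whose primary keys are not contiguous"
https://www.zhihu.com/question/20151242
Both of the queries below draw a sample. The first takes a fixed number of rows (10,000), intended as a run of consecutive ids from a random starting point; it suits a table with a known, contiguous auto-increment primary key, and its run time depends on the number of rows taken (0.61s).
The second takes a fixed proportion of the rows (0.1% in this example); it suits tables whose primary key is unknown or not contiguous, and its run time does not depend on the number of rows taken (2.1s).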
-- Method 1: up to 10,000 rows in id order, starting from a random id
SELECT
  count(id), count(uname), count(ucreatetime), count(age), count(1)
FROM
  ( select * from usertb
    WHERE id >= (SELECT floor(RAND() * (SELECT MAX(id) FROM usertb)))
    ORDER BY id LIMIT 10000
  ) tmp;
-- Method 2: keep each row with probability 0.001 (about 0.1% of the table)
SELECT
  count(id), count(uname), count(ucreatetime), count(age), count(1)
FROM
  ( select * from usertb
    where rand() <= 0.001
  ) tmp;
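To see what a filter of the form `rand() <= p` implies for the null check, here is a small Python simulation. It is a sketch under stated assumptions, not part of the original method: the helper names `bernoulli_sample` and `miss_probability` are made up for illustration, and the table is just a range of integers.

```python
import random


def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p, like WHERE rand() <= p."""
    return [row for row in rows if rng.random() <= p]


def miss_probability(non_null_rows, p):
    """Chance that a sample at rate p contains none of the non-NULL rows."""
    return (1.0 - p) ** non_null_rows


rng = random.Random(42)          # fixed seed so the run is repeatable
rows = range(100_000)            # stand-in for the table's rows
sample = bernoulli_sample(rows, 0.01, rng)
print(len(sample))               # close to 1,000, but not exactly 1,000
print(round(miss_probability(1000, 0.001), 4))
```

Two things follow. The sample size is itself random, so the fill rate should be divided by the sampled `count(1)` rather than by an assumed row count. And a column with only 1,000 non-NULL values has roughly a 37% chance of looking entirely NULL at a 0.1% sampling rate, so a sampled count of zero is worth confirming with a targeted query before concluding the column is empty.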
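3. Generating test data: 10 million rows
To make testing easier we need some test data. The method follows this article:
"Building a ten-million-row test table in MySQL"
https://www.cnblogs.com/qmfsun/p/4881919.html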
4. Generating a 100,000-row test table in Oracle (10 million rows is rather slow)
create table myTestTable as
select rownum as id,
to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime,
trunc(dbms_random.value(0, 100)) as random_id,
dbms_random.string('x', 20) random_string
from dual
connect by level <= 100000;
5. Generating a 100,000-row test table in MySQL (with random strings and random numbers)
https://www.cnblogs.com/zxmbky/p/9567124.html
5.1 Table definition
CREATE TABLE `mytesttable` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`random_string` VARCHAR (20) NOT NULL,
`random_id` INT (11) NOT NULL,
`inc_datetime` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `index_id` (`id`) USING HASH
) ENGINE = INNODB AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8
5.2 Random-string function and stored procedure
-- DELIMITER is needed when running this in the mysql command-line client,
-- so that the semicolons inside the routine bodies do not end the statement early.
DELIMITER $$
CREATE FUNCTION `rand_string`(n INT) RETURNS varchar(255) CHARSET latin1
BEGIN
  -- Build a string of n characters drawn at random from chars_str (62 characters)
  DECLARE chars_str varchar(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  DECLARE return_str varchar(255) DEFAULT '' ;
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    SET return_str = concat(return_str,substring(chars_str , FLOOR(1 + RAND()*62 ),1));
    SET i = i +1;
  END WHILE;
  RETURN return_str;
END$$
CREATE PROCEDURE `add_mytesttable`(IN n int)
BEGIN
  -- Insert n rows, one per loop iteration
  DECLARE i INT DEFAULT 1;
  WHILE (i <= n ) DO
    INSERT into mytesttable (random_string,random_id,inc_datetime )
    VALUEs (rand_string(20),FLOOR(RAND() * 100) ,date_add(now(), interval i SECOND) );
    set i=i+1;
  END WHILE;
END$$
DELIMITER ;
5.3 Calling the stored procedure
The call below generates 1,000 rows; to generate more, change the argument.
CALL add_mytesttable(1000)