http://www.eygle.com/archives/2004/11/get_random_output_of_recordset.html
从Oracle8i开始Oracle提供采样表扫描特性。
Oracle访问数据的基本方法有:
1.全表扫描
2.采样表扫描
全表扫描(Full table Scan)
全表扫描返回表中所有的记录。
执行全表扫描,Oracle读表中的所有记录,考查每一行是否满足WHERE条件。Oracle顺序的读分配给该表的每一个数据块,这样全表扫描能够受益于多块读.
每个数据块Oracle只读一次.
采样表扫描(sample table scan)
采样表扫描返回表中随机采样数据。
这种访问方式需要在FROM语句中包含SAMPLE选项或者SAMPLE BLOCK选项.
SAMPLE选项:
当按行采样来执行一个采样表扫描时,Oracle从表中读取特定百分比的记录,并判断是否满足WHERE子句以返回结果。
SAMPLE BLOCK选项:
使用此选项时,Oracle读取特定百分比的BLOCK,考查结果集是否满足WHERE条件以返回满足条件的纪录.
Sample_Percent:
Sample_Percent是一个数字,定义结果集中包含记录占总记录数量的百分比。
Sample值应该在[0.000001,99.999999]之间。
1.使用SAMPLE选项
SQL> select * from employee SAMPLE(30);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- 7369 SMITH CLERK 7902 17-DEC-80 800 20 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 7839 KING PRESIDENT 17-NOV-81 5000 10
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=25 Bytes=2175) 1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=25 Bytes=2175)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 880 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed
SQL> select * from employee SAMPLE(20);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=16 Bytes=1392) 1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=16 Bytes=1392)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 839 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
|
2.使用SAMPLE BLOCK选项
SQL> SELECT * FROM employee SAMPLE BLOCK (50);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- 7369 SMITH CLERK 7902 17-DEC-80 800 20 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7566 JONES MANAGER 7839 02-APR-81 2975 20 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 7839 KING PRESIDENT 17-NOV-81 5000 10 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
10 rows selected.
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=41 Bytes=3567) 1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=41 Bytes=3567)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 1162 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
SQL>
|
3.采样前n条记录的查询
也可以使用dbms_random包实现
SQL> select * from ( 2 select * from employee 3 order by dbms_random.value ) 4 where rownum <= 4;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 7839 KING PRESIDENT 17-NOV-81 5000 10 7369 SMITH CLERK 7902 17-DEC-80 800 20 7788 SCOTT ANALYST 7566 19-APR-87 3000 20
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT ptimizer=CHOOSE 1 0 COUNT (STOPKEY) 2 1 VIEW 3 2 SORT (ORDER BY STOPKEY) 4 3 TABLE ACCESS (FULL) OF 'EMPLOYEE'
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 927 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 4 rows processed
|
对比一下SAMPLE选项
SQL> SELECT * FROM employee SAMPLE (40);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7839 KING PRESIDENT 17-NOV-81 5000 10 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=33 Bytes=2871) 1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=33 Bytes=2871)
Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 961 bytes sent via SQL*Net to client 503 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed
SQL>
|
主要注意以下几点:
1.sample只对单表生效,不能用于表连接和远程表
2.sample会使SQL自动使用CBO
总结一下,随机取一定比例的记录有两个方法:
1)select * from tab sample(10) percent ,按照10%比例获取记录
缺点是每次的记录数并不是一样多,但可以用seed参数来控制下,效果还行。
2) select * from (select * from tab order by dbms_random.value ) where rownum
sample,有诸多限制。但比 order by dbms_random.value 效率好些,在某些场合还是不错的。
实例:
取10万验证过邮箱的用户:
SELECT END_USER_EMAIL
FROM (SELECT E.END_USER_EMAIL
FROM USER_TAB E
WHERE E.IS_EMAIL_ACTIVATE = 1
ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM < 1000001;
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24387280/viewspace-682672/,如需转载,请注明出处,否则将追究法律责任。