Oracle取随机记录

http://www.eygle.com/archives/2004/11/get_random_output_of_recordset.html

从Oracle8i开始Oracle提供采样表扫描特性。

Oracle访问数据的基本方法有:
1.全表扫描
2.采样表扫描

全表扫描(Full table Scan)
全表扫描返回表中所有的记录。
执行全表扫描,Oracle读表中的所有记录,考查每一行是否满足WHERE条件。Oracle顺序的读分配给该表的每一个数据块,这样全表扫描能够受益于多块读.
每个数据块Oracle只读一次.

采样表扫描(sample table scan)
采样表扫描返回表中随机采样数据。
这种访问方式需要在FROM语句中包含SAMPLE选项或者SAMPLE BLOCK选项.

SAMPLE选项:
当按行采样来执行一个采样表扫描时,Oracle从表中读取特定百分比的记录,并判断是否满足WHERE子句以返回结果。

SAMPLE BLOCK选项:
使用此选项时,Oracle读取特定百分比的BLOCK,考查结果集是否满足WHERE条件以返回满足条件的纪录.

Sample_Percent:
Sample_Percent是一个数字,定义结果集中包含记录占总记录数量的百分比。
Sample值应该在[0.000001,99.999999]之间。

1.使用SAMPLE选项

SQL> select * from employee SAMPLE(30);

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=25 Bytes=2175)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=25 Bytes=2175)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
880 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed


SQL> select * from employee SAMPLE(20);

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=16 Bytes=1392)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=16 Bytes=1392)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
839 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed

2.使用SAMPLE BLOCK选项

 

SQL> SELECT * FROM employee SAMPLE BLOCK (50); 

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30

10 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=41 Bytes=3567)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=41 Bytes=3567)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1162 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed

SQL>

3.采样前n条记录的查询

也可以使用dbms_random包实现

 

SQL> select * from ( 
2 select * from employee
3 order by dbms_random.value )
4 where rownum <= 4;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7839 KING PRESIDENT 17-NOV-81 5000 10
7369 SMITH CLERK 7902 17-DEC-80 800 20
7788 SCOTT ANALYST 7566 19-APR-87 3000 20


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE
1 0 COUNT (STOPKEY)
2 1 VIEW
3 2 SORT (ORDER BY STOPKEY)
4 3 TABLE ACCESS (FULL) OF 'EMPLOYEE'




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
927 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
4 rows processed

对比一下SAMPLE选项

 

SQL> SELECT * FROM employee SAMPLE (40); 

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=2 Card=33 Bytes=2871)
1 0 TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=33 Bytes=2871)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
961 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5 rows processed

SQL>

主要注意以下几点:

1.sample只对单表生效,不能用于表连接和远程表
2.sample会使SQL自动使用CBO


总结一下,随机取一定比例的记录有两个方法:
1)select * from tab sample(10) percent    ,按照10%比例获取记录
     缺点是每次的记录数并不是一样多,但可以用seed参数来控制下,效果还行。    
2)  select * from (select * from tab order by dbms_random.value ) where rownum
sample,有诸多限制。但比 order by dbms_random.value 效率好些,在某些场合还是不错的。

实例:
取10万验证过邮箱的用户:
SELECT END_USER_EMAIL
  FROM (SELECT E.END_USER_EMAIL
          FROM USER_TAB E
         WHERE E.IS_EMAIL_ACTIVATE = 1
         ORDER BY DBMS_RANDOM.VALUE)
 WHERE ROWNUM < 1000001;

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24387280/viewspace-682672/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/24387280/viewspace-682672/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值