Teradata 之top n与sample n

Teradata取n条样本数据的方法有两种:

select top x * from table;
select * from table sample n;

那么二者有什么区别呢?说明如下:

TOP N 
如果有Order By关键字首先要对数据库的数据进行排序,然后获取N条数据或者抽样比率为N;
如果没有Order By关键字,要做一次STAT FUNCTION的全AMP收集,然后选择某一个或者几个AMP 来抽取数据。

Sample N 
首先要对数据库的数据进行全表扫描,然后获取N数据;
采用的是Sampling的形式。

[@more@]

性能比较:
当数据量比较小的时候,TOP N的速度要比Sample的速度快;
当数据量比较大的时候,Sample N的速度要比TOP的速度快。

测试:

Explain select top 10 * from PD_PORTAL.TOPIC_COMP_DETAIL

1) First, we lock a distinct PD_PORTAL."pseudo table" for read on a
RowHash to prevent global deadlock for PD_PORTAL.TOPIC_COMP_DETAIL. 
2) Next, we lock PD_PORTAL.TOPIC_COMP_DETAIL for read. 
3) We do an all-AMPs STAT FUNCTION step from
PD_PORTAL.TOPIC_COMP_DETAIL by way of an all-rows scan with no
residual conditions into Spool 5, which is redistributed by hash
code to all AMPs. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. This step is
used to retrieve the TOP 10 rows. One AMP is randomly selected to
retrieve 10 rows. If this step retrieves less than 10 rows, then
execute step 4. The size is estimated with low confidence to be
10 rows (27,460 bytes). 
4) We do an all-AMPs STAT FUNCTION step from
PD_PORTAL.TOPIC_COMP_DETAIL by way of an all-rows scan with no
residual conditions into Spool 5 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 1 (group_amps), which is built locally on the AMPs. 
This step is used to retrieve the TOP 10 rows. The size is
estimated with low confidence to be 10 rows (27,460 bytes). 
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Explain select * from PD_PORTAL.TOPIC_COMP_DETAIL sample 10

1) First, we lock a distinct PD_PORTAL."pseudo table" for read on a
RowHash to prevent global deadlock for PD_PORTAL.TOPIC_COMP_DETAIL. 
2) Next, we lock PD_PORTAL.TOPIC_COMP_DETAIL for read. 
3) We do an all-AMPs SAMPLING step from PD_PORTAL.TOPIC_COMP_DETAIL
by way of an all-rows scan with no residual conditions into Spool
1 (group_amps), which is built locally on the AMPs. Samples are
specified as a number of rows. 
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

转载于:https://my.oschina.net/u/2381604/blog/741604

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值