Doris 添加order by后,默认加上了 limit 65535
执行explain SELECT year, month, std_province, std_city, std_brand_name, SUM(p_count) AS num FROM m_allweb_week_sales_h_agg GROUP BY year, month, std_province, std_city, std_brand_name
获得如下结果:
PLAN FRAGMENT 0
OUTPUT EXPRS:<slot 6> | <slot 7> | <slot 8> | <slot 9> | <slot 10> | <slot 11> |
PARTITION: UNPARTITIONED
RESULT SINK
4:EXCHANGE
tuple ids: 1
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
STREAM DATA SINK
EXCHANGE ID: 04
UNPARTITIONED
3:AGGREGATE (merge finalize)
| output: sum(<slot 11>)
| group by: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
| tuple ids: 1
|
2:EXCHANGE
tuple ids: 1
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 02
HASH_PARTITIONED: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
1:AGGREGATE (update serialize)
| output: sum(p_count)
| group by: year, month, std_province, std_city, std_brand_name
| tuple ids: 1
|
0:OlapScanNode
TABLE: m_allweb_week_sales_h_agg
PREAGGREGATION: ON
partitions=41/41
rollup: grab_car_sum_total
buckets=1312/1312
tuple ids: 0
执行explain SELECT year, month, std_province, std_city, std_brand_name, SUM(p_count) AS num FROM m_allweb_week_sales_h_agg GROUP BY year, month, std_province, std_city, std_brand_name order by year, month;
获得如下结果:
PLAN FRAGMENT 0
OUTPUT EXPRS:<slot 12> | <slot 13> | <slot 14> | <slot 15> | <slot 16> | <slot 17> |
PARTITION: UNPARTITIONED
RESULT SINK
5:MERGING-EXCHANGE
limit: 65535
tuple ids: 2
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
STREAM DATA SINK
EXCHANGE ID: 05
UNPARTITIONED
2:TOP-N
| order by: <slot 12> ASC, <slot 13> ASC
| offset: 0
| limit: 65535
| tuple ids: 2
|
4:AGGREGATE (merge finalize)
| output: sum(<slot 11>)
| group by: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
| tuple ids: 1
|
3:EXCHANGE
tuple ids: 1
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
1:AGGREGATE (update serialize)
| output: sum(p_count)
| group by: year, month, std_province, std_city, std_brand_name
| tuple ids: 1
|
0:OlapScanNode
TABLE: m_allweb_week_sales_h_agg
PREAGGREGATION: ON
partitions=41/41
rollup: grab_car_sum_total
buckets=1312/1312
tuple ids: 0
莫名其妙加上了一个limit限制。
我手动修改limit
explain SELECT year, month, std_province, std_city, std_brand_name, SUM(p_count) AS num FROM m_allweb_week_sales_h_agg GROUP BY year, month, std_province, std_city, std_brand_name order by year, month limit 100000000;
结果如下:
PLAN FRAGMENT 0
OUTPUT EXPRS:<slot 12> | <slot 13> | <slot 14> | <slot 15> | <slot 16> | <slot 17> |
PARTITION: UNPARTITIONED
RESULT SINK
5:MERGING-EXCHANGE
limit: 100000000
tuple ids: 2
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
STREAM DATA SINK
EXCHANGE ID: 05
UNPARTITIONED
2:TOP-N
| order by: <slot 12> ASC, <slot 13> ASC
| offset: 0
| limit: 100000000
| tuple ids: 2
|
4:AGGREGATE (merge finalize)
| output: sum(<slot 11>)
| group by: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
| tuple ids: 1
|
3:EXCHANGE
tuple ids: 1
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: RANDOM
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: <slot 6>, <slot 7>, <slot 8>, <slot 9>, <slot 10>
1:AGGREGATE (update serialize)
| output: sum(p_count)
| group by: year, month, std_province, std_city, std_brand_name
| tuple ids: 1
|
0:OlapScanNode
TABLE: m_allweb_week_sales_h_agg
PREAGGREGATION: ON
partitions=41/41
rollup: grab_car_sum_total
buckets=1312/1312
tuple ids: 0
最终结果导致我查询到的数据不完整。
Palo会默认的为order by加上limit65535,如果想要order by更多的数据可以手动写一个order by 更大的数字。
这里我们处理的不是很好,后续我们会去掉这个选项
文章来源 :https://github.com/apache/doris/issues/219