一、数据全部重复
例如:
name | score |
---|---|
Computer | 1600 |
Phone | 12 |
Phone | 12 |
操作步骤:
1.复制表结构
CREATE TABLE <new_table> LIKE <old_table>;
2.插入去重后的数据
insert overwrite table
<new_table> select distinct * from <old_table> ;
ps:有时执行这个语句会报以下错误:
FAILED: SemanticException TOK_ALLCOLREF is not supported in current context
写上所有列名就好了:
insert overwrite table
<new_table> select distinct name, score from <old_table> ;
二、部分数据重复
例如:
name | score | type |
---|---|---|
Computer | 1600 | 2 |
Phone | 12 | 1 |
Phone | 15 | 1 |
操作步骤:
1.复制表结构
CREATE TABLE <new_table> LIKE <old_table>;
2.插入去重后的数据
insert overwrite table <new_table>(
select t.name, t.score, t.type
from (
select
name, score, ,type, row_number() over(distribute by name sort by score ) as rn
from <old_table>
) t where t.rn=1
);
3.总结一下就是:
insert overwrite table <new_table> (
select <字段>
from (
select <字段>, row_number() over(distribute by <有重复的字段> sort by <重复字段的排列根据字段>) as rn
from <old_table>
) t where t.rn=1
);