postgresql中使用distinct去重

select语法

[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    [ * | expression [ [ AS ] output_name ] [, ...] ]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY grouping_element [, ...] ]
    [ HAVING condition [, ...] ]
    [ WINDOW window_name AS ( window_definition ) [, ...] ]
    [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
    [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start [ ROW | ROWS ] ]
    [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
    [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]

where from_item can be one of:

    [ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
                [ TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] ]
    [ LATERAL ] ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
    with_query_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    [ LATERAL ] function_name ( [ argument [, ...] ] )
                [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    [ LATERAL ] function_name ( [ argument [, ...] ] ) [ AS ] alias ( column_definition [, ...] )
    [ LATERAL ] function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
    [ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] )
                [ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]

and grouping_element can be one of:

    ( )
    expression
    ( expression [, ...] )
    ROLLUP ( { expression | ( expression [, ...] ) } [, ...] )
    CUBE ( { expression | ( expression [, ...] ) } [, ...] )
    GROUPING SETS ( grouping_element [, ...] )

and with_query is:

    with_query_name [ ( column_name [, ...] ) ] AS ( select | values | insert | update | delete )

TABLE [ ONLY ] table_name [ * ]

数据

INSERT INTO "test_dist" VALUES (1, '1', 'a');
INSERT INTO "test_dist" VALUES (2, '1', 'b');
INSERT INTO "test_dist" VALUES (3, '1', 'c');
INSERT INTO "test_dist" VALUES (4, '2', 'm');
INSERT INTO "test_dist" VALUES (5, '2', 'n');
INSERT INTO "test_dist" VALUES (6, '3', 'j');
INSERT INTO "test_dist" VALUES (7, '3', 'j');
INSERT INTO "test_dist" VALUES (8, '4', 'j');

去重多个列

直接用distinct,后面的列都参与去重。只有code, name拼接的组合相同时,去掉重复的

# SELECT 
DISTINCT code,name
from test_dist;
 code | name
------+------
 1    | b
 2    | n
 4    | j
 1    | c
 1    | a
 2    | m
 3    | j
(7 rows)

去重指定列,保留其他列

当下遇到需求,需要将其中一个列去重,然后其他列随机取出就可以了。造成这种需求的原因是单表设计不合理,没有拆分成多表,造成多字段冗余,除了唯一性标志外,其他字段是相同的。目标是,取出其他字段,忽略唯一标志。

因为其他字段有重复,需要去掉重复。

# SELECT 
DISTINCT ON (code) code,
id, name
from test_dist;
 code | id | name
------+----+------
 1    |  1 | a
 2    |  4 | m
 3    |  6 | j
 4    |  8 | j
(4 rows)

这里,根据code去重,id和name随机取出,这样可以获得code维度的数据。如果不去重,获得原始数据,code有重复。

### 如何在 SQL 查询中使用 `DISTINCT` 后再进行 `SUM` 求和 为了实现先再求和的功能,在 SQL 中可以采用子查询的方式。具体来说,可以在内层查询中通过 `SELECT DISTINCT` 来消除复项,然后再在外层查询中应用聚合函数如 `SUM()` 对这些唯一值做加总。 下面给出一个具体的例子来展示这一过程: 假设有一个名为 `sales` 的销售记录表,其中包含字段 `product_id`, `sale_amount` 和其他一些列。如果想要计算每种产品不考虑多次购买情况下的总销售额,则可按照如下方式构建查询语句: ```sql -- 计算不同 product_id 下 sale_amount 总额 SELECT SUM(sale_amount) as total_sales FROM ( SELECT DISTINCT product_id, sale_amount FROM sales ) t; ``` 此方法确保了对于每一个 `product_id` 只会选取一次对应的 `sale_amount` 进行累加操作[^1]。 另外一种场景是在更复杂的环境中处理多列组合的情况,比如当存在多个维度需要同时考虑时。此时同样可以通过嵌套查询的形式完成任务。例如给定的数据集中有月份 (`mth`)、客户编号(`s_kh`)以及交易金额(`s_sdje`)三个要属性,目标是按月统计复客户的消费总额: ```sql -- 统计每个月份下复后的客户消费总额 SELECT mth, SUM(je) AS je, COUNT(DISTINCT s_kh) sl FROM ( SELECT DATE_FORMAT(fi.ts_createtime,'%m') mth, fi.s_sdje, CASE WHEN (fi.s_kh IS NULL OR fi.s_kh='') AND (fi.s_zh IS NOT NULL AND fi.s_zh<>'') THEN fi.s_zh ELSE fi.s_kh END s_kh FROM spp_fb_freeze_info fi LEFT JOIN spp_req_basic_info bi ON fi.s_bizid=bi.s_id WHERE fi.c_zxjg='0' AND bi.c_qqcslx IN('05','06') AND bi.c_biztype='40' AND YEAR(bi.ts_createtime)=2023 ) b GROUP BY mth; ``` 上述代码片段展示了如何在一个较为复杂的情况下利用 `CASE` 结构调整某些特定条件下使用的键值,并最终实现了基于时间周期内的汇总分析功能[^2]。 #### 使用 PostgreSQL 实现相同逻辑 考虑到不同的数据库管理系统可能具有略微差异化的语法支持,在 PostgreSQL 数据库环境下执行类似的查询也可以遵循相同的思路。这里提供了一个简单的实例用于说明如何针对用户得分表(users)中的分数(score),依据队伍(team)来进行并求和的操作: ```sql CREATE TABLE users( id SERIAL PRIMARY KEY, player VARCHAR(255) NOT NULL, score REAL, team VARCHAR(255) ); INSERT INTO users(player,score,team) VALUES ('库里',28.3,'勇士'), ('哈登',30.2,'火箭'), ('阿杜',25.6,'勇士'), ('阿詹',27.8,'骑士'), ('神龟',31.3,'雷霆'), ('白边',19.8,'热火'); -- 针对 teams 列进行分组并对 scores 应用 distinct 后求和 SELECT team, SUM(DISTINCT score) as unique_score_sum FROM users GROUP BY team; ``` 这段脚本首先定义了一张存储球员信息及其所属球队成绩的表格,接着插入了几条测试数据;最后部分则完成了根据队伍名称分组的同时只保留各队成员间互异的成绩数值之和的任务[^4]。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值