mysql 聚合两次,具有两个连接的查询中的mysql聚合函数会产生意外结果

Given the following (very simplified) mysql table structure:

products

id

product_categories

id

product_id

status (integer)

product_tags

id

product_id

some_other_numeric_value

I am trying to find every product that has an association to a certain product_tag, and that a relation to at least one category whichs status-attribute is 1.

I tried the following query:

SELECT *

FROM `product` p

JOIN `product_categories` pc

ON p.`product_id` = pc.`product_id`

JOIN `product_tags` pt

ON p.`product_id` = pt.`product_id`

WHERE pt.`some_value` = 'some comparison value'

GROUP BY p.`product_id`

HAVING SUM( pc.`status` ) > 0

ORDER BY SUM( pt.`some_other_numeric_value` ) DESC

Now my problem is: The SUM(pt.some_other_numeric_value) returns unexpected values.

I realized that if the product in question has more then one relation to the product_categories table, then every relation to the product_tags table is counted as many timed as there are relations to the product_categories table!

For example: If product with id=1 has a relation to product_categories with ids = 2, 3 and 4, and a relation with the product_tags with ids 5 and 6 - then if I insert a GROUP_CONCAT(pt.id), then it does give 5,6,5,6,5,6 instead of the expected 5,6.

At first I suspected it was a problem with the join type (left join, right join, inner join, and so on), so I tried every join type that I know of, but to no avail. I also tried to include more id-fields into the GROUP BY clause, but this didn´t solve the problem either.

Can somebody explain to me what is actually going wrong here?

解决方案

You join a "main" (product) table to two tables (tags and categories) via 1:n relationships, so this is expected, you are creating a mini cartesian product. For those products that have both more than one associated tags and more than one associated categories, multiple rows are created in the result set. If you Group By, you have wrong results in aggregate functions.

One way to avoid this is to remove one of the two joins, which is a valid startegy if you don't need results from that table. Say you don't need anything in the SELECT list from the product_categories table. Then you can use a semi-join (the EXISTS subquery)to that table:

SELECT p.*,

SUM( pt.`some_other_numeric_value` )

FROM `product` p

JOIN `product_tags` pt

ON p.`product_id` = pt.`product_id`

WHERE pt.`some_value` = 'some comparison value'

AND EXISTS

( SELECT *

FROM product_categories pc

WHERE pc.product_id = pc.product_id

AND pc.status = 1

)

GROUP BY p.`product_id`

ORDER BY SUM( pt.`some_other_numeric_value` ) DESC ;

Another way to circumvent this problem is - after the GROUP BY MainTable.pk - to use DISTINCT inside the COUNT() or GROUP_CONCAT() aggregate functions. This works but you can't use it with SUM(). So, it's not useful in your specific query.

A third option - which works always - is to first group by the two (or more) side tables and then join to the main table. Something like this in your case:

SELECT p.* ,

COALESCE(pt.sum_other_values, 0) AS sum_other_values

COALESCE(pt.cnt, 0) AS tags_count,

COALESCE(pc.cnt, 0) AS categories_count,

COALESCE(category_titles, '') AS category_titles

FROM `product` p

JOIN

( SELECT product_id

, COUNT(*) AS cnt

, GROUP_CONCAT(title) AS category_titles

FROM `product_categories` pc

WHERE status = 1

GROUP BY product_id

) AS pc

ON p.`product_id` = pc.`product_id`

JOIN

( SELECT product_id

, COUNT(*) AS cnt

, SUM(some_other_numeric_value) AS sum_other_values

FROM `product_tags` pt

WHERE some_value = 'some comparison value'

GROUP BY product_id

) AS pt

ON p.`product_id` = pt.`product_id`

ORDER BY sum_other_values DESC ;

The COALESCE() are not strictly needed there - just in case you chnage the inner joins to LEFT outer joins.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值