mysql 聚合两次,具有两个连接的查询中的mysql聚合函数会产生意外结果

最新推荐文章于 2022-01-22 22:53:25 发布

weixin_34701288

最新推荐文章于 2022-01-22 22:53:25 发布

阅读量114

点赞数

文章标签： mysql 聚合两次

Given the following (very simplified) mysql table structure:

products

product_categories

product_id

status (integer)

product_tags

product_id

some_other_numeric_value

I am trying to find every product that has an association to a certain product_tag, and that a relation to at least one category whichs status-attribute is 1.

I tried the following query:

SELECT *

FROM `product` p

JOIN `product_categories` pc

ON p.`product_id` = pc.`product_id`

JOIN `product_tags` pt

ON p.`product_id` = pt.`product_id`

WHERE pt.`some_value` = 'some comparison value'

GROUP BY p.`product_id`

HAVING SUM( pc.`status` ) > 0

ORDER BY SUM( pt.`some_other_numeric_value` ) DESC

Now my problem is: The SUM(pt.some_other_numeric_value) returns unexpected values.

I realized that if the product in question has more then one relation to the product_categories table, then every relation to the product_tags table is counted as many timed as there are relations to the product_categories table!

For example: If product with id=1 has a relation to product_categories with ids = 2, 3 and 4, and a relation with the product_tags with ids 5 and 6 - then if I insert a GROUP_CONCAT(pt.id), then it does give 5,6,5,6,5,6 instead of the expected 5,6.

At first I suspected it was a problem with the join type (left join, right join, inner join, and so on), so I tried every join type that I know of, but to no avail. I also tried to include more id-fields into the GROUP BY clause, but this didn´t solve the problem either.

Can somebody explain to me what is actually going wrong here?

解决方案

You join a "main" (product) table to two tables (tags and categories) via 1:n relationships, so this is expected, you are creating a mini cartesian product. For those products that have both more than one associated tags and more than one associated categories, multiple rows are created in the result set. If you Group By, you have wrong results in aggregate functions.

One way to avoid this is to remove one of the two joins, which is a valid startegy if you don't need results from that table. Say you don't need anything in the SELECT list from the product_categories table. Then you can use a semi-join (the EXISTS subquery)to that table:

SELECT p.*,

SUM( pt.`some_other_numeric_value` )

FROM `product` p

JOIN `product_tags` pt

ON p.`product_id` = pt.`product_id`

WHERE pt.`some_value` = 'some comparison value'

AND EXISTS

( SELECT *

FROM product_categories pc

WHERE pc.product_id = pc.product_id

AND pc.status = 1

)

GROUP BY p.`product_id`

ORDER BY SUM( pt.`some_other_numeric_value` ) DESC ;

Another way to circumvent this problem is - after the GROUP BY MainTable.pk - to use DISTINCT inside the COUNT() or GROUP_CONCAT() aggregate functions. This works but you can't use it with SUM(). So, it's not useful in your specific query.

A third option - which works always - is to first group by the two (or more) side tables and then join to the main table. Something like this in your case:

SELECT p.* ,

COALESCE(pt.sum_other_values, 0) AS sum_other_values

COALESCE(pt.cnt, 0) AS tags_count,

COALESCE(pc.cnt, 0) AS categories_count,

COALESCE(category_titles, '') AS category_titles

FROM `product` p

JOIN

( SELECT product_id

, COUNT(*) AS cnt

, GROUP_CONCAT(title) AS category_titles

FROM `product_categories` pc

WHERE status = 1