I've got a problem that I can't seem to figure out after a bunch of failed attempts.
I've got three tables that I need to do a join on for some reporting, and in the 3rd table a record might not exist. But if the record in the 3rd table doesn't exist, I need to report a null value for the data that comes from the 3rd table and get all records that match the other conditions.
Stripped down to the relevant columns, here are the table structures:
members - this table holds all members that register to a website
| memberId | insertDate |
| ==========|=====================|
| 1 | 2013-08-01 18:18:16 |
| 2 | 2013-08-02 18:18:16 |
| 3 | 2013-08-03 18:18:16 |
| 4 | 2013-08-04 18:18:16 |
| 5 | 2013-08-05 18:18:16 |
registration_steps - this table holds the progress of the registration processes and whether the registration was completed or not
| memberId | completed |
| ==========|===========|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
| 5 | 1 |
purchases - this table holds, well.. purchases
| memberId | insertDate |
| ==========|=====================|
| 1 | 2013-08-02 18:18:16 |
| 1 | 2013-08-03 17:18:16 |
| 1 | 2013-08-03 18:18:16 |
| 5 | 2013-08-07 18:18:16 |
This is the query I've come up with so far:
SELECT `m`.`memberId`,
DATE(`m`.`insertDate`) AS `regDate`,
COUNT(`p`.`memberId`) AS `totalTransactions`,
DATE(MIN(`p`.`insertDate`)) AS `firstPurchaseDate`,
DATE(MAX(`p`.`insertDate`)) AS `latestPurchaseDate`,
DATEDIFF(DATE(MIN(`p`.`insertDate`)), DATE(`m`.`insertDate`)) AS `daysBetweenRegAndFirstPurchase`
FROM `db`.`members` `m`
INNER JOIN `db`.`registration_steps` `r` ON `m`.`memberId` = `r`.`memberId`
INNER JOIN `db`.`purchases` `p` ON `m`.`memberId` = `p`.`memberId`
WHERE `m`.`insertDate` BETWEEN '2013-07-01 00:00:00' AND '2013-07-31 23:59:59'
AND `r`.`completed` = 1
GROUP BY `m`.`memberId`
;
It shows me everything I want but the members with a missing record in table purchases.
Here is what I get:
| memberId | regDate | totalTransactions | firstPurchaseDate | latestPurchaseDate | daysBetweenRegAndFirstPurchase |
| ==========|=====================|===================|=====================|=====================|================================|
| 1 | 2013-08-01 18:18:16 | 3 | 2013-08-02 18:18:16 | 2013-08-03 18:18:16 | 1 |
| 5 | 2013-08-05 18:18:16 | 1 | 2013-08-07 18:18:16 | 2013-08-07 18:18:16 | 2 |
But what I need is:
| memberId | regDate | totalTransactions | firstPurchaseDate | latestPurchaseDate | daysBetweenRegAndFirstPurchase |
| ==========|=====================|===================|=====================|=====================|================================|
| 1 | 2013-08-01 18:18:16 | 3 | 2013-08-02 18:18:16 | 2013-08-03 18:18:16 | 1 |
| 2 | 2013-08-02 18:18:16 | 0 | NULL | NULL | -1 |
| 3 | 2013-08-03 18:18:16 | 0 | NULL | NULL | -1 |
| 5 | 2013-08-05 18:18:16 | 1 | 2013-08-07 18:18:16 | 2013-08-07 18:18:16 | 2 |
In order to achieve this, I tried to change the second inner join to a left join, a left outer join and put the where conditions to the first inner join condition. However, I wasn't able to get the desired result. (Must admit I interupted a few VERY long running queries that might have been correct(?) though (total count for members in real scenario is about 20k).)
Anyone?
Thanks in advance!
解决方案
To get the all the result from the member table you need to left join other table and add group by for the member table.
Sample
FROM `db`.`members` `m`
LEFT OUTER JOIN `db`.`registration_steps` `r` ON `m`.`memberId` = `r`.`memberId`
LEFT OUTER JOIN `db`.`purchases` `p` ON `m`.`memberId` = `p`.`memberId`
LEFT [ OUTER ]
Specifies that all rows from the left table not meeting the join condition are included in the result set, and output columns from the other table are set to NULL in addition to all rows returned by the inner join.