员工列表
员工ID | 姓名 | 部门 | 月薪 |
1 | 李彦宏 | 销售部1 | 3000 |
2 | 周鸿祎 | 销售部1 | 3500 |
3 | 雷军 | 销售部2 | 2000 |
4 | 贾跃亭 | 销售部2 | 3100 |
5 | 董明珠 | 销售部3 | 4200 |
6 | 冯仑 | 销售部3 | 4100 |
7 | 马云 | 销售部3 | 3900 |
员工业绩单:
订单号 | 员工ID | 订单业绩 |
1 | 2 | 1000 |
2 | 2 | 1100 |
3 | 4 | 1500 |
4 | 4 | 1800 |
5 | 4 | 2000 |
6 | 7 | 1500 |
7 | 7 | 1400 |
8 | 7 | 1600 |
其中两个表是通过员工ID来进行关联的,现在需要统计:
- 各个部门的总月薪
- 各个部门的总订单数
- 各个部门总月薪数的产出:订单总额/月薪
通过Tableau关联
发现统计的总月薪数不对:
问题分析
我们看到表关联之后两个表之间是对应1对多的关系,关联之后,出现了笛卡尔积:
mysql> SELECT employees.employee_id, employees.name, employees.salary, employees.employee_id, orders.order_id FROM employees LEFT JOIN orders ON employees.employee_id=orders.employee_id;
+-------------+-----------+--------+-------------+----------+
| employee_id | name | salary | employee_id | order_id |
+-------------+-----------+--------+-------------+----------+
| 2 | 周鸿祎 | 3500 | 2 | 1 |
| 2 | 周鸿祎 | 3500 | 2 | 2 |
| 4 | 贾跃亭 | 3100 | 4 | 3 |
| 4 | 贾跃亭 | 3100 | 4 | 4 |
| 4 | 贾跃亭 | 3100 | 4 | 5 |
| 7 | 马云 | 3900 | 7 | 6 |
| 7 | 马云 | 3900 | 7 | 7 |
| 7 | 马云 | 3900 | 7 | 8 |
| 1 | 李彦宏 | 3000 | 1 | NULL |
| 3 | 雷军 | 2000 | 3 | NULL |
| 5 | 董明珠 | 4200 | 5 | NULL |
| 6 | 冯仑 | 4100 | 6 | NULL |
+-------------+-----------+--------+-------------+----------+
12 rows in set (0.00 sec) |
因此在Tableau做聚合统计的时候,sum(salary)直接用了"LEFT JOIN"之后的表,其查询语句如下:
mysql> SELECT employees.employee_id, employees.name, sum(employees.salary), count(employees.employee_id) FROM employees LEFT JOIN orders ON employees.employee_id=orders.employee_id GROUP BY employees.employee_id;
+-------------+-----------+-----------------------+------------------------------+
| employee_id | name | sum(employees.salary) | count(employees.employee_id) |
+-------------+-----------+-----------------------+------------------------------+
| 1 | 李彦宏 | 3000 | 1 |
| 2 | 周鸿祎 | 7000 | 2 |
| 3 | 雷军 | 2000 | 1 |
| 4 | 贾跃亭 | 9300 | 3 |
| 5 | 董明珠 | 4200 | 1 |
| 6 | 冯仑 | 4100 | 1 |
| 7 | 马云 | 11700 | 3 |
+-------------+-----------+-----------------------+------------------------------+ |
解决方案
为避免出现JOIN之后出现salary重复,不妨先在orders表里先算出,我们来回顾一下需求:
- 各个部门的总月薪
- 各个部门的总订单数
- 各个部门总月薪数的产出:订单总额/月薪
订单数和订单总额都可以在orders单表里查询出:
mysql> SELECT orders.employee_id, count(orders.order_id) AS order_count, sum(orders.amount) as order_amount FROM orders GROUP BY orders.employee_id;
+-------------+-------------+--------------+
| employee_id | order_count | order_amount |
+-------------+-------------+--------------+
| 2 | 2 | 2100 |
| 4 | 3 | 5300 |
| 7 | 3 | 4500 |
+-------------+-------------+--------------+ |
这个表可以在Tableau创建数据连接的时候,自定义SQL:
然后在做LEFT JOIN:
统计出来的月薪正确了,如图:
Tableau实际上就将查询结果建立一个临时表之后在LEFT JOIN:
mysql> SELECT employees.employee_id, employees.department, employees.name, employees.salary, order_total_counts.order_count, order_total_counts.order_amount from employees LEFT JOIN (SELECT orders.employee_id, count(orders.order_id) AS order_count, sum(orders.amount) AS order_amount FROM orders GROUP BY orders.employee_id) order_total_counts ON employees.employee_id=order_total_counts.employee_id;
+-------------+------------+-----------+--------+-------------+--------------+
| employee_id | department | name | salary | order_count | order_amount |
+-------------+------------+-----------+--------+-------------+--------------+
| 1 | 销售部1 | 李彦宏 | 3000 | NULL | NULL |
| 2 | 销售部1 | 周鸿祎 | 3500 | 2 | 2100 |
| 3 | 销售部2 | 雷军 | 2000 | NULL | NULL |
| 4 | 销售部2 | 贾跃亭 | 3100 | 3 | 5300 |
| 5 | 销售部3 | 董明珠 | 4200 | NULL | NULL |
| 6 | 销售部3 | 冯仑 | 4100 | NULL | NULL |
| 7 | 销售部3 | 马云 | 3900 | 3 | 4500 |
+-------------+------------+-----------+--------+-------------+--------------+ |
这样就可以完成按照部门来统计了:
转载于:https://my.oschina.net/caohong/blog/369104