3.查询某个库中的表 方法一:SELECT * FROM db_name.table_name; 方法二:USE db_name; SELECT * FROM table_name;
4.WHERE条件 SELECT * FROM sales WHERE amount > 10 AND region = 'CHINA'
5.基于分区的查询
SELECT page_views.* FROM page_views WHERE page_views.date >= '2018-01-01' AND page_views.date <= '2018-01-31';
如果一个表和另一个表join,那么分区条件可以放在ON中
SELECT page_views.* FROM page_views JOIN dim_users ON (page_views.user_id = dim_users.id AND page_views.date >= '2018-01-01' AND page_views.date <= '2018-01-31');
6.having子句
SELECT col1 FROM t1 GROUP BY col1 HAVING SUM(col2) > 10;
也可以这样写
SELECT col1 FROM (SELECT col1, SUM(col2) as col2sum FROM t1 GROUP BY col1) t2 WHERE t2.col2sum > 10;
7.LIMIT子句 返回前5行
SELECT * FROM customers LIMIT 5;
返回第三行到第七行
SELECT * FROM customers ORDER BY create_date LIMIT 2,5;
二.排序
1.全局排序(order by ) Order By: 全局排序,只有一个 reducer
1.1使用 order by 子句排序
asc(ascend): 升序(默认) desc(descend): 降序
1.2.order by 子句在 select 语句的结尾
2.分组(Grouping)
GROUP BY子句与聚合函数配合使用,将数据集按照指定列进行分组,然后对每个组执行聚合操作。
-- 按部门分组并计算每个部门的平均工资 SELECT department, AVG(salary) as avg_salary FROM employees GROUP BY department;
3.基础聚合 max min count avg sum —— MMCAS COUNT(NULL)=0 SUM(NULL)=NULL COUNT(FIELD | * | 1) : 计算所有数据行,不管是否为NULL – 集合类型聚合 collect_set (F|func|case…when) – 去重后的列表 collect_list (F|func|case…when) – 列表 格式为[“11599”,“265”,“5462”]
4.高级聚合
GROUPING SETS和GROUPING__ID GROUPING SETS子句允许开发者自行组合GROUP BY子句中出现的字段作为分组字段,其实现效果等同于按照不同字段分组的SQL语句进行UNION操作。下面这段SQL将生成按照product_id,channel_id和channel_id,promotion_id分组汇总sale_amount的数据: SELECT a.product_id , a.channel_id , a.promotion_id , SUM(a.sale_amount) AS sale_amount FROM dwd.dwd_sales a GROUP BY a.product_id , a.channel_id , a.promotion_id GROUPING SETS ((a.product_id, a.channel_id),(a.channel_id, a.promotion_id));