Hive的简单总结

1.什么是HIVE

1.hive是一个工具,用于将输入的sql语句翻译成MapReduce程序,对HDFS中的数据进行增删改查和运算,并将结果存入HDFS中。

 

2.HIVE的工作机制

核心:

HIVE 利用HDFS来存储文件;

利用Mapreduce来做数据分析

利用mysql来为用户提供查询接口

3. HIVEDDL(数据库定义语言

database definition language)

1.建库

create  database  库名;

2.建表

建表之前必须先选定库;use 库名;

a.内部表

表建在HDFS/user/hive/warehouse/表名

create  table  表名(………) row  format  delimited  fields  terminated  by  ‘,’;

 

 

b.外部表

表建在了HDFS/xx/yy/目录下

create  external  table  表名(……) row  format  delimited  fields  terminated  by  ‘ ,’  location ‘/xx/yy/’;

 

 

注意:

内部表被drop的时候,表目录(就是以表名创建的文件夹)会被删除,表的元数据也会被删除。

外部表被drop的时候,表的元数据会被删除;但是表的(其实创建表的时候就没有以表名创建文件夹)目录还在!

3.修改表定义

修改表名:alter  table  原表名  rename  to  新表名

删除表:drop  table  表名

 

4.分区表

分区表的实质:就是在表目录下还可以建子目录,以便于在查询的时候可以指定具体的子目录进行查询,提高查询的效率

 内部表分区建表语法:

Create  table  表名(……) partitioned by() row  format  delimited  fields  terminated  by ‘,’;

 导入数据到指定的分区:

Load  data  local  inpath  ‘本地路径’  into  table  t_2  partition(day=’2018-04-15’,city=’beijing’)

4. hiveDML(数据库操作语言

database manipulation language )

 

sql运算模型一:逐行运算模型(逐行表达式,逐行过滤)

where用于逐行过滤

sql运算模式二:分组运算模型(分组表达式,分组过滤)

group by 用于将数据库分组

having 用于将分组过滤

sqljoin联表机制:

join的实质就是讲多个表的数据连城一个表,作为查询的输入数据集!

表 a

+---------+-------+--+

| a.name  | a.id  |

+---------+-------+--+

| a       | 1     |

| b       | 2     |

| c       | 3     |

| d       | 4     |

+---------+-------+--+

 

b

+---------+-------------+--+

| b.name  | b.nickname  |

+---------+-------------+--+

| a       | xx          |

| b       | yy          |

| d       | zz          |

| e       | pp          |

+---------+-------------+--+

 

 

1. 笛卡尔积连接 :表一, 表二

select * from a , b;

 

示意图:

+---------+-------+---------+-------------+--+

| a.name  | a.id  | b.name  | b.nickname  |

+---------+-------+---------+-------------+--+

| a       | 1     | a       | xx          |

| b       | 2     | a       | xx          |

| c       | 3     | a       | xx          |

| d       | 4     | a       | xx          |

| a       | 1     | b       | yy          |

| b       | 2     | b       | yy          |

| c       | 3     | b       | yy          |

| d       | 4     | b       | yy          |

| a       | 1     | d       | zz          |

| b       | 2     | d       | zz          |

| c       | 3     | d       | zz          |

| d       | 4     | d       | zz          |

| a       | 1     | e       | pp          |

| b       | 2     | e       | pp          |

| c       | 3     | e       | pp          |

| d       | 4     | e       | pp          |

+---------+-------+---------+-------------+--+

 

2.内连接:

select * from a join b on a.name=b.name;

示意图:

+---------+-------+---------+-------------+--+

| a.name  | a.id  | b.name  | b.nickname  |

+---------+-------+---------+-------------+--+

| a       | 1     | a       | xx          |

| b       | 2     | b       | yy          |

| d       | 4     | d       | zz          |

+---------+-------+---------+-------------+--+

 

3.左外连接:

select *

from

a left outer join b

on a.name=b.name;

示意图:

+---------+-------+---------+-------------+--+

| a.name  | a.id  | b.name  | b.nickname  |

+---------+-------+---------+-------------+--+

| a       | 1     | a       | xx          |

| b       | 2     | b       | yy          |

| c       | 3     | NULL    | NULL        |

| d       | 4     | d       | zz          |

+---------+-------+---------+-------------+--+

 

 

4. 右外连接:

select *

from

a right outer join b

on a.name=b.name;

 

示意图:

+---------+-------+---------+-------------+--+

| a.name  | a.id  | b.name  | b.nickname  |

+---------+-------+---------+-------------+--+

| a       | 1     | a       | xx          |

| b       | 2     | b       | yy          |

| d       | 4     | d       | zz          |

| NULL    | NULL  | e       | pp          |

+---------+-------+---------+-------------+--+

 

 

 

5. 全外连接:表一 full join 表二

 

select *

from

a full outer join b

on a.name=b.name;

 

示意图:

+---------+-------+---------+-------------+--+

| a.name  | a.id  | b.name  | b.nickname  |

+---------+-------+---------+-------------+--+

| a       | 1     | a       | xx          |

| b       | 2     | b       | yy          |

| c       | 3     | NULL    | NULL        |

| d       | 4     | d       | zz          |

| NULL    | NULL  | e       | pp          |

+---------+-------+---------+-------------+--+

 

 

6.左半连接:表一 left semi join 表二

左半连接没有右表的信息

select *

from

a left semi join b

on a.name=b.name;

 

示意图:

+---------+-------+--+

| a.name  | a.id  |

+---------+-------+--+

| a       | 1     |

| b       | 2     |

| d       | 4     |

+---------+-------+--+

 

注意: left semi joinselect子句中,不能有右表的字段

Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:15 Invalid table alias or column reference 'nickname': (possible column names are: name, id) (state=42000,code=10004)

 

错误:在编译语句时出错:FailedSemanticException[Error 10004]:第1行:15无效表别名或列引用‘'nickname'’:(可能的列名是:nameid)(状态=42000,代码=10004)

Map集合

建表:

Create table t_map(int id,name string, ifno map<string,string>)

Row format delimited fields terminated by ’,’

Collection items terminated by ’#’

Map keys terminated by ’:’;

## map字段的所有key

 

select id,name,map_keys(family) as fa,age

from t_map;

 

## map字段的所有value

 

select id,name,map_values(family) as mv,age

from t_map;

 

## 综合:查询有brother的用户信息

select id,name,family,age

from t_map

##where是逐行查询的判断筛选语句##

where array_contains(map_keys(family),'brother');

Array集合

Create table t_array (id int,name string,ifno array<string>)

Row format delimited fields terminated by ‘,’

Collection items terminated ‘:’;

select *

from

t_array

where array_contains(num,14);

 

 

select name,size(num) as lenth

from

t_array;

 

 

select name,num[0] as lenth

from

t_array;

Struct对象

Create  table  t_struct(id int,name string,bean struct<age:int,name:string,addr:string>)

Row format delimited fields terminated by ‘,’

Collection items terminated by ’:’;

查询

select id,name,ifno.age

from t_struct;

 

select id,name,ifno.sex,ifno.addr

from t_struct;

5.HIVE的函数表达式

1.数学运算函数

select round(5.4);   ## 5  四舍五入

select round(5.1345,3) ;  ##5.135

select ceil(5.4) ; // select ceiling(5.4) ;   ## 6  向上取整

select floor(5.4);  ## 5  向下取整

select abs(-5.4) ;  ## 5.4  绝对值

对行取最大值:

select greatest(3,5,6) ;  ## 6  

对行取最小值:

select least(3,5,6) ;  ##求多个输入参数中的最小值

对组求最大值:

select max(age) from t_person group by ..;

对组求最小值:

select min(age) from t_person group by...;

2.字符串函数

substr(string str, int start)   ## 截取子串

substring(string str, int start)

示例:select substr("abcdefg",2) ;

substr(string, int start, int len)

substring(string, int start, int len)

示例:select substr("abcdefg",2,3) ;  ## bcd

 

concat(string A, string B...)  ## 拼接字符串

concat_ws(string SEP, string A, string B...)

示例:select concat("ab","xy") ;  ## abxy

select concat_ws('.','192','168','137');

 

+--------------+--+

|     _c0      |

+--------------+--+

| 192.168.137  |

+--------------+--+

 

length(string A)

示例:select length("192.168.33.44");  ## 13

 

split(string str, string pat)  ## 切分字符串

示例:select split("192.168.33.44",".") ; 错误的,因为.号是正则语法中的特定字符

select split("192.168.33.44","\\.") ;

 

 

 

upper(string str)  ##转大写

lower(string str)  ##转小写

3.时间函数

select current_timestamp;

 ## 返回值类型:timestamp,获取当前的时间戳(详细时间信息)

 

select current_date;  

 ## 返回值类型:date,获取当前的日期

 

 

 

 

## unix时间戳转字符串格式——from_unixtime

from_unixtime(bigint unixtime[, string format])

示例:select from_unixtime(unix_timestamp());

select from_unixtime(unix_timestamp(),"yyyy/MM/dd HH:mm:ss");

 

## 字符串格式转unix时间戳——unix_timestamp:返回值是一个长整数类型

 

## 如果不带参数,取当前时间的秒数时间戳long--(距离格林威治时间1970-1-1 0:0:0秒的差距)

select unix_timestamp();

 

unix_timestamp(string date, string pattern)

示例: select unix_timestamp("2017-08-10 17:50:30");

select unix_timestamp("2017-08-10 17:50:30","yyyy-MM-dd HH:mm:ss");

 

## 将字符串转成日期date

select to_date("2017-09-17 16:58:32");

4.if语句

 

原图:

+---------------------+--------------------------------+-----------------+--+

| t_movie.movie_name    |        t_movie.starring          | t_movie.m_date  |

+---------------------+--------------------------------+-----------------+--+

| 战狼2             | ["吴京","吴刚","卢靖姗"]             | 2017-08-16      |

| 三生三世十里桃花     | ["刘亦菲","杨洋"]                 | 2017-08-20      |

| 羞羞的铁拳    | ["沈腾","玛丽","艾伦"]                    | 2017-12-20      |

| 人民的名义    | ["陆毅","张丰毅","吴刚","许亚军","张志坚"]  | 2017-06-30      |

+---------------------+--------------------------------+-----------------+--+

 select movie_name,if(array_contains(starring,'吴刚'),'好片','烂片'),m_date

 From  t_movie;

 

 

示意图:

+-------------+------+-------------+--+

| movie_name  | _c1  |   m_date    |

+-------------+------+-------------+--+

| 战狼2         | 好片   | 2017-08-16  |

| 三生三世十里桃花    | 烂片   | 2017-08-20  |

| 羞羞的铁拳       | 烂片   | 2017-12-20  |

| 人民的名义       | 好片   | 2017-06-30  |

+-------------+------+-------------+--+

5.case when语句

select movie_name,m_date,

 case

 when

 m_date>'2017-08-08' then'新片'

 when m_date<='2017-08-08' then '老片'

 end as movie

 from t_movie;

 

 

原表:

+---------------------+--------------------------------+-----------------+--+

| t_movie.movie_name    |        t_movie.starring          | t_movie.m_date  |

+---------------------+--------------------------------+-----------------+--+

| 战狼2             | ["吴京","吴刚","卢靖姗"]             | 2017-08-16      |

| 三生三世十里桃花     | ["刘亦菲","杨洋"]                 | 2017-08-20      |

| 羞羞的铁拳    | ["沈腾","玛丽","艾伦"]                    | 2017-12-20      |

| 人民的名义    | ["陆毅","张丰毅","吴刚","许亚军","张志坚"]  | 2017-06-30      |

+---------------------+--------------------------------+-----------------+--+

示意表:

+-------------+-------------+--------+--+

| movie_name  |   m_date    | movie  |

+-------------+-------------+--------+--+

| 战狼2         | 2017-08-16  | 新片     |

| 三生三世十里桃花    | 2017-08-20  | 新片     |

| 羞羞的铁拳       | 2017-12-20  | 新片     |

| 人民的名义       | 2017-06-30  | 老片     |

+-------------+-------------+--------+--+

6.常见分组聚合函数

 

sum(字段)  :  求这个字段在一个组中的所有值的和

avg(字段)  : 求这个字段在一个组中的所有值的平均值

max(字段)  :求这个字段在一个组中的所有值的最大值

min(字段)  :求这个字段在一个组中的所有值的最小值

 

7.表生成函数

1.explode()

使用explode()对数组字段“炸裂”

select explode(starring) as actors

 from t_movie;

+--------+--+

| movie  |

+--------+--+

| 吴京     |

| 吴刚     |

| 卢靖姗    |

| 刘亦菲    |

| 杨洋     |

| 沈腾     |

| 玛丽     |

| 艾伦     |

| 陆毅     |

| 张丰毅    |

| 吴刚     |

| 许亚军    |

| 张志坚    |

+--------+--+

2. lateral  view()

Select movie_name,tmp.actors,m_date

From t_movie lateral view explode(starring) tmp as actors;

 

 

示例表:

+-------------+-------------+-------------+--+

| movie_name  | tmp.actors  |   m_date    |

+-------------+-------------+-------------+--+

| 战狼2         | 吴京          | 2017-08-16  |

| 战狼2         | 吴刚          | 2017-08-16  |

| 战狼2         | 卢靖姗         | 2017-08-16  |

| 三生三世十里桃花    | 刘亦菲         | 2017-08-20  |

| 三生三世十里桃花    | 杨洋          | 2017-08-20  |

| 羞羞的铁拳       | 沈腾          | 2017-12-20  |

| 羞羞的铁拳       | 玛丽          | 2017-12-20  |

| 羞羞的铁拳       | 艾伦          | 2017-12-20  |

| 人民的名义       | 陆毅          | 2017-06-30  |

| 人民的名义       | 张丰毅         | 2017-06-30  |

| 人民的名义       | 吴刚          | 2017-06-30  |

| 人民的名义       | 许亚军         | 2017-06-30  |

| 人民的名义       | 张志坚         | 2017-06-30  |

+-------------+-------------+-------------+--+

理解: lateral view 相当于两个表在join

左表:是原表

右表:是explode(某个集合字段)之后产生的表

而且:这个join只在同一行的数据间进行

2. 将分散表按照name分组聚合成一行

select

name, concat_ws('_',collect_set(project))  as project

from project_promotion

group by name;

 

 

 

+-------------------------+----------------------------+--+

| project_promotion.name  | project_promotion.project  |

+-------------------------+----------------------------+--+

| 小明                      | 语文                         |

| 小明                      | 数学                         |

| 小明                      | 英语                         |

| 小明                      | 物理                         |

| 小明                      | 化学                         |

| 小红                      | 生物                         |

| 小红                      | 历史                         |

| 小红                      | 生化                         |

| 小红                      | 政治                         |

+-------------------------+----------------------------+--+

结果表

+-------+-----------------+--+

| name  |     project     |

+-------+-----------------+--+

| 小明    | 语文_数学_英语_物理_化学  |

| 小红    | 生物_历史_生化_政治     |

+-------+-----------------+--+

 

3. json解析函数:表生成函数

1.Json解析到本地:

 

Insert overwrite local directory ‘/root/hivedata/’

Row fromat  delimited  fields terminated by ’\t’

Select

Json_tuple(json,"movie","rate","timeStamp","uid") as(movie,rate,ts,uid1)

From t_rate;

 

 

 

3. json解析到HDFS

 

Insert overwrite directory ‘/root/hivedata/’

Select

Json_tuple(json,"movie","rate","timeStamp","uid") as(movie,rate,ts,uid1)

From t_rate;

 

 

+----------------------------------------------------------+--+

|                       t_json.json                        |

+----------------------------------------------------------+--+

| {"id":1,"name":"zhangsan1","age":18,"city":"beijing"}    |

| {"id":2,"name":"zhangsan2","age":28,"city":"shanghai"}   |

| {"id":3,"name":"zhangsan3","age":38,"city":"shanghai"}   |

| {"id":4,"name":"zhangsan4","age":17,"city":"beijing"}    |

| {"id":5,"name":"zhangsan5","age":19,"city":"beijing"}    |

| {"id":6,"name":"zhangsan6","age":26,"city":"guangzhou"}  |

| {"id":7,"name":"zhangsan7","age":17,"city":"beijing"}    |

| {"id":8,"name":"zhangsan8","age":14,"city":"guangzhou"}  |

| {"id":9,"name":"zhangsan9","age":48,"city":"beijing"}    |

+----------------------------------------------------------+--+

需要做各种统计分析:

发现,直接对jsonsql查询不方便,需要将json数据解析成普通的结构化数据表。可以采用hive中内置的json_tuple()函数

实现步骤:

1、创建一个原始表用来对应原始的json数据

create table t_json(json string);

load data local inpath ‘/root/rating.json’ into table t_json;

Select

Json_tuple(“id”,”name”,”age”,”city”) as (id,name,age,city)

From t_json;

 

 

+-----+------------+------+------------+--+

| id  |    name    | age  |    city    |

+-----+------------+------+------------+--+

| 1   | zhangsan1  | 18   | beijing    |

| 2   | zhangsan2  | 28   | shanghai   |

| 3   | zhangsan3  | 38   | shanghai   |

| 4   | zhangsan4  | 17   | beijing    |

| 5   | zhangsan5  | 19   | beijing    |

| 6   | zhangsan6  | 26   | guangzhou  |

| 7   | zhangsan7  | 17   | beijing    |

| 8   | zhangsan8  | 14   | guangzhou  |

| 9   | zhangsan9  | 48   | beijing    |

+-----+------------+------+------------+--+

8. 窗口分析函数

1.row_number() over()  TOPN

原表:

+-----+------+-------+-------+-------+--+

| id  | age  | name  | addr  | rown  |

+-----+------+-------+-------+-------+--+

| 4   | 16   | d     | 上海    | 1     |

| 4   | 16   | d     | 上海    | 2     |

| 3   | 22   | c     | 上海    | 3     |

| 3   | 22   | c     | 上海    | 4     |

| 9   | 26   | f     | 上海    | 5     |

| 9   | 26   | f     | 上海    | 6     |

| 6   | 26   | f     | 上海    | 7     |

| 6   | 26   | f     | 上海    | 8     |

| 10  | 14   | b     | 北京    | 1     |

| 10  | 14   | b     | 北京    | 2     |

| 1   | 18   | a     | 北京    | 3     |

| 1   | 18   | a     | 北京    | 4     |

| 2   | 19   | b     | 北京    | 5     |

| 2   | 19   | b     | 北京    | 6     |

| 7   | 16   | d     | 深圳    | 1     |

| 7   | 16   | d     | 深圳    | 2     |

| 8   | 30   | e     | 深圳    | 3     |

| 5   | 30   | e     | 深圳    | 4     |

| 5   | 30   | e     | 深圳    | 5     |

| 8   | 30   | e     | 深圳    | 6     |

+-----+------+-------+-------+-------+--+

 

//查询每个地区年龄最大的两条

Select id,age,name,addr

from

(Select

Id,age,name,addr,row_number()over(partition by addr order by age desc) as rown

From t_rown)o1

Where o1.rown<=2;

 

 

示例表:

+-----+------+-------+-------+--+

| id  | age  | name  | addr  |

+-----+------+-------+-------+--+

| 6   | 26   | f     | 上海    |

| 9   | 26   | f     | 上海    |

| 2   | 19   | b     | 北京    |

| 2   | 19   | b     | 北京    |

| 8   | 30   | e     | 深圳    |

| 5   | 30   | e     | 深圳    |

+-----+------+-------+-------+--+

2.sum() over()——级联求和

 

重要提示

sum()over()的累加范围指定语法:

 

sum() over(partition by x order by y rows between 8 preceding and current row)

sum() over(partition by x order by y rows between 8 preceding and 5 following)

sum() over(partition by x order by y rows between unbounded preceding and 5 following)

sum() over(partition by x order by y rows between unbounded preceding and unbounded following)

 

 

 

 

 

select name,month,amt,

sum(amt) over(partition by name order by month rows between unbounded preceding and current row) as accumulate

 

 

from t_tmp;

 

 

 

 

 

 

hive 窗口分析函数

 

select * from t_access;

 

+----------------+---------------------------------+-----------------------+--------------+--+

|  t_access.ip   |          t_access.url           | t_access.access_time  | t_access.dt  |

+----------------+---------------------------------+-----------------------+--------------+--+

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20   | 20170804     |

| 192.168.33.3   | http://www.edu360.cn/teach      | 2017-08-04 15:35:20   | 20170804     |

| 192.168.33.4   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20   | 20170804     |

| 192.168.33.4   | http://www.edu360.cn/job        | 2017-08-04 16:30:20   | 20170804     |

| 192.168.33.5   | http://www.edu360.cn/job        | 2017-08-04 15:40:20   | 20170804     |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-05 15:30:20   | 20170805     |

| 192.168.44.3   | http://www.edu360.cn/teach      | 2017-08-05 15:35:20   | 20170805     |

| 192.168.33.44  | http://www.edu360.cn/stu        | 2017-08-05 15:30:20   | 20170805     |

| 192.168.33.46  | http://www.edu360.cn/job        | 2017-08-05 16:30:20   | 20170805     |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-05 15:40:20   | 20170805     |

| 192.168.133.3  | http://www.edu360.cn/register   | 2017-08-06 15:30:20   | 20170806     |

| 192.168.111.3  | http://www.edu360.cn/register   | 2017-08-06 15:35:20   | 20170806     |

| 192.168.34.44  | http://www.edu360.cn/pay        | 2017-08-06 15:30:20   | 20170806     |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20   | 20170806     |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20   | 20170806     |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20   | 20170806     |

| 192.168.33.25  | http://www.edu360.cn/job        | 2017-08-06 15:40:20   | 20170806     |

| 192.168.33.36  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20   | 20170806     |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20   | 20170806     |

+----------------+---------------------------------+-----------------------+--------------+--+

 

## LAG() over() 函数

求每个人的每次访问记录后加上他的上次访问时间

select ip,url,access_time,

row_number() over(partition by ip order by access_time) as rn,

lag(access_time,1,0) over(partition by ip order by access_time)as last_access_time

from t_access;

 

+----------------+---------------------------------+----------------------+-----+----------------------+--+

|       ip       |               url               |     access_time      | rn  |   last_access_time   |

+----------------+---------------------------------+----------------------+-----+----------------------+--+

| 192.168.111.3  | http://www.edu360.cn/register   | 2017-08-06 15:35:20  | 1   | 0                    |

| 192.168.133.3  | http://www.edu360.cn/register   | 2017-08-06 15:30:20  | 1   | 0                    |

| 192.168.33.25  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 1   | 0                    |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | 0                    |

| 192.168.33.3   | http://www.edu360.cn/teach      | 2017-08-04 15:35:20  | 2   | 2017-08-04 15:30:20  |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 3   | 2017-08-04 15:35:20  |

| 192.168.33.36  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 1   | 0                    |

| 192.168.33.4   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | 0                    |

| 192.168.33.4   | http://www.edu360.cn/job        | 2017-08-04 16:30:20  | 2   | 2017-08-04 15:30:20  |

| 192.168.33.44  | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 1   | 0                    |

| 192.168.33.46  | http://www.edu360.cn/job        | 2017-08-05 16:30:20  | 1   | 0                    |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 2   | 2017-08-05 16:30:20  |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 3   | 2017-08-06 16:30:20  |

| 192.168.33.5   | http://www.edu360.cn/job        | 2017-08-04 15:40:20  | 1   | 0                    |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-05 15:40:20  | 1   | 0                    |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 2   | 2017-08-05 15:40:20  |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 3   | 2017-08-06 15:40:20  |

| 192.168.34.44  | http://www.edu360.cn/pay        | 2017-08-06 15:30:20  | 1   | 0                    |

| 192.168.44.3   | http://www.edu360.cn/teach      | 2017-08-05 15:35:20  | 1   | 0                    |

+----------------+---------------------------------+----------------------+-----+----------------------+--+

 

 

## LEAD() over() 函数

select ip,url,access_time,

row_number() over(partition by ip order by access_time) as rn,

lead(access_time,1,0) over(partition by ip order by access_time)as pre_access_time

from t_access;

+----------------+---------------------------------+----------------------+-----+----------------------+--+

|       ip       |               url               |     access_time      | rn  |   pre_access_time   |

+----------------+---------------------------------+----------------------+-----+----------------------+--+

| 192.168.111.3  | http://www.edu360.cn/register   | 2017-08-06 15:35:20  | 1   | 0                    |

| 192.168.133.3  | http://www.edu360.cn/register   | 2017-08-06 15:30:20  | 1   | 0                    |

| 192.168.33.25  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 1   | 0                    |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | 2017-08-04 15:35:20  |

| 192.168.33.3   | http://www.edu360.cn/teach      | 2017-08-04 15:35:20  | 2   | 2017-08-05 15:30:20  |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 3   | 0                    |

| 192.168.33.36  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 1   | 0                    |

| 192.168.33.4   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | 2017-08-04 16:30:20  |

| 192.168.33.4   | http://www.edu360.cn/job        | 2017-08-04 16:30:20  | 2   | 0                    |

| 192.168.33.44  | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 1   | 0                    |

| 192.168.33.46  | http://www.edu360.cn/job        | 2017-08-05 16:30:20  | 1   | 2017-08-06 16:30:20  |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 2   | 2017-08-06 16:30:20  |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 3   | 0                    |

| 192.168.33.5   | http://www.edu360.cn/job        | 2017-08-04 15:40:20  | 1   | 0                    |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-05 15:40:20  | 1   | 2017-08-06 15:40:20  |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 2   | 2017-08-06 15:40:20  |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 3   | 0                    |

| 192.168.34.44  | http://www.edu360.cn/pay        | 2017-08-06 15:30:20  | 1   | 0                    |

| 192.168.44.3   | http://www.edu360.cn/teach      | 2017-08-05 15:35:20  | 1   | 0                    |

+----------------+---------------------------------+----------------------+-----+----------------------+--+

 

 

## FIRST_VALUE() over() 函数

例:取每个用户访问的第一个页面

select ip,url,access_time,

row_number() over(partition by ip order by access_time) as rn,

first_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following)as last_access_time

from t_access;

+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

|       ip       |               url               |     access_time      | rn  |        last_access_time         |

+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

| 192.168.111.3  | http://www.edu360.cn/register   | 2017-08-06 15:35:20  | 1   | http://www.edu360.cn/register   |

| 192.168.133.3  | http://www.edu360.cn/register   | 2017-08-06 15:30:20  | 1   | http://www.edu360.cn/register   |

| 192.168.33.25  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | http://www.edu360.cn/stu        |

| 192.168.33.3   | http://www.edu360.cn/teach      | 2017-08-04 15:35:20  | 2   | http://www.edu360.cn/stu        |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 3   | http://www.edu360.cn/stu        |

| 192.168.33.36  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 1   | http://www.edu360.cn/excersize  |

| 192.168.33.4   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | http://www.edu360.cn/stu        |

| 192.168.33.4   | http://www.edu360.cn/job        | 2017-08-04 16:30:20  | 2   | http://www.edu360.cn/stu        |

| 192.168.33.44  | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 1   | http://www.edu360.cn/stu        |

| 192.168.33.46  | http://www.edu360.cn/job        | 2017-08-05 16:30:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 2   | http://www.edu360.cn/job        |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 3   | http://www.edu360.cn/job        |

| 192.168.33.5   | http://www.edu360.cn/job        | 2017-08-04 15:40:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-05 15:40:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 2   | http://www.edu360.cn/job        |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 3   | http://www.edu360.cn/job        |

| 192.168.34.44  | http://www.edu360.cn/pay        | 2017-08-06 15:30:20  | 1   | http://www.edu360.cn/pay        |

| 192.168.44.3   | http://www.edu360.cn/teach      | 2017-08-05 15:35:20  | 1   | http://www.edu360.cn/teach      |

+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

 

## LAST_VALUE() over() 函数

例:取每个用户访问的最后一个页面

select ip,url,access_time,

row_number() over(partition by ip order by access_time) as rn,

last_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following)as last_access_time

from t_access;

+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

|       ip       |               url               |     access_time      | rn  |        last_access_url         |

+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

| 192.168.111.3  | http://www.edu360.cn/register   | 2017-08-06 15:35:20  | 1   | http://www.edu360.cn/register   |

| 192.168.133.3  | http://www.edu360.cn/register   | 2017-08-06 15:30:20  | 1   | http://www.edu360.cn/register   |

| 192.168.33.25  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | http://www.edu360.cn/stu        |

| 192.168.33.3   | http://www.edu360.cn/teach      | 2017-08-04 15:35:20  | 2   | http://www.edu360.cn/stu        |

| 192.168.33.3   | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 3   | http://www.edu360.cn/stu        |

| 192.168.33.36  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 1   | http://www.edu360.cn/excersize  |

| 192.168.33.4   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | http://www.edu360.cn/stu        |

| 192.168.33.4   | http://www.edu360.cn/job        | 2017-08-04 16:30:20  | 2   | http://www.edu360.cn/stu        |

| 192.168.33.44  | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 1   | http://www.edu360.cn/stu        |

| 192.168.33.46  | http://www.edu360.cn/job        | 2017-08-05 16:30:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 2   | http://www.edu360.cn/job        |

| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 3   | http://www.edu360.cn/job        |

| 192.168.33.5   | http://www.edu360.cn/job        | 2017-08-04 15:40:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-05 15:40:20  | 1   | http://www.edu360.cn/job        |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 2   | http://www.edu360.cn/job        |

| 192.168.33.55  | http://www.edu360.cn/job        | 2017-08-06 15:40:20  | 3   | http://www.edu360.cn/job        |

| 192.168.34.44  | http://www.edu360.cn/pay        | 2017-08-06 15:30:20  | 1   | http://www.edu360.cn/pay        |

| 192.168.44.3   | http://www.edu360.cn/teach      | 2017-08-05 15:35:20  | 1   | http://www.edu360.cn/teach      |

+----------------+---------------------------------+----------------------+-----+---------------------------------+--+

 

 

/*

累计报表--分析函数实现版

*/

 

 先求出每个人每个月的总金额

select id,month,

sum(fee) as amount

from t_test

group by id,month;

 +-----+----------+-------+--+

| id  |  month   | amount|

+-----+----------+-------+--+

| A   | 2015-01  | 33    |

| A   | 2015-02  | 10    |

| A   | 2015-03  | 20    |

| B   | 2015-01  | 30    |

| B   | 2015-02  | 15    |

| B   | 2015-03  | 45    |

| C   | 2015-01  | 30    |

| C   | 2015-02  | 40    |

| C   | 2015-03  | 30    |

+-----+----------+-------+--+

 

 

-- sum() over() 函数

select id

,month

,amount

,sum(amount) over(partition by id order by month rows between unbounded preceding and current row)

from

tmp;

 

 

 

 

 

 

select id,month,sum(cnt) as cnts

 

from t_access_time

group by id,month;

 

tmp

+-----+----------+-------+--+

| id  |  month   | cnts  |

+-----+----------+-------+--+

| A   | 2015-01  | 33    |   cxx

| A   | 2015-02  | 10    |   xx

| A   | 2015-03  | 20    |

| B   | 2015-01  | 30    |

| B   | 2015-02  | 15    |

| B   | 2015-03  | 45    |

| C   | 2015-01  | 30    |

| C   | 2015-02  | 40    |

| C   | 2015-03  | 30    |

+-----+----------+-------+--+

 

select id,month,cnts,

sum(cnts) over(partition by id order by month rows between unbounded preceding and current row ) as accumulate_cnts

 

from (select id,month,sum(cnt) as cnts

 

from t_access_time

group by id,month) tmp;

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值