1. What is Hive
1. Hive is a tool that translates SQL statements into MapReduce programs, so that data stored in HDFS can be queried, modified, and computed over, with the results written back into HDFS.
2. How Hive works
Core ideas:
Hive uses HDFS to store its files;
uses MapReduce to run the data analysis;
uses MySQL (the metastore) to store table metadata.
3. Hive DDL (Data Definition Language)
1. Create a database
create database db_name;
2. Create a table
Select a database first: use db_name;
a. Internal (managed) table
The table directory is created in HDFS at /user/hive/warehouse/table_name
create table table_name(...) row format delimited fields terminated by ',';
b. External table
The table points at an existing HDFS directory such as /xx/yy/
create external table table_name(...) row format delimited fields terminated by ',' location '/xx/yy/';
Note:
When an internal table is dropped, the table directory (the folder named after the table) is deleted, and so is the table's metadata.
When an external table is dropped, only the table's metadata is deleted; the data directory remains (no table-named folder was created in the first place).
3. Alter table definitions
Rename a table: alter table old_name rename to new_name
Drop a table: drop table table_name
4. Partitioned tables
The essence of a partitioned table: subdirectories can be created under the table directory, so a query can target specific subdirectories, which improves query efficiency.
Syntax for creating a partitioned internal table:
create table table_name(...) partitioned by (...) row format delimited fields terminated by ',';
Load data into a specific partition:
load data local inpath 'local_path' into table t_2 partition(day='2018-04-15',city='beijing');
4. Hive DML (Data Manipulation Language)
SQL computation model 1: row-by-row (per-row expressions, per-row filtering)
where filters row by row
SQL computation model 2: grouped (per-group expressions, per-group filtering)
group by splits the rows into groups
having filters the groups
SQL join mechanics:
The essence of a join is to stitch rows from multiple tables into one combined row set, which then serves as the query's input data set!
Table a:
+---------+-------+--+
| a.name  | a.id  |
+---------+-------+--+
| a       | 1     |
| b       | 2     |
| c       | 3     |
| d       | 4     |
+---------+-------+--+
Table b:
+---------+-------------+--+
| b.name  | b.nickname  |
+---------+-------------+--+
| a       | xx          |
| b       | yy          |
| d       | zz          |
| e       | pp          |
+---------+-------------+--+
1. Cartesian product join: from table1, table2
select * from a, b;
Result:
+---------+-------+---------+-------------+--+
| a.name  | a.id  | b.name  | b.nickname  |
+---------+-------+---------+-------------+--+
| a       | 1     | a       | xx          |
| b       | 2     | a       | xx          |
| c       | 3     | a       | xx          |
| d       | 4     | a       | xx          |
| a       | 1     | b       | yy          |
| b       | 2     | b       | yy          |
| c       | 3     | b       | yy          |
| d       | 4     | b       | yy          |
| a       | 1     | d       | zz          |
| b       | 2     | d       | zz          |
| c       | 3     | d       | zz          |
| d       | 4     | d       | zz          |
| a       | 1     | e       | pp          |
| b       | 2     | e       | pp          |
| c       | 3     | e       | pp          |
| d       | 4     | e       | pp          |
+---------+-------+---------+-------------+--+
2. Inner join:
select * from a join b on a.name=b.name;
Result:
+---------+-------+---------+-------------+--+
| a.name  | a.id  | b.name  | b.nickname  |
+---------+-------+---------+-------------+--+
| a       | 1     | a       | xx          |
| b       | 2     | b       | yy          |
| d       | 4     | d       | zz          |
+---------+-------+---------+-------------+--+
3. Left outer join:
select *
from
a left outer join b
on a.name=b.name;
Result:
+---------+-------+---------+-------------+--+
| a.name  | a.id  | b.name  | b.nickname  |
+---------+-------+---------+-------------+--+
| a       | 1     | a       | xx          |
| b       | 2     | b       | yy          |
| c       | 3     | NULL    | NULL        |
| d       | 4     | d       | zz          |
+---------+-------+---------+-------------+--+
4. Right outer join:
select *
from
a right outer join b
on a.name=b.name;
Result:
+---------+-------+---------+-------------+--+
| a.name  | a.id  | b.name  | b.nickname  |
+---------+-------+---------+-------------+--+
| a       | 1     | a       | xx          |
| b       | 2     | b       | yy          |
| d       | 4     | d       | zz          |
| NULL    | NULL  | e       | pp          |
+---------+-------+---------+-------------+--+
5. Full outer join: table1 full join table2
select *
from
a full outer join b
on a.name=b.name;
Result:
+---------+-------+---------+-------------+--+
| a.name  | a.id  | b.name  | b.nickname  |
+---------+-------+---------+-------------+--+
| a       | 1     | a       | xx          |
| b       | 2     | b       | yy          |
| c       | 3     | NULL    | NULL        |
| d       | 4     | d       | zz          |
| NULL    | NULL  | e       | pp          |
+---------+-------+---------+-------------+--+
6. Left semi join: table1 left semi join table2
A left semi join returns no columns from the right table.
select *
from
a left semi join b
on a.name=b.name;
Result:
+---------+-------+--+
| a.name  | a.id  |
+---------+-------+--+
| a       | 1     |
| b       | 2     |
| d       | 4     |
+---------+-------+--+
Note: in a left semi join, the select clause must not reference any column from the right table; otherwise Hive raises an error such as:
Error: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:15 Invalid table alias or column reference 'nickname': (possible column names are: name, id) (state=42000,code=10004)
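The join behaviors above can be sketched in plain Python as a toy model (illustrative only; Hive actually executes joins as MapReduce jobs), reusing the sample tables a and b from this section:

```python
# Toy model of the join semantics shown above; a and b mirror the sample data.
a = [{"name": "a", "id": 1}, {"name": "b", "id": 2},
     {"name": "c", "id": 3}, {"name": "d", "id": 4}]
b = [{"name": "a", "nickname": "xx"}, {"name": "b", "nickname": "yy"},
     {"name": "d", "nickname": "zz"}, {"name": "e", "nickname": "pp"}]

def cartesian(left, right):
    # select * from a, b: every left row paired with every right row
    return [(l, r) for l in left for r in right]

def inner_join(left, right, key):
    # a join b on a.name=b.name: keep only pairs whose keys match
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def left_outer_join(left, right, key):
    # every left row survives; unmatched rows get None (Hive's NULL) on the right
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

def left_semi_join(left, right, key):
    # only left-table columns, only rows with at least one match on the right
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]

print(len(cartesian(a, b)))           # 16 rows, as in the Cartesian example
print(len(inner_join(a, b, "name")))  # 3 rows: a, b, d
```

A right outer join is just left_outer_join with the arguments swapped, and a full outer join is the union of both directions.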
Map collections
Create the table:
create table t_map(id int, name string, family map<string,string>, age int)
row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';
## Get all keys of the map column
select id,name,map_keys(family) as fa,age from t_map;
## Get all values of the map column
select id,name,map_values(family) as mv,age from t_map;
## Combined: find users who have a brother (where filters row by row)
select id,name,family,age from t_map
where array_contains(map_keys(family),'brother');
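What map_keys, map_values, and array_contains compute can be pictured with a plain Python dict standing in for one row's map<string,string> column (the member names below are made-up sample values, not data from the tables above):

```python
# One row's family column, modeled as a dict; "Tom"/"Jerry" are invented
# sample values used only to illustrate the three map functions.
family = {"father": "Tom", "brother": "Jerry"}

keys = list(family.keys())        # map_keys(family)
values = list(family.values())    # map_values(family)
has_brother = "brother" in keys   # array_contains(map_keys(family),'brother')
```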
Array collections
create table t_array(id int, name string, num array<int>)
row format delimited fields terminated by ','
collection items terminated by ':';
select * from t_array where array_contains(num,14);
select name,size(num) as length from t_array;
select name,num[0] as first_num from t_array;
Struct objects
create table t_struct(id int, name string, bean struct<age:int,name:string,addr:string>)
row format delimited fields terminated by ','
collection items terminated by ':';
Queries:
select id,name,bean.age from t_struct;
select id,name,bean.name as bean_name,bean.addr from t_struct;
5. Hive functions and expressions
1. Math functions
select round(5.4); ## 5, round half up
select round(5.1345,3); ## 5.135, round to 3 decimal places
select ceil(5.4); -- or select ceiling(5.4); ## 6, round up
select floor(5.4); ## 5, round down
select abs(-5.4); ## 5.4, absolute value
Row-wise maximum (across the arguments in one row):
select greatest(3,5,6); ## 6
Row-wise minimum:
select least(3,5,6); ## the smallest of the input arguments
Group-wise maximum (aggregate):
select max(age) from t_person group by ..;
Group-wise minimum (aggregate):
select min(age) from t_person group by ...;
2. String functions
substr(string str, int start) ## take a substring
substring(string str, int start)
Example: select substr("abcdefg",2);
substr(string, int start, int len)
substring(string, int start, int len)
Example: select substr("abcdefg",2,3); ## bcd
concat(string A, string B...) ## concatenate strings
concat_ws(string SEP, string A, string B...)
Example: select concat("ab","xy"); ## abxy
select concat_ws('.','192','168','137');
+--------------+--+
|     _c0      |
+--------------+--+
| 192.168.137  |
+--------------+--+
length(string A)
Example: select length("192.168.33.44"); ## 13
split(string str, string pat) ## split a string
Example: select split("192.168.33.44","\\.");
upper(string str) ## to upper case
lower(string str) ## to lower case
3. Date/time functions
select current_timestamp; ## returns a timestamp: the current date and time
select current_date; ## returns a date: the current date
## unix timestamp -> formatted string: from_unixtime
from_unixtime(bigint unixtime[, string format])
Example: select from_unixtime(unix_timestamp());
select from_unixtime(unix_timestamp(),"yyyy/MM/dd HH:mm:ss");
## formatted string -> unix timestamp: unix_timestamp, returns a bigint
## with no arguments it returns the current time as seconds since the Unix epoch (1970-01-01 00:00:00 UTC)
select unix_timestamp();
unix_timestamp(string date, string pattern)
Example: select unix_timestamp("2017-08-10 17:50:30");
select unix_timestamp("2017-08-10 17:50:30","yyyy-MM-dd HH:mm:ss");
## convert a string to a date
select to_date("2017-09-17 16:58:32");
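The unix_timestamp/from_unixtime round trip can be reproduced with Python's datetime as a sketch (Hive actually interprets strings in the session's timezone; UTC is pinned here so the numbers are deterministic). Hive's pattern "yyyy-MM-dd HH:mm:ss" corresponds to Python's "%Y-%m-%d %H:%M:%S":

```python
from datetime import datetime, timezone

s, fmt = "2017-08-10 17:50:30", "%Y-%m-%d %H:%M:%S"

# unix_timestamp("2017-08-10 17:50:30"): formatted string -> epoch seconds
ts = int(datetime.strptime(s, fmt).replace(tzinfo=timezone.utc).timestamp())

# from_unixtime(ts): epoch seconds -> formatted string (round trip)
back = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
```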
4. if()
Source table:
+---------------------+----------------------------------------------+-----------------+--+
| t_movie.movie_name  | t_movie.starring                             | t_movie.m_date  |
+---------------------+----------------------------------------------+-----------------+--+
| 战狼2               | ["吴京","吴刚","卢靖姗"]                      | 2017-08-16      |
| 三生三世十里桃花    | ["刘亦菲","杨洋"]                             | 2017-08-20      |
| 羞羞的铁拳          | ["沈腾","玛丽","艾伦"]                        | 2017-12-20      |
| 人民的名义          | ["陆毅","张丰毅","吴刚","许亚军","张志坚"]    | 2017-06-30      |
+---------------------+----------------------------------------------+-----------------+--+
select movie_name,if(array_contains(starring,'吴刚'),'好片','烂片'),m_date from t_movie;
Result:
+-------------------+-------+-------------+--+
| movie_name        | _c1   | m_date      |
+-------------------+-------+-------------+--+
| 战狼2             | 好片  | 2017-08-16  |
| 三生三世十里桃花  | 烂片  | 2017-08-20  |
| 羞羞的铁拳        | 烂片  | 2017-12-20  |
| 人民的名义        | 好片  | 2017-06-30  |
+-------------------+-------+-------------+--+
5. case when
select movie_name,m_date,
case when m_date>'2017-08-08' then '新片'
     when m_date<='2017-08-08' then '老片'
end as movie
from t_movie;
Source table: the same t_movie table as above.
Result:
+-------------------+-------------+--------+--+
| movie_name        | m_date      | movie  |
+-------------------+-------------+--------+--+
| 战狼2             | 2017-08-16  | 新片   |
| 三生三世十里桃花  | 2017-08-20  | 新片   |
| 羞羞的铁拳        | 2017-12-20  | 新片   |
| 人民的名义        | 2017-06-30  | 老片   |
+-------------------+-------------+--------+--+
6. Common aggregate functions
sum(col): sum of the column's values within a group
avg(col): average of the column's values within a group
max(col): maximum of the column's values within a group
min(col): minimum of the column's values within a group
7. Table-generating functions
1. explode()
Use explode() to "blow up" an array column into rows:
select explode(starring) as actors from t_movie;
+----------+--+
| actors   |
+----------+--+
| 吴京     |
| 吴刚     |
| 卢靖姗   |
| 刘亦菲   |
| 杨洋     |
| 沈腾     |
| 玛丽     |
| 艾伦     |
| 陆毅     |
| 张丰毅   |
| 吴刚     |
| 许亚军   |
| 张志坚   |
+----------+--+
2. lateral view
A query producing one row per (movie, actor) pair, e.g. (sketch):
select movie_name, tmp.actors, m_date
from t_movie lateral view explode(starring) tmp as actors;
Result:
+-------------------+-------------+-------------+--+
| movie_name        | tmp.actors  | m_date      |
+-------------------+-------------+-------------+--+
| 战狼2             | 吴京        | 2017-08-16  |
| 战狼2             | 吴刚        | 2017-08-16  |
| 战狼2             | 卢靖姗      | 2017-08-16  |
| 三生三世十里桃花  | 刘亦菲      | 2017-08-20  |
| 三生三世十里桃花  | 杨洋        | 2017-08-20  |
| 羞羞的铁拳        | 沈腾        | 2017-12-20  |
| 羞羞的铁拳        | 玛丽        | 2017-12-20  |
| 羞羞的铁拳        | 艾伦        | 2017-12-20  |
| 人民的名义        | 陆毅        | 2017-06-30  |
| 人民的名义        | 张丰毅      | 2017-06-30  |
| 人民的名义        | 吴刚        | 2017-06-30  |
| 人民的名义        | 许亚军      | 2017-06-30  |
| 人民的名义        | 张志坚      | 2017-06-30  |
+-------------------+-------------+-------------+--+
How to think about it: lateral view is like a join between two tables:
left table: the original table
right table: the table produced by explode(collection_column)
with the twist that the join happens only between a row and its own exploded rows.
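That explode-then-join-per-row behavior can be sketched in Python, using the 战狼2 row of t_movie:

```python
# lateral view explode(starring): each source row pairs only with the rows
# produced by exploding its own starring array.
movies = [{"movie_name": "战狼2",
           "starring": ["吴京", "吴刚", "卢靖姗"],
           "m_date": "2017-08-16"}]

flattened = [
    {"movie_name": m["movie_name"], "actors": actor, "m_date": m["m_date"]}
    for m in movies
    for actor in m["starring"]     # explode: one output row per array element
]
```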
2. Collapse the flattened table into one row per name
select name, concat_ws('_',collect_set(project)) as project from project_promotion group by name;
Source table:
+-------------------------+----------------------------+--+
| project_promotion.name  | project_promotion.project  |
+-------------------------+----------------------------+--+
| 小明                    | 语文                       |
| 小明                    | 数学                       |
| 小明                    | 英语                       |
| 小明                    | 物理                       |
| 小明                    | 化学                       |
| 小红                    | 生物                       |
| 小红                    | 历史                       |
| 小红                    | 生化                       |
| 小红                    | 政治                       |
+-------------------------+----------------------------+--+
Result:
+-------+---------------------------+--+
| name  | project                   |
+-------+---------------------------+--+
| 小明  | 语文_数学_英语_物理_化学  |
| 小红  | 生物_历史_生化_政治       |
+-------+---------------------------+--+
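The group by + collect_set + concat_ws pattern amounts to the following sketch (note: Hive's collect_set de-duplicates but does not guarantee element order; the sketch keeps first-seen order):

```python
# Collapse (name, project) rows to one '_'-joined string per name,
# mimicking concat_ws('_', collect_set(project)) ... group by name.
rows = [("小明", "语文"), ("小明", "数学"), ("小明", "英语"),
        ("小红", "生物"), ("小红", "历史")]

grouped = {}
for name, project in rows:
    bucket = grouped.setdefault(name, [])
    if project not in bucket:      # collect_set keeps only distinct values
        bucket.append(project)

result = {name: "_".join(projects) for name, projects in grouped.items()}
```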
3. JSON parsing functions (table-generating)
1. Parse JSON and export to a local directory:
insert overwrite local directory '/root/hivedata/'
row format delimited fields terminated by '\t'
select json_tuple(json,"movie","rate","timeStamp","uid") as (movie,rate,ts,uid1)
from t_rate;
2. Parse JSON and export to HDFS:
insert overwrite directory '/root/hivedata/'
select json_tuple(json,"movie","rate","timeStamp","uid") as (movie,rate,ts,uid1)
from t_rate;
+----------------------------------------------------------+--+
| t_json.json                                              |
+----------------------------------------------------------+--+
| {"id":1,"name":"zhangsan1","age":18,"city":"beijing"}    |
| {"id":2,"name":"zhangsan2","age":28,"city":"shanghai"}   |
| {"id":3,"name":"zhangsan3","age":38,"city":"shanghai"}   |
| {"id":4,"name":"zhangsan4","age":17,"city":"beijing"}    |
| {"id":5,"name":"zhangsan5","age":19,"city":"beijing"}    |
| {"id":6,"name":"zhangsan6","age":26,"city":"guangzhou"}  |
| {"id":7,"name":"zhangsan7","age":17,"city":"beijing"}    |
| {"id":8,"name":"zhangsan8","age":14,"city":"guangzhou"}  |
| {"id":9,"name":"zhangsan9","age":48,"city":"beijing"}    |
+----------------------------------------------------------+--+
To run statistics over this data, querying the raw JSON strings directly is inconvenient; the JSON needs to be parsed into an ordinary structured table first, which Hive's built-in json_tuple() function can do.
Steps:
1. Create a raw table to hold the JSON lines:
create table t_json(json string);
load data local inpath '/root/rating.json' into table t_json;
2. Parse it:
select json_tuple(json,"id","name","age","city") as (id,name,age,city) from t_json;
Result:
+-----+------------+------+------------+--+
| id  | name       | age  | city       |
+-----+------------+------+------------+--+
| 1   | zhangsan1  | 18   | beijing    |
| 2   | zhangsan2  | 28   | shanghai   |
| 3   | zhangsan3  | 38   | shanghai   |
| 4   | zhangsan4  | 17   | beijing    |
| 5   | zhangsan5  | 19   | beijing    |
| 6   | zhangsan6  | 26   | guangzhou  |
| 7   | zhangsan7  | 17   | beijing    |
| 8   | zhangsan8  | 14   | guangzhou  |
| 9   | zhangsan9  | 48   | beijing    |
+-----+------------+------+------------+--+
8. Window (analytic) functions
1. row_number() over(): top-N per group
Source table (rown shows row_number within each addr):
+-----+------+-------+-------+-------+--+
| id  | age  | name  | addr  | rown  |
+-----+------+-------+-------+-------+--+
| 4   | 16   | d     | 上海  | 1     |
| 4   | 16   | d     | 上海  | 2     |
| 3   | 22   | c     | 上海  | 3     |
| 3   | 22   | c     | 上海  | 4     |
| 9   | 26   | f     | 上海  | 5     |
| 9   | 26   | f     | 上海  | 6     |
| 6   | 26   | f     | 上海  | 7     |
| 6   | 26   | f     | 上海  | 8     |
| 10  | 14   | b     | 北京  | 1     |
| 10  | 14   | b     | 北京  | 2     |
| 1   | 18   | a     | 北京  | 3     |
| 1   | 18   | a     | 北京  | 4     |
| 2   | 19   | b     | 北京  | 5     |
| 2   | 19   | b     | 北京  | 6     |
| 7   | 16   | d     | 深圳  | 1     |
| 7   | 16   | d     | 深圳  | 2     |
| 8   | 30   | e     | 深圳  | 3     |
| 5   | 30   | e     | 深圳  | 4     |
| 5   | 30   | e     | 深圳  | 5     |
| 8   | 30   | e     | 深圳  | 6     |
+-----+------+-------+-------+-------+--+
-- query the two rows with the greatest age in each region
select id,age,name,addr
from (select id,age,name,addr,
             row_number() over(partition by addr order by age desc) as rown
      from t_rown) o1
where o1.rown<=2;
Result:
+-----+------+-------+-------+--+
| id  | age  | name  | addr  |
+-----+------+-------+-------+--+
| 6   | 26   | f     | 上海  |
| 9   | 26   | f     | 上海  |
| 2   | 19   | b     | 北京  |
| 2   | 19   | b     | 北京  |
| 8   | 30   | e     | 深圳  |
| 5   | 30   | e     | 深圳  |
+-----+------+-------+-------+--+
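The subquery-plus-filter pattern can be sketched in Python on a de-duplicated subset of the source rows:

```python
# Top-2-by-age per addr: emulate row_number() over(partition by addr
# order by age desc) followed by the outer filter rown <= 2.
from itertools import groupby

rows = [(4, 16, "d", "上海"), (3, 22, "c", "上海"), (9, 26, "f", "上海"),
        (10, 14, "b", "北京"), (1, 18, "a", "北京"), (2, 19, "b", "北京")]

# sort by partition key, then by age descending inside each partition
rows.sort(key=lambda r: (r[3], -r[1]))

top2 = []
for addr, part in groupby(rows, key=lambda r: r[3]):
    for rown, row in enumerate(part, start=1):   # row_number starts at 1
        if rown <= 2:
            top2.append(row)
```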
2. sum() over(): running totals
Important: frame specification syntax for sum() over():
sum() over(partition by x order by y rows between 8 preceding and current row)
sum() over(partition by x order by y rows between 8 preceding and 5 following)
sum() over(partition by x order by y rows between unbounded preceding and 5 following)
sum() over(partition by x order by y rows between unbounded preceding and unbounded following)
select name,month,amt,
sum(amt) over(partition by name order by month rows between unbounded preceding and current row) as accumulate
from t_tmp;
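What "rows between unbounded preceding and current row" does (a running total within each partition) can be sketched in Python, using the id-A monthly amounts that appear in the cumulative-report example later in this section:

```python
# Running total per partition: each row's frame is all earlier rows in its
# partition (ordered by month) plus the current row itself.
amounts = [("A", "2015-01", 33), ("A", "2015-02", 10), ("A", "2015-03", 20)]

running, accumulate = 0, []
for name, month, amt in amounts:   # already sorted by month, one partition
    running += amt                 # the current row enters the frame
    accumulate.append((name, month, amt, running))
```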
More window function examples, using table t_access:
select * from t_access;
+----------------+---------------------------------+-----------------------+--------------+--+
| t_access.ip | t_access.url | t_access.access_time | t_access.dt |
+----------------+---------------------------------+-----------------------+--------------+--+
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 20170804 |
| 192.168.33.3 | http://www.edu360.cn/teach | 2017-08-04 15:35:20 | 20170804 |
| 192.168.33.4 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 20170804 |
| 192.168.33.4 | http://www.edu360.cn/job | 2017-08-04 16:30:20 | 20170804 |
| 192.168.33.5 | http://www.edu360.cn/job | 2017-08-04 15:40:20 | 20170804 |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 20170805 |
| 192.168.44.3 | http://www.edu360.cn/teach | 2017-08-05 15:35:20 | 20170805 |
| 192.168.33.44 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 20170805 |
| 192.168.33.46 | http://www.edu360.cn/job | 2017-08-05 16:30:20 | 20170805 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-05 15:40:20 | 20170805 |
| 192.168.133.3 | http://www.edu360.cn/register | 2017-08-06 15:30:20 | 20170806 |
| 192.168.111.3 | http://www.edu360.cn/register | 2017-08-06 15:35:20 | 20170806 |
| 192.168.34.44 | http://www.edu360.cn/pay | 2017-08-06 15:30:20 | 20170806 |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 20170806 |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.25 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 20170806 |
| 192.168.33.36 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 20170806 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 20170806 |
+----------------+---------------------------------+-----------------------+--------------+--+
## lag() over()
For each visit record, also show the same user's previous access time:
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
lag(access_time,1,0) over(partition by ip order by access_time)as last_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| ip | url | access_time | rn | last_access_time |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| 192.168.111.3 | http://www.edu360.cn/register | 2017-08-06 15:35:20 | 1 | 0 |
| 192.168.133.3 | http://www.edu360.cn/register | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.33.25 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 1 | 0 |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | 0 |
| 192.168.33.3 | http://www.edu360.cn/teach | 2017-08-04 15:35:20 | 2 | 2017-08-04 15:30:20 |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 3 | 2017-08-04 15:35:20 |
| 192.168.33.36 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.edu360.cn/job | 2017-08-04 16:30:20 | 2 | 2017-08-04 15:30:20 |
| 192.168.33.44 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.edu360.cn/job | 2017-08-05 16:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 2 | 2017-08-05 16:30:20 |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 3 | 2017-08-06 16:30:20 |
| 192.168.33.5 | http://www.edu360.cn/job | 2017-08-04 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-05 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2 | 2017-08-05 15:40:20 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 3 | 2017-08-06 15:40:20 |
| 192.168.34.44 | http://www.edu360.cn/pay | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.44.3 | http://www.edu360.cn/teach | 2017-08-05 15:35:20 | 1 | 0 |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
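lag()'s per-partition shift can be sketched in Python with the three visits of ip 192.168.33.3 from the table above:

```python
# lag(access_time, 1, 0): each row sees the previous row's value within its
# partition (ordered by access_time); the first row gets the default, 0.
times = ["2017-08-04 15:30:20", "2017-08-04 15:35:20", "2017-08-05 15:30:20"]
times.sort()                        # order by access_time

lagged = [(t, times[i - 1] if i > 0 else 0) for i, t in enumerate(times)]
```

lead() is the mirror image: each row sees the next row's value instead of the previous one.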
## lead() over()
For each visit record, also show the same user's next access time:
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
lead(access_time,1,0) over(partition by ip order by access_time)as pre_access_time
from t_access;
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| ip | url | access_time | rn | pre_access_time |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
| 192.168.111.3 | http://www.edu360.cn/register | 2017-08-06 15:35:20 | 1 | 0 |
| 192.168.133.3 | http://www.edu360.cn/register | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.33.25 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 1 | 0 |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | 2017-08-04 15:35:20 |
| 192.168.33.3 | http://www.edu360.cn/teach | 2017-08-04 15:35:20 | 2 | 2017-08-05 15:30:20 |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 3 | 0 |
| 192.168.33.36 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 1 | 0 |
| 192.168.33.4 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | 2017-08-04 16:30:20 |
| 192.168.33.4 | http://www.edu360.cn/job | 2017-08-04 16:30:20 | 2 | 0 |
| 192.168.33.44 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 1 | 0 |
| 192.168.33.46 | http://www.edu360.cn/job | 2017-08-05 16:30:20 | 1 | 2017-08-06 16:30:20 |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 2 | 2017-08-06 16:30:20 |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 3 | 0 |
| 192.168.33.5 | http://www.edu360.cn/job | 2017-08-04 15:40:20 | 1 | 0 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-05 15:40:20 | 1 | 2017-08-06 15:40:20 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2 | 2017-08-06 15:40:20 |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 3 | 0 |
| 192.168.34.44 | http://www.edu360.cn/pay | 2017-08-06 15:30:20 | 1 | 0 |
| 192.168.44.3 | http://www.edu360.cn/teach | 2017-08-05 15:35:20 | 1 | 0 |
+----------------+---------------------------------+----------------------+-----+----------------------+--+
## first_value() over()
Example: the first page each user visited
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
first_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following) as first_url
from t_access;
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| ip             | url                             | access_time          | rn  | first_url                       |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| 192.168.111.3 | http://www.edu360.cn/register | 2017-08-06 15:35:20 | 1 | http://www.edu360.cn/register |
| 192.168.133.3 | http://www.edu360.cn/register | 2017-08-06 15:30:20 | 1 | http://www.edu360.cn/register |
| 192.168.33.25 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | http://www.edu360.cn/stu |
| 192.168.33.3 | http://www.edu360.cn/teach | 2017-08-04 15:35:20 | 2 | http://www.edu360.cn/stu |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 3 | http://www.edu360.cn/stu |
| 192.168.33.36 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 1 | http://www.edu360.cn/excersize |
| 192.168.33.4 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | http://www.edu360.cn/stu |
| 192.168.33.4 | http://www.edu360.cn/job | 2017-08-04 16:30:20 | 2 | http://www.edu360.cn/stu |
| 192.168.33.44 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 1 | http://www.edu360.cn/stu |
| 192.168.33.46 | http://www.edu360.cn/job | 2017-08-05 16:30:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 2 | http://www.edu360.cn/job |
| 192.168.33.46 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 3 | http://www.edu360.cn/job |
| 192.168.33.5 | http://www.edu360.cn/job | 2017-08-04 15:40:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-05 15:40:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2 | http://www.edu360.cn/job |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 3 | http://www.edu360.cn/job |
| 192.168.34.44 | http://www.edu360.cn/pay | 2017-08-06 15:30:20 | 1 | http://www.edu360.cn/pay |
| 192.168.44.3 | http://www.edu360.cn/teach | 2017-08-05 15:35:20 | 1 | http://www.edu360.cn/teach |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
## last_value() over()
Example: the last page each user visited
select ip,url,access_time,
row_number() over(partition by ip order by access_time) as rn,
last_value(url) over(partition by ip order by access_time rows between unbounded preceding and unbounded following) as last_url
from t_access;
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| ip             | url                             | access_time          | rn  | last_url                        |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
| 192.168.111.3 | http://www.edu360.cn/register | 2017-08-06 15:35:20 | 1 | http://www.edu360.cn/register |
| 192.168.133.3 | http://www.edu360.cn/register | 2017-08-06 15:30:20 | 1 | http://www.edu360.cn/register |
| 192.168.33.25 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-04 15:30:20 | 1 | http://www.edu360.cn/stu |
| 192.168.33.3 | http://www.edu360.cn/teach | 2017-08-04 15:35:20 | 2 | http://www.edu360.cn/stu |
| 192.168.33.3 | http://www.edu360.cn/stu | 2017-08-05 15:30:20 | 3 | http://www.edu360.cn/stu |
| 192.168.33.36 | http://www.edu360.cn/excersize | 2017-08-06 16:30:20 | 1 | http://www.edu360.cn/excersize |
| 192.168.33.4   | http://www.edu360.cn/stu        | 2017-08-04 15:30:20  | 1   | http://www.edu360.cn/job        |
| 192.168.33.4   | http://www.edu360.cn/job        | 2017-08-04 16:30:20  | 2   | http://www.edu360.cn/job        |
| 192.168.33.44  | http://www.edu360.cn/stu        | 2017-08-05 15:30:20  | 1   | http://www.edu360.cn/stu        |
| 192.168.33.46  | http://www.edu360.cn/job        | 2017-08-05 16:30:20  | 1   | http://www.edu360.cn/excersize  |
| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 2   | http://www.edu360.cn/excersize  |
| 192.168.33.46  | http://www.edu360.cn/excersize  | 2017-08-06 16:30:20  | 3   | http://www.edu360.cn/excersize  |
| 192.168.33.5 | http://www.edu360.cn/job | 2017-08-04 15:40:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-05 15:40:20 | 1 | http://www.edu360.cn/job |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 2 | http://www.edu360.cn/job |
| 192.168.33.55 | http://www.edu360.cn/job | 2017-08-06 15:40:20 | 3 | http://www.edu360.cn/job |
| 192.168.34.44 | http://www.edu360.cn/pay | 2017-08-06 15:30:20 | 1 | http://www.edu360.cn/pay |
| 192.168.44.3 | http://www.edu360.cn/teach | 2017-08-05 15:35:20 | 1 | http://www.edu360.cn/teach |
+----------------+---------------------------------+----------------------+-----+---------------------------------+--+
/*
Cumulative report -- window function version
*/
First compute each id's total amount per month:
select id,month,
sum(fee) as amount
from t_test
group by id,month;
+-----+----------+-------+--+
| id | month | amount|
+-----+----------+-------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| A | 2015-03 | 20 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
| B | 2015-03 | 45 |
| C | 2015-01 | 30 |
| C | 2015-02 | 40 |
| C | 2015-03 | 30 |
+-----+----------+-------+--+
-- sum() over(): running total per id
select id
,month
,cnts
,sum(cnts) over(partition by id order by month rows between unbounded preceding and current row) as accumulate_cnts
from tmp;
-- where tmp is the per-month aggregation:
select id,month,sum(cnt) as cnts
from t_access_time
group by id,month;
tmp:
+-----+----------+-------+--+
| id | month | cnts |
+-----+----------+-------+--+
| A   | 2015-01  | 33    |
| A   | 2015-02  | 10    |
| A | 2015-03 | 20 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
| B | 2015-03 | 45 |
| C | 2015-01 | 30 |
| C | 2015-02 | 40 |
| C | 2015-03 | 30 |
+-----+----------+-------+--+
select id,month,cnts,
sum(cnts) over(partition by id order by month rows between unbounded preceding and current row ) as accumulate_cnts
from (select id,month,sum(cnt) as cnts
from t_access_time
group by id,month) tmp;