Lateral View is Hive's companion construct for UDTFs; it solves the problem that a UDTF cannot be combined with additional SELECT columns.
1. Why do we need Lateral View?
- select game_id, explode(split(user_ids,'\\[\\[\\[')) as user_id from login_game_log where dt='2014-05-15'
- FAILED: Error in semantic analysis: UDTF's are not supported outside the SELECT clause, nor nested in expressions.
Semantic analysis fails: a UDTF cannot be selected alongside other columns. Frustrating...
So what if we want exactly that? This is where Lateral View comes in.
2. Lateral View explained
2.1 A single Lateral View
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of the base table and then joins the resulting output rows to the input rows to form a virtual table having the supplied table alias.
In other words, lateral view exists to be used together with UDTFs such as explode. The rows generated by the UDTF are put into a virtual table, and that virtual table is then joined back to the input row (here, each game_id), which is how the SELECT columns outside the UDTF get connected to its output.
Lateral View Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
1. The LATERAL VIEW keyword goes in front of the UDTF call.
2. The whole lateral view clause goes after the base table in the FROM clause.
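To make the join semantics concrete, here is a small Python sketch (my own illustration, not Hive code; the function names are made up) of what a lateral view does: apply the UDTF to each input row, then join every generated row back to the row it came from.

```python
def explode(values):
    """Mimic Hive's explode(): one output row per array element."""
    for v in values:
        yield v

def lateral_view(rows, udtf, src_col, alias):
    """For each input row, emit one joined row per UDTF output row."""
    for row in rows:
        for generated in udtf(row[src_col]):
            joined = dict(row)      # keep the original columns (game_id, ...)
            joined[alias] = generated
            yield joined

rows = [{"game_id": "game66", "user_ids": ["winning1ren", "13810537508"]}]
result = list(lateral_view(rows, explode, "user_ids", "user_id"))
# Each exploded user_id is now paired with its game_id.
```

This is exactly why the extra SELECT columns become available: they travel with each joined row.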
For example:
1. First create a table and load a file whose two tab-separated columns are game_id and user_ids:
- hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by '\t' stored as textfile;
- OK
- Time taken: 2.451 seconds
- hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;
- Copying data from file:/home/hadoop/test_lateral
- Copying file: file:/home/hadoop/test_lateral
- Loading data to table dw.test_lateral_view_shengli
- OK
- Time taken: 6.716 seconds
- hive> select * from test_lateral_view_shengli;
- OK
- game101 15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110
- game66 winning1ren[[[13810537508
- game101 hu330602003[[[hu330602004[[[hu330602005[[[15967506560
Now query it with a lateral view:
- hive> select game_id, user_id
- > from test_lateral_view_shengli lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id
- > ;
- Total MapReduce jobs = 1
- Launching Job 1 out of 1
- Number of reduce tasks is set to 0 since there's no reduce operator
- Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839
- Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839
- 2014-05-16 17:39:19,108 Stage-1 map = 0%, reduce = 0%
- 2014-05-16 17:39:28,157 Stage-1 map = 100%, reduce = 0%
- 2014-05-16 17:39:38,830 Stage-1 map = 100%, reduce = 100%
- Ended Job = job_201403301416_445839
- OK
- game101 hu330602003
- game101 hu330602004
- game101 hu330602005
- game101 15967506560
- game101 15358083654
- game101 ab33787873
- game101 zjy18052480603
- game101 shlg1881826
- game101 lxqab110
- game66 winning1ren
- game66 13810537508
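Note that Hive's split() takes a regular expression, which is why each `[` in the `[[[` delimiter has to be escaped. The same tokenization and expansion can be reproduced in Python (a sketch, not Hive itself):

```python
import re

user_ids = "winning1ren[[[13810537508"
# Hive: split(userl_ids, '\\[\\[\\[') -- the delimiter is a regex,
# so every '[' must be escaped.
tokens = re.split(r"\[\[\[", user_ids)
# explode() then turns each token into its own (game_id, user_id) row.
rows = [("game66", t) for t in tokens]
```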
2.2 Multiple Lateral Views
The FROM clause can contain multiple LATERAL VIEW clauses. Given a base table:
Array<int> col1 | Array<string> col2
[1, 2]          | ["a", "b", "c"]
[3, 4]          | ["d", "e", "f"]
the query
- SELECT myCol1, myCol2 FROM baseTable
- LATERAL VIEW explode(col1) myTable1 AS myCol1
- LATERAL VIEW explode(col2) myTable2 AS myCol2;
produces:
int myCol1 | string myCol2
1          | "a"
1          | "b"
1          | "c"
2          | "a"
2          | "b"
2          | "c"
3          | "d"
3          | "e"
3          | "f"
4          | "d"
4          | "e"
4          | "f"
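Chained lateral views nest: the second one is applied to every row already expanded by the first, so each base row yields len(col1) * len(col2) output rows. A Python sketch (my own illustration) of that expansion:

```python
base = [
    {"col1": [1, 2], "col2": ["a", "b", "c"]},
    {"col1": [3, 4], "col2": ["d", "e", "f"]},
]

result = []
for row in base:
    for my_col1 in row["col1"]:       # LATERAL VIEW explode(col1)
        for my_col2 in row["col2"]:   # LATERAL VIEW explode(col2)
            result.append((my_col1, my_col2))
# 2 base rows x 2 elements x 3 elements = 12 output rows
```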
3. Outer Lateral View
When the UDTF generates no output rows for an input row (for example, exploding an empty array), that input row is dropped from the result, so the following query returns nothing:
- hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a ;
Adding the OUTER keyword keeps such input rows and fills the UDTF's columns with NULL:
- SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
- 238 val_238 NULL
- 86 val_86 NULL
- 311 val_311 NULL
- 27 val_27 NULL
- 165 val_165 NULL
- 409 val_409 NULL
- 255 val_255 NULL
- 278 val_278 NULL
- 98 val_98 NULL
- ...
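The OUTER behavior can be sketched in Python as well (again an illustration of the semantics, not Hive code; the column name "a" matches the alias in the query above):

```python
def lateral_view_explode(rows, udtf_col, outer=False):
    """Sketch of LATERAL VIEW [OUTER] explode(col) over dict rows."""
    for row in rows:
        values = row[udtf_col]
        if not values and outer:
            # OUTER: keep the source row, pad the UDTF column with NULL
            yield {**row, "a": None}
            continue
        for v in values:
            yield {**row, "a": v}

rows = [{"key": 238, "value": "val_238", "arr": []}]
dropped = list(lateral_view_explode(rows, "arr"))            # row disappears
outer_rows = list(lateral_view_explode(rows, "arr", outer=True))
# OUTER keeps the row as (238, 'val_238', NULL)
```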