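Starting with CDH 5.0, the HDFS layout of the Oozie ShareLib (shared library) changed significantly. In CDH 4.x and earlier, the lib jars of each component were stored directly under /user/oozie/share/lib in HDFS, in one directory per component.
From CDH 5.x on, an extra directory level of the form lib_${timestamp} sits under that directory, and the per-component lib directories live beneath it.
This design brings two benefits:
1. Isolation between old and new runtime environments.
When the dependencies in the ShareLib need to be updated, jobs that are already running can finish with the old ShareLib, while new jobs run with the latest one.
2. An updated ShareLib can be picked up without restarting Oozie.
Oozie cleans up old ShareLib directories (lib_${timestamp}) according to the following rules:
1. A directory is removed once it is older than the retention period set by ShareLibService.temp.sharelib.retention.days (default: 7 days).
2. The latest 2 directories are always kept.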
*************************************************************************************************************************************
The internals of Oozie’s ShareLib have changed recently (reflected in CDH 5.0.0). Here’s what you need to know.
In a previous blog post about one year ago, I explained how to use the Apache Oozie ShareLib in CDH 4. Since that time, things have changed about the ShareLib in CDH 5 (particularly directory structure), so some of the previous information is now obsolete. (These changes went upstream under OOZIE-1619.)
In this post I’ll explain those changes. I recommend that you read the previous post for context first, because the reasoning behind the ShareLib (and other related information) is still relevant.
Directory Structure Changes
In CDH 4.x, the directory structure of the ShareLib looks like this:
```
/user/oozie/share/lib/distcp/*.jar
/user/oozie/share/lib/hive/*.jar
/user/oozie/share/lib/...
```
In CDH 5.x, there’s now an additional level, which contains a timestamp:
```
/user/oozie/share/lib/lib_20140311155426/distcp/*.jar
/user/oozie/share/lib/lib_20140311155426/hive/*.jar
/user/oozie/share/lib/lib_20140311155426/...
```
The location of the ShareLib is still specified by the oozie.service.WorkflowAppService.system.libpath configuration property as before; the lib_<timestamp> directories will be created under that, as seen in the above example.
As you may have guessed, there can actually be multiple iterations of these lib_<timestamp> directories; as the example above shows, the timestamp format is yyyyMMddHHmmss (year, month, day, hour, minute, second). The main reason for keeping multiple timestamped directories is to address the following scenario.
Suppose you have some jobs running that are using the ShareLib. You decide that you want to update the ShareLib, but you don’t want to wait for those jobs to finish. Previously, doing that could cause those jobs to fail because the distributed cache would get “confused”. With the new directory structure, that won’t happen: the already running jobs will continue to use the old ShareLib and any new jobs will use the latest ShareLib. On startup, Oozie will look for the newest lib_<timestamp> directory and use that.
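To make the “newest directory wins” rule concrete, here is a minimal Python sketch that picks the latest lib_<timestamp> name from a directory listing. It is not Oozie’s code (Oozie’s own logic lives in its Java ShareLib service); the function names are hypothetical, and it assumes you already have the child names of the system.libpath directory, for example from `hdfs dfs -ls`.

```python
from datetime import datetime
from typing import Iterable, Optional

SHARELIB_PREFIX = "lib_"
TS_FORMAT = "%Y%m%d%H%M%S"  # yyyyMMddHHmmss, e.g. 20140311155426


def parse_sharelib_timestamp(name: str) -> Optional[datetime]:
    """Return the time encoded in a lib_<timestamp> name, or None if it isn't one."""
    if not name.startswith(SHARELIB_PREFIX):
        return None
    stamp = name[len(SHARELIB_PREFIX):]
    # Require exactly 14 digits so strptime can't accept shorter, ambiguous strings.
    if len(stamp) != 14 or not stamp.isdigit():
        return None
    try:
        return datetime.strptime(stamp, TS_FORMAT)
    except ValueError:  # digits, but not a valid date (e.g. month 13)
        return None


def newest_sharelib_dir(names: Iterable[str]) -> Optional[str]:
    """Pick the lib_<timestamp> entry with the latest timestamp, ignoring anything else."""
    best: Optional[tuple] = None
    for name in names:
        ts = parse_sharelib_timestamp(name)
        if ts is not None and (best is None or ts > best[0]):
            best = (ts, name)
    return best[1] if best else None


if __name__ == "__main__":
    listing = ["lib_20140311155426", "lib_20140403151601", "sharelib.properties"]
    print(newest_sharelib_dir(listing))  # lib_20140403151601
```

Because the timestamp is fixed-width, a plain string comparison of well-formed names would give the same ordering; parsing the timestamp mainly serves to skip entries that are not lib_<timestamp> directories.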
Another benefit of the new ShareLib directory structure is that you can actually update the ShareLib without restarting Oozie! We’ll see more about that later.
For reference, here’s what the (MRv2) ShareLib looks like in CDH 5.0.0. Note that other than version numbers and the lib_<timestamp> directory, it looks similar to the CDH 4.1.2 ShareLib from the previous blog post.
```
drwxr-xr-x   share/lib/lib_20140403151601/distcp
-rw-r--r--   share/lib/lib_20140403151601/distcp/hadoop-distcp-2.3.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/distcp/oozie-sharelib-distcp-4.0.0-cdh5.0.0.jar
drwxr-xr-x   share/lib/lib_20140403151601/hcatalog
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/ST4-4.0.4.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/ant-1.8.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/ant-launcher-1.8.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/avro-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/avro-ipc-1.7.5-cdh5.0.0-tests.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/avro-ipc-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/avro-mapred-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/bonecp-0.7.1.RELEASE.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/commons-compress-1.4.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/commons-httpclient-3.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/commons-io-2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/datanucleus-api-jdo-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/datanucleus-core-3.2.2.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/datanucleus-rdbms-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/groovy-all-2.1.6.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-ant-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-common-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-exec-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-hcatalog-core-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-hcatalog-pig-adapter-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-hcatalog-server-extensions-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-metastore-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-serde-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/hive-webhcat-java-client-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/jdo-api-3.0.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/jetty-6.1.14.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/jetty-util-6.1.26.cloudera.2.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/libfb303-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/log4j-1.2.16.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/netty-3.6.2.Final.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/oozie-hcatalog-0.12.0-cdh5.0.0.oozie-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/oozie-sharelib-hcatalog-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/paranamer-2.3.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/parquet-hadoop-bundle-1.2.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/servlet-api-2.5-20081211.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/servlet-api-2.5-6.1.14.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/slf4j-api-1.7.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/slf4j-log4j12-1.7.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/snappy-java-1.0.4.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/stax-api-1.0.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/velocity-1.7.jar
-rw-r--r--   share/lib/lib_20140403151601/hcatalog/xz-1.0.jar
drwxr-xr-x   share/lib/lib_20140403151601/hive
-rw-r--r--   share/lib/lib_20140403151601/hive/ST4-4.0.4.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/ant-1.8.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/ant-launcher-1.8.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/antlr-2.7.7.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/antlr-runtime-3.4.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/avro-ipc-1.7.5-cdh5.0.0-tests.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/avro-ipc-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/avro-mapred-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/bonecp-0.7.1.RELEASE.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/commons-compress-1.4.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/commons-httpclient-3.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/commons-io-2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/datanucleus-api-jdo-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/datanucleus-core-3.2.2.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/datanucleus-rdbms-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/derby-10.10.1.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/groovy-all-2.1.6.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/guava-11.0.2.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-ant-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-cli-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-common-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-contrib-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-exec-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-metastore-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-serde-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-service-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-shims-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-shims-0.23-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-shims-common-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/hive-shims-common-secure-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/httpclient-4.2.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/httpcore-4.2.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jackson-core-asl-1.8.8.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jackson-mapper-asl-1.8.8.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jdo-api-3.0.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jetty-util-6.1.26.cloudera.2.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jline-0.9.94.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jsr305-1.3.9.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/jta-1.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/libfb303-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/libthrift-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/log4j-1.2.16.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/netty-3.6.2.Final.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/oozie-sharelib-hive-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/parquet-hadoop-bundle-1.2.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/slf4j-api-1.7.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/slf4j-log4j12-1.7.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/stax-api-1.0.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/stringtemplate-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive/xz-1.0.jar
drwxr-xr-x   share/lib/lib_20140403151601/hive2
-rw-r--r--   share/lib/lib_20140403151601/hive2/commons-cli-1.2.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/commons-codec-1.4.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/commons-io-2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/commons-lang-2.4.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/commons-logging-1.1.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-beeline-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-exec-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-jdbc-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-metastore-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-serde-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-service-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-shims-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-shims-0.23-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-shims-common-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/hive-shims-common-secure-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/jline-0.9.94.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/libfb303-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/libthrift-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/log4j-1.2.16.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/oozie-sharelib-hive2-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/slf4j-api-1.7.5.jar
-rw-r--r--   share/lib/lib_20140403151601/hive2/slf4j-log4j12-1.7.5.jar
drwxr-xr-x   share/lib/lib_20140403151601/mapreduce-streaming
-rw-r--r--   share/lib/lib_20140403151601/mapreduce-streaming/hadoop-streaming-2.3.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/mapreduce-streaming/oozie-sharelib-streaming-4.0.0-cdh5.0.0.jar
drwxr-xr-x   share/lib/lib_20140403151601/oozie
-rw-r--r--   share/lib/lib_20140403151601/oozie/json-simple-1.1.jar
-rw-r--r--   share/lib/lib_20140403151601/oozie/oozie-hadoop-utils-2.3.0-cdh5.0.0.oozie-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/oozie/oozie-sharelib-oozie-4.0.0-cdh5.0.0.jar
drwxr-xr-x   share/lib/lib_20140403151601/pig
-rw-r--r--   share/lib/lib_20140403151601/pig/ant-1.6.5.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/antlr-2.7.7.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/antlr-runtime-3.4.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/automaton-1.11-8.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/commons-collections-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/commons-el-1.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/commons-httpclient-3.1.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/commons-io-2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/core-3.1.1.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/findbugs-annotations-1.3.9-1.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/guava-11.0.2.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/hbase-client-0.96.1.1-cdh5.0.0-tests.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/hbase-client-0.96.1.1-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/hbase-common-0.96.1.1-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/hbase-protocol-0.96.1.1-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/hsqldb-1.8.0.10.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/htrace-core-2.01.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jansi-1.9.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jasper-compiler-5.5.23.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jasper-runtime-5.5.23.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jets3t-0.6.1.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jetty-6.1.14.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jetty-util-6.1.26.cloudera.2.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jline-0.9.94.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/joda-time-1.6.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jsch-0.1.42.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jsp-2.1-6.1.14.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jsp-api-2.1-6.1.14.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jsr305-1.3.9.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/jython-standalone-2.5.3.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/kfs-0.3.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/netty-3.6.6.Final.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/oozie-sharelib-pig-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/oro-2.0.8.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/parquet-pig-bundle-1.2.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/pig-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/protobuf-java-2.5.0.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/servlet-api-2.5-6.1.14.jar
-rw-r--r--   share/lib/lib_20140403151601/pig/stringtemplate-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sharelib.properties
drwxr-xr-x   share/lib/lib_20140403151601/sqoop
-rw-r--r--   share/lib/lib_20140403151601/sqoop/ST4-4.0.4.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/ant-1.8.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/ant-launcher-1.8.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/antlr-2.7.7.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/antlr-runtime-3.4.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/avro-ipc-1.7.5-cdh5.0.0-tests.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/avro-ipc-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/avro-mapred-1.7.5-cdh5.0.0-hadoop2.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/avro-mapred-1.7.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/bonecp-0.7.1.RELEASE.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/commons-compress-1.4.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/commons-io-2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/datanucleus-api-jdo-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/datanucleus-core-3.2.2.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/datanucleus-rdbms-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/findbugs-annotations-1.3.9-1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/groovy-all-2.1.6.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/guava-11.0.2.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hbase-common-0.96.1.1-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-ant-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-cli-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-common-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-exec-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-hcatalog-core-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-metastore-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-serde-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-service-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-shims-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-shims-0.23-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-shims-common-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hive-shims-common-secure-0.12.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/hsqldb-1.8.0.10.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/httpclient-4.2.5.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/httpcore-4.2.5.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/jdo-api-3.0.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/jline-0.9.94.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/jsr305-1.3.9.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/jta-1.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/libfb303-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/libthrift-0.9.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/netty-3.4.0.Final.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/oozie-sharelib-sqoop-4.0.0-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/parquet-hadoop-bundle-1.2.5-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/sqoop-1.4.4-cdh5.0.0.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/stringtemplate-3.2.1.jar
-rw-r--r--   share/lib/lib_20140403151601/sqoop/xz-1.0.jar
```
Oozie will automatically clean up old ShareLib lib_<timestamp> directories based on the following rules:
- A directory is deleted once it is older than ShareLibService.temp.sharelib.retention.days days (default: 7).
- The latest 2 directories are always kept.
Currently, Oozie checks for stale directories only at startup, which means that it won’t do any cleanup unless you restart Oozie. (We plan to improve this approach in a future release via OOZIE-1783.) In the meantime, you can safely delete older lib_<timestamp> directories from HDFS, as long as no jobs are currently using them. For most CDH users, this shouldn’t be a huge problem because the ShareLib is typically only upgraded when upgrading CDH.
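If you clean up by hand between restarts, it can help to apply the same two rules yourself. The Python sketch below (the function name is hypothetical) returns the directory names the rules would allow deleting. It assumes a directory’s age is read from the timestamp in its name, which may differ from how Oozie itself measures age, and it cannot know whether a running job still uses a directory, so check that separately before deleting anything.

```python
from datetime import datetime, timedelta
from typing import List

TS_FORMAT = "%Y%m%d%H%M%S"  # yyyyMMddHHmmss


def dirs_to_purge(names: List[str], now: datetime,
                  retention_days: int = 7, keep_latest: int = 2) -> List[str]:
    """Return the lib_<timestamp> names that the two cleanup rules would allow deleting.

    Assumption: a directory's age is taken from the timestamp in its name.
    """
    stamped = []
    for name in names:
        stamp = name[len("lib_"):] if name.startswith("lib_") else ""
        if len(stamp) == 14 and stamp.isdigit():
            try:
                stamped.append((datetime.strptime(stamp, TS_FORMAT), name))
            except ValueError:
                continue  # digits, but not a valid date
    stamped.sort(reverse=True)  # newest first
    cutoff = now - timedelta(days=retention_days)
    # Rule 2: never touch the newest `keep_latest` directories.
    # Rule 1: of the rest, only those older than the retention window qualify.
    return [name for ts, name in stamped[keep_latest:] if ts < cutoff]


if __name__ == "__main__":
    names = ["lib_20140301000000", "lib_20140311155426",
             "lib_20140403151601", "lib_20140425150458"]
    print(dirs_to_purge(names, now=datetime(2014, 4, 26)))
    # ['lib_20140311155426', 'lib_20140301000000']
```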
Installation Changes
In CDH 4.x, the installation directions said to essentially un-tar the ShareLib tarball and upload it to the /user/oozie/share/lib directory in HDFS. In the conclusion of the previous blog post, I had mentioned that we’re always making improvements to the ShareLib, and specifically explained that OOZIE-1054 would provide a script that installs the ShareLib for you. While this feature has been supported since CDH 4.3, it wasn’t readily available.
In CDH 5.0.0, we’ve now made the script available and it is the recommended way to install the ShareLib. It handles un-tarring the ShareLib, creating the lib_<timestamp> directory, and uploading it for you. This is very important: simply uploading the ShareLib to /user/oozie/share/lib will no longer work, because Oozie looks for the jars inside a lib_<timestamp> directory and won’t find them!
To use the script, you simply run this command:
```
oozie-setup sharelib create -fs FS_URI [-locallib SHARED_LIBRARY]
```
where FS_URI is the HDFS URI of the filesystem where the ShareLib should be installed (for example, hdfs://<HOST>:<PORT>) and the optional SHARED_LIBRARY is the ShareLib tarball. In many cases the script will be able to find the ShareLib tarball for you, but if it can’t, or if you want to use a different one (say, the MRv1 version), you can specify it here. (There’s also an oozie-setup sharelib upgrade command with the same arguments, but it’s deprecated and currently does exactly the same thing as the create command.)
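If you install the ShareLib from a provisioning script, a thin Python wrapper can assemble the same command line. This is a sketch: the script name oozie-setup is taken from the command above (some installations ship it as oozie-setup.sh, so adjust as needed), and the NameNode address in the usage example is hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def sharelib_create_cmd(fs_uri: str, locallib: Optional[str] = None) -> List[str]:
    """Argument list for: oozie-setup sharelib create -fs FS_URI [-locallib SHARED_LIBRARY]"""
    cmd = ["oozie-setup", "sharelib", "create", "-fs", fs_uri]
    if locallib is not None:  # only needed if the script can't find the tarball itself
        cmd += ["-locallib", locallib]
    return cmd


def install_sharelib(fs_uri: str, locallib: Optional[str] = None) -> None:
    """Run the setup script; raises CalledProcessError if it exits non-zero."""
    cmd = sharelib_create_cmd(fs_uri, locallib)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical NameNode address; this only prints the command, it does not run it.
    print(shlex.join(sharelib_create_cmd("hdfs://namenode.example.com:8020")))
```

Passing the arguments as a list rather than one shell string avoids quoting problems if a tarball path contains spaces.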
More details on the installation procedure can be found in the documentation. When upgrading from CDH 4.x to CDH 5.x, we recommend that you manually delete your old ShareLib first so the CDH 4.x jars are not in the way.
Helpful Commands and Info
We’ve also added some additional ways to get information about the ShareLib.
The admin -shareliblist command can be used to list the ShareLib contents without having to go into HDFS and figure out which ShareLib Oozie is currently using. For example:
```
$ oozie admin -shareliblist
[Available ShareLib]
oozie
hive
distcp
hcatalog
sqoop
mapreduce-streaming
hive2
pig
```
And you can get a list of the jars in each of the ShareLibs like this:
```
$ oozie admin -shareliblist pig
[Available ShareLib]
pig
    hdfs://rkanter-has-1.ent.cloudera.com:8020/user/oozie/share/lib/lib_20140403151601/pig/ant-1.6.5.jar
    hdfs://rkanter-has-1.ent.cloudera.com:8020/user/oozie/share/lib/lib_20140403151601/pig/antlr-2.7.7.jar
    hdfs://rkanter-has-1.ent.cloudera.com:8020/user/oozie/share/lib/lib_20140403151601/pig/antlr-runtime-3.4.jar
...
```
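If you want to use this output in a script (say, to check that a ShareLib you depend on is present), a small parser is enough. The Python sketch below is hypothetical and relies on the layout shown above: ShareLib names appear on their own lines, and jar entries are full URIs such as hdfs://... listed under their ShareLib. The CLI wrapper assumes the Oozie server URL is supplied through the OOZIE_URL environment variable.

```python
import subprocess
from typing import Dict, List, Optional


def parse_shareliblist(output: str) -> Dict[str, List[str]]:
    """Map each ShareLib name to the jar URIs printed under it (empty if none were listed)."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # blank lines and the "[Available ShareLib]" header
        if "://" in line:  # assumption: jar entries are full URIs such as hdfs://...
            if current is not None:
                result[current].append(line)
        else:
            current = line
            result.setdefault(current, [])
    return result


def list_sharelib(name: Optional[str] = None) -> Dict[str, List[str]]:
    """Run `oozie admin -shareliblist [name]`; assumes OOZIE_URL is set in the environment."""
    args = ["oozie", "admin", "-shareliblist"] + ([name] if name else [])
    out = subprocess.run(args, check=True, capture_output=True, text=True).stdout
    return parse_shareliblist(out)
```

For example, `"hcatalog" in list_sharelib()` would tell you whether the HCatalog ShareLib is installed before you submit a workflow that needs it.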
I mentioned earlier that you can now actually update the ShareLib while Oozie is running. This can be done with the admin -sharelibupdate command:
```
$ oozie admin -sharelibupdate
[ShareLib update status]
host = rkanter-has-1.ent.cloudera.com:11000
status = Successful
sharelibDirOld = hdfs://rkanter-has-1.ent.cloudera.com:8020/user/oozie/share/lib/lib_20140403151601
sharelibDirNew = hdfs://rkanter-has-1.ent.cloudera.com:8020/user/oozie/share/lib/lib_20140425150458
```
As you can see, this caused Oozie to switch to the latest lib_<timestamp> directory. This command also works with Oozie HA; if you run it once, each of the Oozie servers will look for the latest ShareLib.
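When automating a ShareLib upgrade, you may want to confirm that every server actually switched. Here is a hypothetical Python sketch that turns the key = value lines shown above into one dictionary per server. It assumes that, with Oozie HA, each server’s block begins with its own host = ... line; that layout is a guess extrapolated from the single-server output above.

```python
from typing import Dict, List


def parse_sharelibupdate(output: str) -> List[Dict[str, str]]:
    """Turn 'key = value' lines into one dict per Oozie server.

    Assumption: with Oozie HA, each server's block starts with its own 'host = ...' line.
    """
    servers: List[Dict[str, str]] = []
    for raw in output.splitlines():
        key, sep, value = raw.partition("=")
        if not sep:
            continue  # e.g. the "[ShareLib update status]" header or the command prompt
        key, value = key.strip(), value.strip()
        if key == "host" or not servers:
            servers.append({})  # start a new server block
        servers[-1][key] = value
    return servers


def all_updated(servers: List[Dict[str, str]]) -> bool:
    """True if at least one server reported and every one reported status = Successful."""
    return bool(servers) and all(s.get("status") == "Successful" for s in servers)
```

Splitting on the first "=" only keeps HDFS URIs in the values intact.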
You can also find info about the current ShareLib Oozie is using from the instrumentation log (which is also available on the Web UI, Hue, and REST API). The relevant properties are all under the "libs" group. You’ll mostly be interested in these:
- sharelib.keys: ShareLibs loaded (e.g. "oozie", "hive", "pig", etc.)
- sharelib.source: Indicates whether the system.libpath or the mapping file is being used. (We’ll look at the mapping file later.)
- sharelib.system.libpath: Path to the currently loaded ShareLib
Overriding the ShareLib
I mentioned this in the old blog post, but I wanted to point it out again: if you want to change which ShareLib Oozie is using, or include multiple ShareLibs, for a particular action or action type, you can do so. You may have noticed that there is now an “hcatalog” ShareLib. There isn’t a new HCat action; this is a convenience for other actions that want to talk to HCatalog and therefore need the HCatalog jars. A common use case is having Pig talk to HCatalog. You can override the ShareLib for an action type with the oozie.action.sharelib.for.#ACTIONTYPE# property. It can go in an action’s <configuration> section, job.properties, or oozie-site.xml, with priority in that order (a value in the action’s <configuration> wins). For example, if you want all Pig actions in one of your Workflows to include the HCatalog ShareLib, you would add oozie.action.sharelib.for.pig=pig,hcatalog to your job.properties.
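The lookup order can be modeled in a few lines. This Python sketch is not Oozie’s implementation; it only mirrors the rule stated above (the first of action <configuration>, job.properties, and oozie-site.xml that sets the property wins, and the value is a comma-separated list). The function name and the fallback to the action type’s own ShareLib name when nothing is set are hypothetical.

```python
from typing import List, Mapping, Optional


def sharelib_for_action(action_type: str,
                        action_conf: Mapping[str, str],
                        job_properties: Mapping[str, str],
                        oozie_site: Mapping[str, str],
                        default: Optional[str] = None) -> List[str]:
    """Resolve oozie.action.sharelib.for.<action_type>; the first source that sets it wins."""
    key = f"oozie.action.sharelib.for.{action_type}"
    for source in (action_conf, job_properties, oozie_site):  # highest priority first
        if key in source:
            return [name.strip() for name in source[key].split(",") if name.strip()]
    # Hypothetical fallback: assume the action type's own ShareLib (e.g. "pig").
    return [default or action_type]


if __name__ == "__main__":
    job_props = {"oozie.action.sharelib.for.pig": "pig,hcatalog"}
    print(sharelib_for_action("pig", {}, job_props, {}))  # ['pig', 'hcatalog']
```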
ShareLib Mapping File
Another new feature is the ability to specify a “mapping file” instead of using the location at oozie.service.WorkflowAppService.system.libpath for the ShareLib. The location of the mapping file can be specified by oozie.service.ShareLibService.mapping.file in oozie-site.xml; the file itself should contain a list of <key>=<value> entries. For example:
```
oozie.pig_10=hdfs:///share/lib/pig/pig-0.10.1/lib/
oozie.pig=hdfs:///share/lib/pig/pig-0.11.1/lib/
oozie.distcp=hdfs:///share/lib/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-distcp-2.2.0.jar
```
Note that this is an advanced feature and completely optional. Most users should use the previously discussed ShareLib directories instead of the mapping file.
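To see how the example entries break down, here is a hypothetical Python sketch that reads such a file into a ShareLib-name-to-path table: the part of the key after oozie. becomes the ShareLib name (pig_10, pig, distcp) and the value is the HDFS location. It is only an illustration of the format shown above, not Oozie’s loader; for instance, it does not handle multiple paths per entry or any properties-file escaping.

```python
from typing import Dict


def parse_mapping_file(text: str) -> Dict[str, str]:
    """Parse 'oozie.<name>=<path>' lines into {name: path}."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith("oozie."):
            continue  # not a ShareLib mapping entry
        mapping[key[len("oozie."):]] = value.strip()
    return mapping


SAMPLE = """\
oozie.pig_10=hdfs:///share/lib/pig/pig-0.10.1/lib/
oozie.pig=hdfs:///share/lib/pig/pig-0.11.1/lib/
oozie.distcp=hdfs:///share/lib/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-distcp-2.2.0.jar
"""

if __name__ == "__main__":
    for name, path in parse_mapping_file(SAMPLE).items():
        print(name, "->", path)
```

Names such as pig_10 presumably exist so that a non-default ShareLib can be selected with the oozie.action.sharelib.for.#ACTIONTYPE# override described earlier.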
One Last Thing
I’ve seen a lot of confusion about how to include additional jars with your workflow and I’d like to use this opportunity to clarify. Below are the various ways to include a jar with your workflow:
- Set oozie.libpath=/path/to/jars,another/path/to/jars in job.properties.
  - This is useful if you have many workflows that all need the same jar; you can put it in one place in HDFS and use it with many workflows. The jars will be available to all actions in that workflow.
  - There is no need to ever point this at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties.
- Create a directory named “lib” next to your workflow.xml in HDFS and put jars in there.
  - This is useful if you have some jars that you only need for one workflow. Oozie will automatically make those jars available to all actions in that workflow.
- Specify the <archive> tag in an action with the path to a single jar; you can have multiple <archive> tags.
  - This is useful if you want some jars only for a specific action and not all actions in a workflow.
  - The downside is that you have to specify them in your workflow.xml, so if you ever need to add/remove some jars, you have to change your workflow.xml.
- Add jars to the ShareLib (e.g. /user/oozie/share/lib/lib_<timestamp>/pig).
  - While this will work, it’s not recommended for two reasons:
    - The additional jars will be included with every workflow using that ShareLib, which may be unexpected to those workflows and users.
    - When upgrading the ShareLib, you’ll have to recopy the additional jars to the new ShareLib.
Conclusion
At first, these changes may seem complicated and overwhelming. But just remember that, in a nutshell, all we did was add an extra level with a timestamp (the lib_<timestamp> directory). The ShareLib still works the same way as before and you don’t have to update any of your workflows to continue using it. Other than the installation changes (which Cloudera Manager can handle for you), everything else is optional or provided to make things easier.
Further Reading
- CDH 5 Configuring Oozie Documentation
- Oozie Documentation on the new ShareLib admin commands (CLI)
- Oozie Documentation on the new ShareLib admin commands (REST)
Robert Kanter is a Software Engineer at Cloudera, and an Oozie Committer/PMC Member.