Storing Hive's metadata in Oracle

The embedded Derby database doesn't feel usable for real work, so I want to put the metadata in Oracle instead (it could also go in MySQL). At the very least I'll then know how to back it up...

What follows is the result of a whole evening of tinkering (I'm a slow learner). Many thanks to moderator 国宝.
=========================================


How to store Hive's metadata in Oracle

Hive by default uses an embedded Derby database. In real-world scenarios, where multiple Hive queries are executed across multiple sessions, a database server such as MySQL, Oracle, or a Derby network server is required. For this example, I will use Oracle. Before doing anything, make sure the cluster is down.



1) create oracle user
SQL> create user hive identified by hive
     default tablespace hive;

SQL> grant dba to hive;              [security note: DBA is overly broad; see the narrower sketch below]

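Granting DBA gets things working quickly, but it is far more privilege than the metastore needs. A narrower sketch, under my assumption that DataNucleus only needs to create and manage tables, indexes, and sequences inside the hive schema (and that the hive tablespace above already exists):

SQL> grant connect, resource to hive;
SQL> alter user hive quota unlimited on hive;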

2)edit hive/conf/hive-default.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->

<!-- Hive Execution Parameters -->
<property>
<name>mapred.reduce.tasks</name>
<value>-1</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value.
By setting this property to -1, Hive will automatically figure out what should be the number of reducers.
</description>
</property>

<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>1000000000</value>
<description>size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
</property>

<property>
<name>hive.exec.reducers.max</name>
<value>999</value>
<description>max number of reducers that will be used. If the one
specified in the configuration parameter mapred.reduce.tasks is
negative, hive will use this as the max number of reducers when
automatically determining the number of reducers.</description>
</property>

<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive-${user.name}</value>
<description>Scratch space for Hive jobs</description>
</property>

<property>
<name>hive.test.mode</name>
<value>false</value>
<description>whether hive is running in test mode. If yes, it turns on sampling and prefixes the output tablename</description>
</property>

<property>
<name>hive.test.mode.prefix</name>
<value>test_</value>
<description>if hive is running in test mode, prefixes the output table by this string</description>
</property>

<!-- If the input table is not bucketed, the denominator of the tablesample is determined by the parameter below -->
<!-- For example, the following query: -->
<!-- INSERT OVERWRITE TABLE dest -->
<!-- SELECT col1 from src -->
<!-- would be converted to -->
<!-- INSERT OVERWRITE TABLE test_dest -->
<!-- SELECT col1 from src TABLESAMPLE (BUCKET 1 out of 32 on rand(1)) -->
<property>
<name>hive.test.mode.samplefreq</name>
<value>32</value>
<description>if hive is running in test mode and table is not bucketed, sampling frequency</description>
</property>

<property>
<name>hive.test.mode.nosamplelist</name>
<value></value>
<description>if hive is running in test mode, don't sample the above comma-separated list of tables</description>
</property>

<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>controls whether to connect to a remote metastore server or open a new metastore server in the Hive Client JVM</description>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>oracle.jdbc.driver.OracleDriver</value>     [this is the ojdbc jar downloaded in step 3]
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
<description>class implementing the jdo persistence</description>
</property>

<property>
<name>javax.jdo.option.DetachAllOnCommit</name>
<value>true</value>
<description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>

<property>
<name>javax.jdo.option.NonTransactionalRead</name>
<value>true</value>
<description>reads outside of transactions</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>                              [username]
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>                                [password]
<description>password to use against metastore database</description>
</property>

<property>
<name>datanucleus.validateTables</name>
<value>false</value>
<description>validates existing schema against code. turn this on if you want to verify existing schema</description>
</property>

<property>
<name>datanucleus.validateColumns</name>
<value>false</value>
<description>validates existing schema against code. turn this on if you want to verify existing schema</description>
</property>

<property>
<name>datanucleus.validateConstraints</name>
<value>false</value>
<description>validates existing schema against code. turn this on if you want to verify existing schema</description>
</property>

<property>
<name>datanucleus.storeManagerType</name>
<value>rdbms</value>
<description>metadata store type</description>
</property>

<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
<description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
</property>

<property>
<name>datanucleus.autoStartMechanismMode</name>
<value>checked</value>
<description>throw exception if metadata tables are incorrect</description>
</property>

<property>
<name>datanucleus.transactionIsolation</name>
<value>read-committed</value>
<description></description>
</property>

<property>
<name>datanucleus.cache.level2</name>
<value>true</value>
<description>use a level 2 cache. turn this off if metadata is changed independently of hive metastore server</description>
</property>

<property>
<name>datanucleus.cache.level2.type</name>
<value>SOFT</value>
<description>SOFT=soft reference based cache, WEAK=weak reference based cache.</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>

<property>
<name>hive.metastore.connect.retries</name>
<value>5</value>
<description>Number of retries while opening a connection to metastore</description>
</property>

<property>
<name>hive.metastore.rawstore.impl</name>
<value>org.apache.hadoop.hive.metastore.ObjectStore</value>
<description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>

<property>
<name>hive.default.fileformat</name>
<value>TextFile</value>
<description>Default file format for CREATE TABLE statement. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS [TEXTFILE|SEQUENCEFILE] to override</description>
</property>

<property>
<name>hive.map.aggr</name>
<value>true</value>
<description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>

<property>
<name>hive.groupby.skewindata</name>
<value>false</value>
<description>Whether there is skew in data to optimize group by queries</description>
</property>

<property>
<name>hive.groupby.mapaggr.checkinterval</name>
<value>100000</value>
<description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>

<property>
<name>hive.mapred.local.mem</name>
<value>0</value>
<description>For local mode, memory of the mappers/reducers</description>
</property>

<property>
<name>hive.map.aggr.hash.percentmemory</name>
<value>0.5</value>
<description>Portion of total memory to be used by map-side group aggregation hash table</description>
</property>

<property>
<name>hive.map.aggr.hash.min.reduction</name>
<value>0.5</value>
<description>Hash aggregation will be turned off if the ratio between hash
table size and input rows is bigger than this number. Set to 1 to make sure
hash aggregation is never turned off.</description>
</property>

<property>
<name>hive.optimize.cp</name>
<value>true</value>
<description>Whether to enable column pruner</description>
</property>

<property>
<name>hive.optimize.ppd</name>
<value>true</value>
<description>Whether to enable predicate pushdown</description>
</property>

<property>
<name>hive.optimize.pruner</name>
<value>true</value>
<description>Whether to enable the new partition pruner which depends on predicate pushdown. If this is disabled,
the old partition pruner which is based on AST will be enabled.</description>
</property>

<property>
<name>hive.join.emit.interval</name>
<value>1000</value>
<description>How many rows in the right-most join operand Hive should buffer before emitting the join result.</description>
</property>

<property>
<name>hive.mapred.mode</name>
<value>nonstrict</value>
<description>The mode in which the hive operations are being performed. In strict mode, some risky queries are not allowed to run</description>
</property>

<property>
<name>hive.exec.script.maxerrsize</name>
<value>100000</value>
<description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity</description>
</property>

<property>
<name>hive.exec.compress.output</name>
<value>false</value>
<description>This controls whether the final outputs of a query (to a local/hdfs file or a hive table) are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress*</description>
</property>

<property>
<name>hive.exec.compress.intermediate</name>
<value>false</value>
<description>This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress*</description>
</property>

<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
<name>hive.hwi.war.file</name>
<value>${HIVE_HOME}/lib/hive-hwi.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

<property>
<name>hive.exec.pre.hooks</name>
<value></value>
<description>Pre Execute Hook for Tests</description>
</property>

<property>
<name>hive.merge.mapfiles</name>
<value>true</value>
<description>Merge small files at the end of a map-only job</description>
</property>

<property>
<name>hive.merge.mapredfiles</name>
<value>false</value>
<description>Merge small files at the end of any job (map only or map-reduce)</description>
</property>

<property>
<name>hive.heartbeat.interval</name>
<value>1000</value>
<description>Send a heartbeat after this interval - used by mapjoin and filter operators</description>
</property>

<property>
<name>hive.merge.size.per.task</name>
<value>256000000</value>
<description>Size of merged files at the end of the job</description>
</property>

<property>
<name>hive.script.auto.progress</name>
<value>false</value>
<description>Whether Hive Transform/Map/Reduce Clause should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because this may prevent scripts with infinite loops from being killed by the TaskTracker.</description>
</property>

</configuration>
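
Of all the properties above, only a handful actually change for the Oracle setup: the four javax.jdo.option.Connection* values annotated in brackets (URL, driver class, username, password) and datanucleus.autoCreateSchema=true, which makes DataNucleus create the metastore tables on first startup (per its own description, set it back to false once the schema exists). As far as I can tell, the rest are the defaults shipped with Hive.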




3) download the Oracle JDBC driver (ojdbc14.jar):

http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/htdocs/jdbc_10201.html
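
After downloading, the jar has to end up on Hive's classpath; copying it into $HIVE_HOME/lib (the directory Hive loads its other jars from) is the usual way, e.g. cp ojdbc14.jar $HIVE_HOME/lib/. Otherwise the oracle.jdbc.driver.OracleDriver class configured above cannot be loaded.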

  

4) test it:
[root@boss ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201001162330_322775424.txt
hive> show tables;
OK
Time taken: 6.008 seconds
hive> create table test(id bigint);
OK
Time taken: 0.495 seconds
hive>




5) log in to Oracle and check whether the metastore tables were created automatically:

[oracle@boss ~]$ sqlplus hive/hive@192.168.1.101:1521/spgjmega

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jan 16 23:57:37 2010

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> show user;
USER is "HIVE"
SQL> select * from tab;

TNAME                          TABTYPE CLUSTERID
------------------------------ ------- ----------
NUCLEUS_TABLES                 TABLE
A                              TABLE
DBS                            TABLE
SEQUENCE_TABLE                 TABLE
SERDES                         TABLE
TBLS                           TABLE
SDS                            TABLE
PARTITION_KEYS                 TABLE
COLUMNS                        TABLE
BUCKETING_COLS                 TABLE
SD_PARAMS                      TABLE

TNAME                          TABTYPE CLUSTERID
------------------------------ ------- ----------
SORT_COLS                      TABLE
SERDE_PARAMS                   TABLE
TABLE_PARAMS                   TABLE

14 rows selected.

SQL> set line 1000
SQL> r
1* select * from TBLS

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME
------ ----------- ----- ---------------- ----- --------- ----- --------
     1  1263655926     1                0 root          0     1 test

SQL>
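
With the schema in place, the metastore can be queried like any other set of tables. A sketch that ties the new table back to its database and warehouse location (assuming the stock metastore layout, where TBLS.DB_ID references DBS, TBLS.SD_ID references SDS, and DBS.NAME / SDS.LOCATION exist as columns):

SQL> select t.TBL_NAME, d.NAME, s.LOCATION
  2  from TBLS t, DBS d, SDS s
  3  where t.DB_ID = d.DB_ID and t.SD_ID = s.SD_ID;

If everything above worked, this returns one row for the test table, with LOCATION pointing somewhere under hive.metastore.warehouse.dir.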




====== Reference links ======

http://code.google.com/p/hacdb/wiki/HiveTry

http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx

http://www.jpox.org/docs/1_0/rdbms.html

http://bbs.hadoopor.com/thread-219-1-1.html

http://wiki.apache.org/hadoop/HiveDerbyServerMode
