环境介绍
hadoop是cdh4.2.0的版本,搭建见十分钟搭建自己的hadoop2/CDH4集群
hive版本可以是cdh4.2.0的hive-0.10.0,下载包(win直接下载解压会失败,建议linux下wget下载)。也可以是hive-0.9.0(shark-0.7包里自带的amp实验室提供的版本)。两个版本我都尝试了,都是可以的,使用后者这个版本比较低的hive的原因是为了使用shark。
metastore则是一个MySQL,只有你有mysql server,随便create 一个 空的database即可。
Hive部署
1. conf下面的hive-default等配置不需要改变,增加hive-site.xml,为其添加如下基本配置:
dir相关:
- <property>
- <name>hive.metastore.warehouse.dir</name>
- <value>/home/mywork/work/data/hive/warehouse-${user.name}</value>
- <description>location of default database for the warehouse</description>
- </property>
- <property>
- <name>hive.exec.scratchdir</name>
- <value>/home/mywork/work/data/hive/scratch-${user.name}</value>
- <description>Scratch space for Hive jobs</description>
- </property>
- <property>
- <name>hive.querylog.location</name>
- <value>/home/mywork/work/data/querylog-${user.name}</value>
- <description>
- Location of Hive run time structured log file
- </description>
- </property>
- <property>
- <name>hive.hwi.listen.host</name>
- <value>10.65.17.192</value>
- <description>This is the host address the Hive Web Interface will listen on</description>
- </property>
- <property>
- <name>hive.hwi.listen.port</name>
- <value>9999</value>
- <description>This is the port the Hive Web Interface will listen on</description>
- </property>
- <property>
- <name>hive.hwi.war.file</name>
- <value>lib/hive-hwi-0.9.0-amplab-4.war</value>
- <description>This is the WAR file with the jsp content for Hive Web Interface</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionURL</name>
- <value>jdbc:mysql://xx.xx.xx.com:8306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=utf8
- </value>
- <description>JDBC connect string for a JDBC metastore</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionDriverName</name>
- <value>com.mysql.jdbc.Driver</value>
- <description>Driver class name for a JDBC metastore</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionUserName</name>
- <value>xxxx</value>
- <description>username to use against metastore database</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionPassword</name>
- <value>xxxx</value>
- <description>password to use against metastore database</description>
- </property>
3. HADOOP_HOME如果没有在环境变量里的话,请在conf/hive-env.sh里export
3. 启动hive,可以用bin/hive,但是这样报错无法定位到具体原因,建议使用
- ./bin/hive -hiveconf hive.root.logger=DEBUG,console
Debug Hive
bin/hive初次启动hive,执行 select tables; 可能会遇到如下错误:
- FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
- FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
- ./bin/hive -hiveconf hive.root.logger=DEBUG,console
问题一:
- Transaction level-COMMITTED' in InnoDB is not safe for binlog mode 'STATEMENT'
- SET GLOBAL binlog_format = 'ROW';
问题二:
- MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes)
- alter database hive character set latin1;
如遇到其他问题,可能多数是和metastore有关,自行查阅之。
Test Hive
启动自己的hadoop dfs和yarn,启动hive,创建表:
- create table test(num int, name string);
- Field 'IS_STOREDASSUBDIRECTORIES' doesn't have a default value
- ALTER TABLE `SDS` ALTER `IS_STOREDASSUBDIRECTORIES` SET DEFAULT 0;
- load data local inpath '/home/mywork/work/hive-0.9.0-bin/examples/files/kv1.txt' into table test;
- select * from test where num > 400;
(我最简单对比了下,hive使用了14s,用shark仅1s,后续针对hive vs shark会有专门的博文)
(全文完)