Hive Installation Tutorial
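By default, Hive supports neither record-level updates, inserts, or deletes nor transactions. ACID transactional tables (Hive 0.14 and later) add limited support, but only when explicitly enabled. Hive queries also have high latency.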
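Pig is commonly used as one part of an ETL (extract, transform, load) process: loading external data into a Hadoop cluster and then transforming it into the desired format.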
Prerequisite: Hadoop is already installed and the environment variable $HADOOP_HOME is set.
Download the installation package
https://hive.apache.org/downloads.html
This article uses apache-hive-2.3.3-bin.tar.gz as an example.
Extract
tar -zxvf apache-hive-2.3.3-bin.tar.gz
Add environment variables
Add the following lines to .bashrc or profile; this article uses .bashrc.
export HIVE_HOME={{pwd}}
export PATH=$HIVE_HOME/bin:$PATH
where {{pwd}} is the path to the extracted Hive directory.
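The two export lines can also be appended by a script. The sketch below uses a hypothetical helper, add_hive_env (not part of Hive), that writes them only if no HIVE_HOME export exists yet, so running it twice does not duplicate the lines.

```shell
#!/bin/sh
# Hypothetical helper: append the HIVE_HOME and PATH exports to an rc file once.
add_hive_env() {
    rc_file=$1     # e.g. "$HOME/.bashrc"
    hive_dir=$2    # the directory created by tar -zxvf
    # If a HIVE_HOME export is already present, leave the file alone.
    if grep -q '^export HIVE_HOME=' "$rc_file" 2>/dev/null; then
        return 0
    fi
    {
        printf 'export HIVE_HOME=%s\n' "$hive_dir"
        # Single quotes keep $HIVE_HOME and $PATH literal in the rc file.
        printf 'export PATH=$HIVE_HOME/bin:$PATH\n'
    } >> "$rc_file"
}
```

For example, add_hive_env "$HOME/.bashrc" /opt/apache-hive-2.3.3-bin (the /opt path is only an illustration), then run source ~/.bashrc so the current shell picks up the new variables.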
Create the tmp directory (usually present by default) and the warehouse directory in HDFS:
(In addition, you must use below HDFS commands to create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before you can create a table in Hive.)
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /tmp
$ $HADOOP_HOME/bin/hdfs dfs -mkdir -p /user/hive/warehouse
$ $HADOOP_HOME/bin/hdfs dfs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hdfs dfs -chmod g+w /user/hive/warehouse
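To confirm that both directories actually ended up group-writable, you can inspect the permission column printed by hdfs dfs -ls -d. A minimal sketch follows; has_group_write and hdfs_perms are hypothetical helper names, and hdfs_perms assumes a running HDFS and a Hadoop version whose -ls supports the -d flag.

```shell
#!/bin/sh
# Check the group-write flag in a permission string such as "drwxrwxr-x".
# Character 6 is the group "w" position: d rwx rwx r-x.
has_group_write() {
    perms=$1
    [ "$(printf '%s' "$perms" | cut -c6)" = "w" ]
}

# Print the permission column for one HDFS directory (needs a running HDFS).
hdfs_perms() {
    "$HADOOP_HOME/bin/hdfs" dfs -ls -d "$1" | awk '{print $1}'
}
```

Usage: for d in /tmp /user/hive/warehouse; do has_group_write "$(hdfs_perms "$d")" || echo "$d is missing g+w"; done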
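In the conf directory, create hive-site.xml (use hive-default.xml.template as a reference) and add the metastore database connection settings (MySQL here) inside its <configuration> element, as shown below: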
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
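The properties above are only a fragment; the file also needs an XML declaration and a <configuration> root. The sketch below writes a complete minimal hive-site.xml with exactly those four properties. write_hive_site is a hypothetical helper, it overwrites any existing file, and the values are the example ones (local MySQL, root/123456). Note that Hive does not bundle the MySQL JDBC driver, so the MySQL Connector/J jar generally has to be copied into $HIVE_HOME/lib before schematool can connect.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal hive-site.xml containing the four
# metastore connection properties, wrapped in the <configuration> root.
# Assumes a local MySQL at 127.0.0.1:3306 with the example credentials.
write_hive_site() {
    conf_dir=$1    # normally "$HIVE_HOME/conf"
    mkdir -p "$conf_dir"
    # The quoted 'EOF' stops the shell from expanding anything in the XML.
    cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
EOF
}
```

If you add more parameters to the JDBC URL, separate them with &amp; rather than a bare &, because the value lives inside XML.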
Initialize the metastore schema
bin/schematool -initSchema -dbType mysql
Test
Run bin/hive to enter the Hive CLI, then enter the following commands to test:
CREATE TABLE x (a INT);
SELECT * FROM x;
DROP TABLE x;
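The same three statements can be run without entering the interactive CLI by passing them to hive -e, which is convenient in scripts. In this sketch, hive_smoke_test and the HIVE_BIN override are illustrative assumptions; it relies on the CLI exiting with a non-zero status when a statement fails.

```shell
#!/bin/sh
# Hypothetical helper: run the smoke-test statements non-interactively.
# HIVE_BIN defaults to $HIVE_HOME/bin/hive and can be overridden.
hive_smoke_test() {
    hive_bin=${HIVE_BIN:-"$HIVE_HOME/bin/hive"}
    if "$hive_bin" -e 'CREATE TABLE x (a INT); SELECT * FROM x; DROP TABLE x;'; then
        echo "hive smoke test: OK"
    else
        echo "hive smoke test: FAILED" >&2
        return 1
    fi
}
```

Calling hive_smoke_test after sourcing .bashrc should print "hive smoke test: OK" once the metastore schema has been initialized.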
Problem
Running bin/hive fails at startup with the following error:
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 12 more
Fix: in the configuration file (hive-site.xml), find properties such as:
<property>
<name>hive.querylog.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>
Change every <value> that contains ${system:java.io.tmpdir} or ${system:user.name} to a concrete path.
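The replacement can be scripted with sed. A minimal sketch, assuming /tmp/hive as the local base directory (any writable directory works) and the current login name in place of ${system:user.name}; fix_system_placeholders is a hypothetical helper, and it rewrites every such placeholder in the file, not only the one in hive.querylog.location.

```shell
#!/bin/sh
# Hypothetical helper: replace ${system:java.io.tmpdir} and ${system:user.name}
# in hive-site.xml with concrete values. A backup is kept as <file>.bak.
fix_system_placeholders() {
    site_file=$1
    local_base=${2:-/tmp/hive}     # assumed local base directory
    user_name=${3:-$(id -un)}      # defaults to the current login name
    # \$ matches a literal dollar sign; -i.bak works with GNU and BSD sed.
    # Values containing '#' or '&' would need escaping before use here.
    sed -i.bak \
        -e 's#\${system:java.io.tmpdir}#'"$local_base"'#g' \
        -e 's#\${system:user.name}#'"$user_name"'#g' \
        "$site_file"
}
```

For example, fix_system_placeholders "$HIVE_HOME/conf/hive-site.xml" turns ${system:java.io.tmpdir}/${system:user.name} into /tmp/hive/ followed by your user name.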