0) Prerequisite
## hadoop must already be started
1) Build and deploy
## download hive-2.3.9-src.tar.gz from http://archive.apache.org/dist/
## unpack to ~/work/hive-src-2.3.9
$ cd ~/work/hive-src-2.3.9
$ mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
## unpack packaging/target/apache-hive-2.3.9-bin.tar.gz to ~/work/hive-2.3.9
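Before configuring anything, it is worth confirming the freshly built distribution actually runs. A minimal sanity check, assuming the unpack path above:

```shell
# Confirm the packaged build is usable before configuring it.
cd ~/work/hive-2.3.9
bin/hive --version     # should report Hive 2.3.9
```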
2) Configure
## configure logging
$ cd ~/work/hive-2.3.9/conf
$ diff hive-log4j2.properties.template hive-log4j2.properties
--- hive-log4j2.properties.template 2021-06-02 04:43:12.000000000 +0800
+++ hive-log4j2.properties 2022-06-01 11:40:54.000000000 +0800
@@ -21,7 +21,8 @@
# list of properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
-property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
+# property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
+property.hive.log.dir = /Users/sun_xo/work/hive-2.3.9/logs/
property.hive.log.file = hive.log
property.hive.perflogger.log.level = INFO
## specify metastore db as mysql
$ cat hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<!-- property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>manga</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>manga</value>
</property>
</configuration>
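schematool expects the MySQL side to exist first: the `manga` user above must be able to connect, and the MySQL JDBC driver jar must be on Hive's classpath (e.g. copied into `lib/`). A minimal sketch of the MySQL prep, assuming a local server with root access (the user/password match hive-site.xml above; the `metastore` database itself is created on first connect by `createDatabaseIfNotExist=true` in the JDBC URL):

```shell
# Create the metastore user referenced in hive-site.xml and grant it
# rights on the metastore database.
mysql -u root -p <<'SQL'
CREATE USER IF NOT EXISTS 'manga'@'localhost' IDENTIFIED BY 'manga';
GRANT ALL PRIVILEGES ON metastore.* TO 'manga'@'localhost';
FLUSH PRIVILEGES;
SQL
```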
## create schema in metastore
$ cd ~/work/hive-2.3.9/
$ bin/schematool -initSchema -dbType mysql
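To confirm the schema landed, schematool can also report the current schema version against the same database (a sketch, same paths as above):

```shell
# Verify the initialized metastore schema version.
cd ~/work/hive-2.3.9
bin/schematool -info -dbType mysql
```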
3) Run a test case
## get test data
$ cd ~/work/hive-2.3.9/examples
$ curl --remote-name http://files.grouplens.org/datasets/movielens/ml-100k.zip
$ unzip ml-100k.zip # unpack to ml-100k
## create a table, which is actually a directory /user/hive/warehouse/u_data on hdfs
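The table schema below assumes u.data is tab-separated with exactly four fields (userid, movieid, rating, unixtime — the standard MovieLens 100k layout). A quick format check can be sketched as follows; the inline sample rows here are illustrative, so point awk at ml-100k/u.data for the real file:

```shell
# Sanity-check the MovieLens format: every row should have exactly
# four tab-separated fields. Writes a two-row sample for demonstration.
printf '196\t242\t3\t881250949\n186\t302\t3\t891717742\n' > /tmp/u.data.sample
awk -F'\t' 'NF != 4 { bad++ } END { print (bad ? "BAD" : "OK"), NR, "rows" }' /tmp/u.data.sample
```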
$ cd ~/work/hive-2.3.9
$ bin/hive # with "-hiveconf hive.root.logger=DEBUG,console" for debug
hive>
> CREATE TABLE u_data (
> userid INT,
> movieid INT,
> rating INT,
> unixtime STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 4.165 seconds
hive> show tables;
OK
u_data
Time taken: 0.119 seconds, Fetched: 1 row(s)
hive> describe u_data;
OK
userid int
movieid int
rating int
unixtime string
Time taken: 0.097 seconds, Fetched: 4 row(s)
## load test data to hdfs
hive> load data local inpath 'examples/ml-100k/u.data' overwrite into table u_data;
Loading data to table default.u_data
OK
Time taken: 0.808 seconds
## count by using mapreduce
hive> select count(*) from u_data;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = sun_xo_20220601145026_9e4c291a-1b71-4b5a-9f13-bb2c850fe94a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1653709899071_0002, Tracking URL = http://localhost:8088/proxy/application_1653709899071_0002/
Kill Command = /Users/sun_xo/work/hadoop/bin/hadoop job -kill job_1653709899071_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-06-01 14:50:32,670 Stage-1 map = 0%, reduce = 0%
2022-06-01 14:50:38,926 Stage-1 map = 100%, reduce = 0%
2022-06-01 14:50:43,056 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1653709899071_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 1987035 HDFS Write: 106 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
100000
Time taken: 17.988 seconds, Fetched: 1 row(s)
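The loaded table supports ordinary HiveQL aggregation as well. As a follow-up sketch (column names come from the schema above; `hive -e` runs a query non-interactively):

```shell
# Count how many ratings each star value received, without entering
# the interactive shell. Launches a MapReduce job like the count above.
cd ~/work/hive-2.3.9
bin/hive -e 'SELECT rating, COUNT(*) FROM u_data GROUP BY rating;'
```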
4) Set up remote metastore and hiveserver2
## modify hive-site.xml
$ cd ~/work/hive-2.3.9
$ cat conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
......
<!-- HiveServer2 -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>localhost</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<!-- Remote MetaStore -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
## start servers (metastore first, since hiveserver2 connects to it via hive.metastore.uris)
$ nohup bin/hive --service metastore &
$ nohup bin/hive --service hiveserver2 &
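Once both services are up, the end-to-end path can be verified by connecting through beeline. A sketch, assuming the thrift host/port configured above and an unsecured local setup (the `-n "$USER"` username is an assumption):

```shell
# Connect to hiveserver2 over JDBC and re-run the count query.
# hiveserver2 may take a little while to accept connections after startup.
cd ~/work/hive-2.3.9
bin/beeline -u jdbc:hive2://localhost:10000 -n "$USER" \
  -e 'SELECT COUNT(*) FROM u_data;'
```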