0) Prerequisite
## hadoop must already be started
1) Build and deploy
## download hive-2.3.9-src.tar.gz from http://archive.apache.org/dist/
## unpack to ~/work/hive-src-2.3.9
$ cd ~/work/hive-src-2.3.9
$ mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
## unpack packaging/target/apache-hive-2.3.9-bin.tar.gz to ~/work/hive-2.3.9
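Before configuring anything, it is worth confirming the freshly built distribution actually runs. A minimal sanity check, assuming the unpack path above:

```shell
# Confirm the packaged build is usable before configuring it.
cd ~/work/hive-2.3.9
bin/hive --version     # should report Hive 2.3.9
```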
2) Configure
## configure logging
$ cd ~/work/hive-2.3.9/conf
$ diff hive-log4j2.properties.template hive-log4j2.properties
--- hive-log4j2.properties.template 2021-06-02 04:43:12.000000000 +0800
+++ hive-log4j2.properties 2022-06-01 11:40:54.000000000 +0800
@@ -21,7 +21,8 @@
# list of properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
-property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
+# property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
+property.hive.log.dir = /Users/sun_xo/work/hive-2.3.9/logs/
property.hive.log.file = hive.log
property.hive.perflogger.log.level = INFO
## specify metastore db as mysql
$ cat hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<!-- property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>manga</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>manga</value>
</property>
</configuration>
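schematool expects the MySQL side to exist first: the `manga` user above must be able to connect, and the MySQL JDBC driver jar must be on Hive's classpath (e.g. copied into `lib/`). A minimal sketch of the MySQL prep, assuming a local server with root access (the user/password match hive-site.xml above; the `metastore` database itself is created on first connect by `createDatabaseIfNotExist=true` in the JDBC URL):

```shell
# Create the metastore user referenced in hive-site.xml and grant it
# rights on the metastore database.
mysql -u root -p <<'SQL'
CREATE USER IF NOT EXISTS 'manga'@'localhost' IDENTIFIED BY 'manga';
GRANT ALL PRIVILEGES ON metastore.* TO 'manga'@'localhost';
FLUSH PRIVILEGES;
SQL
```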
## create schema in metastore
$ cd ~/work/hive-2.3.9/
$ bin/schematool -initSchema -dbType mysql
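To confirm the schema landed, schematool can also report the current schema version against the same database (a sketch, same paths as above):

```shell
# Verify the initialized metastore schema version.
cd ~/work/hive-2.3.9
bin/schematool -info -dbType mysql
```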
3) Run a test case
## get test data
$ cd ~/work/hive-2.3.9/examples
$ curl --remote-name http://files.grouplens.org/datasets/movielens/ml-100k.zip
$ unzip ml-100k.zip # unpack to ml-100k
## create a table, which is actually a directory /user/hive/warehouse/u_data on hdfs
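The table schema below assumes u.data is tab-separated with exactly four fields (userid, movieid, rating, unixtime — the standard MovieLens 100k layout). A quick format check can be sketched as follows; the inline sample rows here are illustrative, so point awk at ml-100k/u.data for the real file:

```shell
# Sanity-check the MovieLens format: every row should have exactly
# four tab-separated fields. Writes a two-row sample for demonstration.
printf '196\t242\t3\t881250949\n186\t302\t3\t891717742\n' > /tmp/u.data.sample
awk -F'\t' 'NF != 4 { bad++ } END { print (bad ? "BAD" : "OK"), NR, "rows" }' /tmp/u.data.sample
```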
$ cd ~/work/hive-2.3.9
$ bin/hive # with "-hiveconf hive.root.logger=DEBUG,console" for debug
hive>
> CREATE TABLE u_data (
> userid INT,
> movieid INT,
> rating INT,
> unixtime STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 4.165 seconds
hive> show tables;
OK
u_data
Time taken: 0.119 seconds, Fetched: 1 row(s)
hive> describe u_data;
OK
userid int
movieid int
rating int
unixtime string
Time taken: 0.097 seconds, Fetched: 4 row(s)
## load test data to hdfs
hive> load data local inpath 'examples/ml-100k/u.data' overwrite into table u_data;
Loading data to table default.u_data
OK
Time taken: 0.808 seconds
## count by using mapreduce
hive> select count(*) from u_data;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = sun_xo_20220601145026_9e4c291a-1b71-4b5a-9f13-bb2c850fe94a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1653709899071_0002, Tracking URL = http://localhost:8088/proxy/application_1653709899071_0002/
Kill Command = /Users/sun_xo/work/hadoop/bin/hadoop job -kill job_1653709899071_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-06-01 14:50:32,670 Stage-1 map = 0%, reduce = 0%
2022-06-01 14:50:38,926 Stage-1 map = 100%, reduce = 0%
2022-06-01 14:50:43,056 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1653709899071_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 1987035 HDFS Write: 106 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
100000
Time taken: 17.988 seconds, Fetched: 1 row(s)
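The loaded table supports ordinary HiveQL aggregation as well. As a follow-up sketch (column names come from the schema above; `hive -e` runs a query non-interactively):

```shell
# Count how many ratings each star value received, without entering
# the interactive shell. Launches a MapReduce job like the count above.
cd ~/work/hive-2.3.9
bin/hive -e 'SELECT rating, COUNT(*) FROM u_data GROUP BY rating;'
```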
4) Set up remote metastore and hiveserver2
## modify hive-site.xml
$ cd ~/work/hive-2.3.9
$ cat conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
......
<!-- HiveServer2 -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>localhost</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<!-- Remote MetaStore -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
## start servers (metastore first, since hiveserver2 connects to it via hive.metastore.uris)
$ nohup bin/hive --service metastore &
$ nohup bin/hive --service hiveserver2 &
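Once both services are up, the end-to-end path can be verified by connecting through beeline. A sketch, assuming the thrift host/port configured above and an unsecured local setup (the `-n "$USER"` username is an assumption):

```shell
# Connect to hiveserver2 over JDBC and re-run the count query.
# hiveserver2 may take a little while to accept connections after startup.
cd ~/work/hive-2.3.9
bin/beeline -u jdbc:hive2://localhost:10000 -n "$USER" \
  -e 'SELECT COUNT(*) FROM u_data;'
```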