Hive Deployment and Basic Operations

Deploying Hive

1) Download: http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz
2) Extract: tar -xf hive-1.1.0-cdh5.7.0.tar.gz

[hadoop@hadoop001 software]$ tar -xf hive-1.1.0-cdh5.7.0.tar.gz
[hadoop@hadoop001 app]$ ln -s /home/hadoop/software/hive-1.1.0-cdh5.7.0 hive
[hadoop@hadoop001 app]$ ll
total 0
lrwxrwxrwx 1 hadoop hadoop 43 Jul  1 14:58 hadoop -> /home/hadoop/software/hadoop-2.6.0-cdh5.7.0
lrwxrwxrwx 1 hadoop hadoop 41 Jul 15 11:25 hive -> /home/hadoop/software/hive-1.1.0-cdh5.7.0
3) Add HIVE_HOME to the system environment variables (~/.bash_profile)

[hadoop@hadoop001 data]$ cat ~/.bash_profile

PATH=$PATH:$HOME/bin
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH
export HIVE_HOME=/home/hadoop/app/hive
export PATH=${HIVE_HOME}/bin:$PATH
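
After editing ~/.bash_profile, reload it so the new variables take effect in the current shell:

[hadoop@hadoop001 ~]$ source ~/.bash_profile
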
4) Copy the MySQL JDBC driver to $HIVE_HOME/lib

Download mysql-connector-java-5.1.27-bin.jar and copy it to /home/hadoop/app/hive/lib:

[hadoop@hadoop001 lib]$ cp /home/hadoop/software/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar .

5) Install MySQL
6) Edit the Hive configuration file hive-site.xml

[hadoop@hadoop001 hive]$ cd conf/
[hadoop@hadoop001 conf]$ pwd
/home/hadoop/app/hive/conf
[hadoop@hadoop001 conf]$ ll
total 20
-rw-r--r-- 1 hadoop hadoop 1196 Mar 24  2016 beeline-log4j.properties.template
-rw-r--r-- 1 hadoop hadoop 2378 Mar 24  2016 hive-env.sh.template
-rw-r--r-- 1 hadoop hadoop 2662 Mar 24  2016 hive-exec-log4j.properties.template
-rw-r--r-- 1 hadoop hadoop 3505 Mar 24  2016 hive-log4j.properties.template
-rw-rw-r-- 1 hadoop hadoop  590 Jul 15 15:28 hive-site.xml
[hadoop@hadoop001 conf]$ cat hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/ruozedata_d7?createDatabaseIfNotExist=true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
 </property>
</configuration>
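
With createDatabaseIfNotExist=true in the JDBC URL, the metastore database is created automatically the first time Hive starts. Alternatively, the schema can be initialized up front with the schematool shipped under $HIVE_HOME/bin (a sketch, assuming the MySQL account configured above has rights to create the database):

[hadoop@hadoop001 hive]$ $HIVE_HOME/bin/schematool -dbType mysql -initSchema
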

Create a data file and load its contents into the Hive warehouse

[hadoop@hadoop001 data]$ cat hello.txt
1,test
2,tom
3,xingxing

[hadoop@hadoop001 conf]$ hive
which: no hbase in (/home/hadoop/app/hive/bin:/home/hadoop/app/hadoop/bin:/home/hadoop/app/hadoop/bin:/usr/java/jdk1.8.0_45/bin:/usr/local/mysql/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin:/home/hadoop/bin)

Logging initialized using configuration in jar:file:/home/hadoop/software/hive-1.1.0-cdh5.7.0/lib/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show databases;
OK
default
Time taken: 0.911 seconds, Fetched: 1 row(s)
hive> create table hello(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.461 seconds
hive> show tables;
OK
hello
Time taken: 0.111 seconds, Fetched: 1 row(s)
hive> load data local inpath '/home/hadoop/data/hello.txt' overwrite into table hello;
Loading data to table default.hello
Table default.hello stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 1.954 seconds
hive> show tables;
OK
hello
Time taken: 0.034 seconds, Fetched: 1 row(s)
hive> select * from hello;
OK
1    test
2    tom
3    xingxing
Time taken: 0.333 seconds, Fetched: 3 row(s)
hive> select count(*) from hello;
Query ID = hadoop_20190715143131_50ad76b6-f372-411c-a108-e4dc7c841782
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1562568082175_0002, Tracking URL = http://hadoop001:8088/proxy/application_1562568082175_0002/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1562568082175_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-07-15 15:42:35,083 Stage-1 map = 0%,  reduce = 0%
2019-07-15 15:42:58,048 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.4 sec
2019-07-15 15:43:08,013 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.16 sec
MapReduce Total cumulative CPU time: 8 seconds 160 msec
Ended Job = job_1562568082175_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.16 sec   HDFS Read: 6351 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 160 msec
OK
3
Time taken: 59.118 seconds, Fetched: 1 row(s)
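
Note the difference: select * is served by a simple fetch of the table's files, while count(*) is compiled into a MapReduce job, which is why it takes almost a minute even on a three-row table.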

[Figure: the computation job for select count(*) from hello]

Default storage path on HDFS

By default, Hive data is stored on HDFS under /user/hive/warehouse.
hive.metastore.warehouse.dir is the property that controls this HDFS path.
To change it, set the property in hive-site.xml:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>xxxx</value>
</property>

By default, Hive tables belong to the default database, and the default database's directory is hive.metastore.warehouse.dir itself.
A Hive table is really just a directory on HDFS; by default the directory name is the table name.
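
The effective value can be checked from the CLI with set (a sketch; the output format matches the set examples further down):

hive> set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse
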
Look for error details in the logs.
Detailed Hive logging is configured in $HIVE_HOME/conf/hive-log4j.properties:
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log

Enabling the logging configuration file

[hadoop@hadoop001 conf]$ pwd
/home/hadoop/app/hive/conf
[hadoop@hadoop001 conf]$ cp hive-log4j.properties.template hive-log4j.properties

Log file location

[hadoop@hadoop001 conf]$ cd /tmp/hadoop/
[hadoop@hadoop001 hadoop]$ ll
total 28
-rw-rw-r-- 1 hadoop hadoop  1067 Jul 16 11:12 hive.log
-rw-rw-r-- 1 hadoop hadoop 16749 Jul 15 15:43 hive.log.2019-07-15
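
When a statement fails, the relevant stack trace lands at the end of this file, so a quick way to watch it while reproducing the error is:

[hadoop@hadoop001 hadoop]$ tail -f /tmp/hadoop/hive.log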

Setting variables

hive (default)> set hive.cli.print.current.db
              > ;
hive.cli.print.current.db=true
hive (default)> set hive.cli.print.current.db=false;
hive> set hive.cli.print.current.db;
hive.cli.print.current.db=false
hive.cli.print.current.db: whether to show the current database in the prompt
hive.cli.print.header: whether to print column names in query output
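
For example, headers can be enabled for the current session (a sketch; with this on, column names such as hello.id and hello.name are printed before the rows, as in the queries further down):

hive> set hive.cli.print.header=true;

To make such settings permanent, they can be placed in ~/.hiverc, which the Hive CLI sources at startup.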

Creating a database

hive (default)> create database d7_hive
              > ;
OK
Time taken: 3.354 seconds
hive (default)> show databases;
OK
database_name
d7_hive
default
Time taken: 0.416 seconds, Fetched: 2 row(s)
Check it on HDFS:

[hadoop@hadoop001 ~]$ hdfs dfs -ls /user/hive/warehouse/
19/07/18 10:15:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2019-07-18 10:10 /user/hive/warehouse/d7_hive.db
drwxr-xr-x   - hadoop supergroup          0 2019-07-15 15:40 /user/hive/warehouse/hello

Specifying a storage location

hive (default)> create database IF NOT EXISTS d7_hive2 LOCATION '/d7_hive';
OK
Time taken: 0.88 seconds

[hadoop@hadoop001 ~]$ hdfs dfs -ls /
19/07/18 11:06:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2019-07-18 11:02 /d7_hive
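
The registered location can also be checked from inside Hive (a sketch; desc database prints the database name, location URI, and owner):

hive (default)> desc database d7_hive2;
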
Creating a table in the d7_hive database

hive (default)> use d7_hive
              > ;
OK
Time taken: 0.035 seconds
hive (d7_hive)> create table hello(id int,name string)
              > row format delimited fields terminated by ',';
OK
Time taken: 2.952 seconds
hive (d7_hive)> load data local inpath '/home/hadoop/data/hello.txt' overwrite into table hello;
Loading data to table d7_hive.hello
Table d7_hive.hello stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 1.544 seconds
hive (d7_hive)> select * from hello;
OK
hello.id    hello.name
1    test
2    tom
3    xingxing
Time taken: 2.09 seconds, Fetched: 3 row(s)
Querying the MySQL metastore database

hadoop001:mysqladmin:/usr/local/mysql:>mysql -uroot -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 50
Server version: 5.6.23-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| ruozedata          |
| ruozedata_d7       |
| test               |
+--------------------+
6 rows in set (0.00 sec)

mysql> use ruozedata_d7;            --- switch to the Hive metastore database
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_ruozedata_d7    |
+---------------------------+
| bucketing_cols            |
| cds                       |
| columns_v2                |
| database_params           |
| dbs                       |
| func_ru                   |
| funcs                     |
| global_privs              |
| part_col_stats            |
| partition_key_vals        |
| partition_keys            |
| partition_params          |
| partitions                |
| roles                     |
| sd_params                 |
| sds                       |
| sequence_table            |
| serde_params              |
| serdes                    |
| skewed_col_names          |
| skewed_col_value_loc_map  |
| skewed_string_list        |
| skewed_string_list_values |
| skewed_values             |
| sort_cols                 |
| tab_col_stats             |
| table_params              |
| tbls                      |
| version                   |
+---------------------------+
29 rows in set (0.00 sec)

mysql> select * from DBS \G;           --- query database-level metadata
*************************** 1. row ***************************
          DB_ID: 1
           DESC: Default Hive database
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse
           NAME: default
     OWNER_NAME: public
     OWNER_TYPE: ROLE
*************************** 2. row ***************************
          DB_ID: 6
           DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/user/hive/warehouse/d7_hive.db         --- location of the database's files
           NAME: d7_hive
     OWNER_NAME: hadoop
     OWNER_TYPE: USER
*************************** 3. row ***************************
          DB_ID: 7
           DESC: NULL
DB_LOCATION_URI: hdfs://hadoop001:9000/d7_hive
           NAME: d7_hive2
     OWNER_NAME: hadoop
     OWNER_TYPE: USER
3 rows in set (0.00 sec)

mysql> select * from tbls \G;            --- query table-level metadata
*************************** 1. row ***************************
            TBL_ID: 1
       CREATE_TIME: 1563176134
             DB_ID: 1
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 1
          TBL_NAME: hello
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
*************************** 2. row ***************************
            TBL_ID: 6
       CREATE_TIME: 1563419416
             DB_ID: 6
  LAST_ACCESS_TIME: 0
             OWNER: hadoop
         RETENTION: 0
             SD_ID: 6
          TBL_NAME: hello
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
2 rows in set (0.00 sec)
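
Joining these tables maps each Hive table to its HDFS directory (a sketch against the metastore schema shown above; sds stores the storage descriptors, including LOCATION):

mysql> select d.NAME, t.TBL_NAME, s.LOCATION
    -> from tbls t join dbs d on t.DB_ID = d.DB_ID
    -> join sds s on t.SD_ID = s.SD_ID;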

mysql> select * from columns_v2 \g;      --- query column metadata
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | id          | int       |           0 |
|     1 | NULL    | name        | string    |           1 |
|     6 | NULL    | id          | int       |           0 |
|     6 | NULL    | name        | string    |           1 |
+-------+---------+-------------+-----------+-------------+
4 rows in set (0.00 sec)

mysql> select * from version;                      --- this table contains exactly one row
+--------+----------------+------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                    |
+--------+----------------+------------------------------------+
|      1 | 1.1.0          | Set by MetaStore hadoop@10.0.0.200 |
+--------+----------------+------------------------------------+
1 row in set (0.00 sec)

Dropping a database

hive (default)> show databases;
OK
database_name
d7_hive
d7_hive2
default
Time taken: 0.685 seconds, Fetched: 3 row(s)
hive (default)> drop database if exists d7_hive2;
OK
Time taken: 1.571 seconds
hive (default)> show databases;
OK
database_name
d7_hive
default
Time taken: 0.034 seconds, Fetched: 2 row(s)
Cascade drop (required when the database still contains tables)

hive (default)> drop database if exists d7_hive cascade;
OK
Time taken: 2.393 seconds
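
Since d7_hive held managed tables, the cascade drop also removes its directory from HDFS; re-running the earlier listing (output omitted) no longer shows /user/hive/warehouse/d7_hive.db:

[hadoop@hadoop001 ~]$ hdfs dfs -ls /user/hive/warehouse/
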
Creating a table and loading data

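The load below expects a tab-delimited file at /home/hadoop/data/emp.txt; a sample line (reconstructed from the query output further down, so the exact formatting is an assumption) looks like:

7499	ALLEN	SALESMAN	7698	1981-02-20 00:00:00	1600	300	30
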
hive (default)> create table data_emp(
              > empno int,
              > ename string,
              > job string,
              > mgr int,
              > hiredate string,
              > sal double,
              > comm double,
              > deptno int
              > )
              > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
hive (default)> desc data_emp;
OK
col_name    data_type    comment
empno                   int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                string                                      
sal                     double                                      
comm                    double                                      
deptno                  int                                         
Time taken: 0.187 seconds, Fetched: 8 row(s)
hive (default)> desc formatted data_emp;
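(Output of desc formatted omitted; beyond the column list it also shows the table's database, owner, HDFS location, and table type, here MANAGED_TABLE.)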

hive (default)> load data local inpath '/home/hadoop/data/emp.txt' overwrite into table data_emp;
Loading data to table default.data_emp
Table default.data_emp stats: [numFiles=1, numRows=0, totalSize=969, rawDataSize=0]
OK
Time taken: 2.287 seconds
hive (default)> select * from data_emp;
OK
data_emp.empno    data_emp.ename    data_emp.job    data_emp.mgr    data_emp.hiredate    data_emp.sal    data_emp.comm    data_emp.deptno
7369     SMITH       CLERK         7902     1980-12-17 00:00:00     800.0    NULL    20
7499     ALLEN       SALESMAN      7698     1981-02-20 00:00:00     1600.0    300.0    30
7521     WARD        SALESMAN      7698     1981-02-22 00:00:00     1250.0    500.0    30
7566     JONES       MANAGER       7839     1981-04-02 00:00:00     2975.0    NULL    20
7654     MARTIN      SALESMAN      7698     1981-09-28 00:00:00     1250.0    1400.0    30
7698     BLAKE       MANAGER       7839     1981-05-01 00:00:00     2850.0    NULL    30
7782     CLARK       MANAGER       7839     1981-06-09 00:00:00     2450.0    NULL    10
7788     SCOTT       ANALYST       7566     1982-12-09 00:00:00     3000.0    NULL    20
7839     KING        PRESIDENT     NULL     1981-11-17 00:00:00     5000.0    NULL    10
7844     TURNER      SALESMAN      7698     1981-09-08 00:00:00     1500.0    0.0    30
7876     ADAMS       CLERK         7788     1983-01-12 00:00:00     1100.0    NULL    20
7900     JAMES       CLERK         7698     1981-12-03 00:00:00     950.0    NULL    30
7902     FORD        ANALYST       7566     1981-12-03 00:00:00     3000.0    NULL    20
7934     MILLER      CLERK         7782     1982-01-23 00:00:00     1300.0    NULL    10
Time taken: 0.774 seconds, Fetched: 14 row(s)
hive (default)> select deptno,count(1) cnts from data_emp group by deptno;
Query ID = hadoop_20190718184747_7f5d385d-ad0d-42b6-88f5-1f81a6f0d06c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1562568082175_0003, Tracking URL = http://hadoop001:8088/proxy/application_1562568082175_0003/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1562568082175_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-07-18 19:52:57,658 Stage-1 map = 0%,  reduce = 0%
2019-07-18 19:53:16,989 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.01 sec
2019-07-18 19:53:27,448 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.53 sec
MapReduce Total cumulative CPU time: 5 seconds 530 msec
Ended Job = job_1562568082175_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.53 sec   HDFS Read: 8321 HDFS Write: 15 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 530 msec
OK
deptno    cnts
10    3
20    5
30    6
Time taken: 56.323 seconds, Fetched: 3 row(s)
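
The same pattern extends to other aggregates, e.g. the average salary per department (a sketch; this likewise launches a MapReduce job, output omitted):

hive (default)> select deptno, avg(sal) from data_emp group by deptno;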