Day14_20180426_Hive Metadata Configuration and Development Usage

黑冰vip

Category: hive    Tags: big data

Article link: https://blog.csdn.net/wanglvip/article/details/117186471

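This article introduces what Hive does, including converting SQL into MapReduce jobs, and how Hive relates to Hadoop and the metastore. Detailed steps show how to configure MySQL as Hive's metastore database, covering metadata storage, the Driver component, and SQL parsing and optimization. It also covers creating databases and tables, loading data, and common SQL operations. Finally, it discusses Hive's local mode, log configuration, and script execution, along with some common tools and functions.

Summary generated automatically by CSDN.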

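I. Review
   -> What Hive does
       -> Converts SQL into MapReduce programs and submits them to YARN to run
       -> Maps files on HDFS to tables
   -> Hive components
       -> Hadoop:
           -> Storage: HDFS
           -> Compute: MapReduce
       -> metastore: all schema information for the entire database
           -> Mapping between tables and HDFS
           -> Mapping between files and tables
           -> Where the metadata is stored
               -> Derby database by default
                   -> Multiple clients cannot be started at the same time
               -> MySQL is recommended as the metastore database
       -> Driver
           -> Client
           -> SQL parser: parses the SQL
           -> Statement optimizer: applies optimizations based on the parsed SQL
           -> Physical plan
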
II. Configure MySQL as the Hive metastore database
Three ways to store Hive metadata

  Derby
  Local MySQL
  Remote MySQL: start the metastore service

Install MySQL (as the root user)
1. Install mysql-server
           [rdedu@bigdata-training01 hive-1.2.1-bin]$ sudo yum install -y mysql-server
2. Start the MySQL service and enable it at boot
           sudo chkconfig mysqld on
           sudo service mysqld start
3. Set the password for the administrator user root
           mysqladmin -u root password '123456'
4. Log in
           mysql -uroot -p
5. Allow the root user to access MySQL from any host ('%')
           grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
6. Delete the old host entries, keeping only the '%' entry
           show databases;
           use mysql;
           show tables;
           select user,host from user;
           delete from user where host='127.0.0.1';
           delete from user where host='localhost';
           delete from user where host='bigdata-training01.erongda.com';
7. Refresh privileges
           flush privileges;
8. Restart the MySQL service
           [rdedu@bigdata-training01 hive-1.2.1-bin]$ sudo service mysqld restart

Configure Hive to store its metadata in MySQL
Edit Hive's configuration file

hive-site.xml

cp hive-default.xml.template hive-site.xml
vim hive-site.xml

            <!-- MySQL address -->
            <property>
               <name>javax.jdo.option.ConnectionURL</name>
               <value>jdbc:mysql://bigdata-training01.erongda.com:3306/metastore?createDatabaseIfNotExist=true</value>
            </property>
            <!-- JDBC driver class -->
            <property>
               <name>javax.jdo.option.ConnectionDriverName</name>
               <value>com.mysql.jdbc.Driver</value>
            </property>
            <!-- user name and password -->
            <property>
               <name>javax.jdo.option.ConnectionUserName</name>
               <value>root</value>
            </property>
            <property>
               <name>javax.jdo.option.ConnectionPassword</name>
               <value>123456</value>
            </property>
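
The four properties above have to sit inside a `<configuration>` root element in hive-site.xml. As a minimal sketch, assuming you prefer a small file holding only the overrides, the snippet below writes such a file. The host, database name, user, and password come from the notes above; `CONF_DIR` and the variable names are illustrative assumptions, and the file is written to a scratch directory rather than Hive's real conf directory.

```shell
#!/bin/sh
# Sketch: write a minimal hive-site.xml that contains only the metastore settings.
# CONF_DIR is a hypothetical scratch directory; copy the result into Hive's conf/ to use it.
CONF_DIR="$(mktemp -d)"
MYSQL_HOST="bigdata-training01.erongda.com"   # host used in the notes above
MYSQL_USER="root"
MYSQL_PASS="123456"

cat > "$CONF_DIR/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${MYSQL_HOST}:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${MYSQL_USER}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${MYSQL_PASS}</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hive-site.xml"
```

Keeping only the overridden properties makes it easier to see what differs from Hive's defaults.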

Put the JDBC driver jar into Hive's lib directory

cp /opt/tools/mysql-connector-java-5.1.27-bin.jar lib/

Start and test

 bin/hive

Create a test table in Hive (its metadata is stored in MySQL)

create database if not exists wordcount;
use wordcount;

drop table if exists wc;
create table wc(
id int,
word string
)
row format delimited fields terminated by '\t'
LINES TERMINATED BY '\n';
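
To confirm that the new table's metadata really landed in MySQL, you can query the metastore database directly. The following is a sketch, not part of the original notes: it assumes the `metastore` database named in the JDBC URL and the metastore tables `DBS` and `TBLS`, joined on `DB_ID`. Because `mysql -p` prompts for the password, the query only runs when the script is attached to a terminal.

```shell
#!/bin/sh
# Sketch: list the Hive tables recorded in the MySQL metastore.
# Assumes the metastore schema tables DBS (databases) and TBLS (tables).
CHECK_SQL="select d.NAME, t.TBL_NAME from metastore.TBLS t join metastore.DBS d on t.DB_ID = d.DB_ID;"

if [ -t 0 ] && command -v mysql >/dev/null 2>&1; then
  # Prompts for the root password set during the MySQL installation
  mysql -uroot -p -e "$CHECK_SQL"
else
  echo "Run this from an interactive shell with the mysql client installed:"
  echo "  mysql -uroot -p -e \"$CHECK_SQL\""
fi
```

If the configuration works, the output should include a row for the wordcount database and the wc table.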

Configure the Hive log directory

 [rdedu@bigdata-training01 hive-1.2.1-bin]$ mkdir logs

Edit hive-log4j.properties

 hive.log.dir=/opt/modules/hive-1.2.1-bin/logs
 hive.log.file=hive.log

Verify:

  [rdedu@bigdata-training01 hive-1.2.1-bin]$ ll logs/
  total 4
  -rw-rw-r--. 1 rdedu rdedu 3056 Sep  4 21:38 hive.log

III. Create sample tables and review common SQL

hive>

create database empManager;
use empManager;
create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t';

Load data:
Syntax: load data [local] inpath 'file_path' into table tbname

hive> load data local inpath '/opt/datas/emp.txt' into table emp;
hive> select empno,ename,sal,comm,job from emp;
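
`load data` does not validate the file, so every line of emp.txt has to match the table's delimiter and column order. The notes do not show the file itself; as a sketch, the snippet below writes a small tab-separated sample and checks the field count. The first row reuses the values shown in the query output later in these notes, the second row is made up for illustration, and the file goes to a scratch directory (`DATA_DIR`, an assumed name) instead of /opt/datas.

```shell
#!/bin/sh
# Sketch: build a tab-separated sample file that matches the emp table layout.
DATA_DIR="$(mktemp -d)"          # hypothetical scratch directory
EMP_FILE="$DATA_DIR/emp.txt"

# Column order: empno ename job mgr hiredate sal comm deptno
# An empty comm field is read back as NULL by Hive.
printf '7369\tSMITH\tCLERK\t7902\t1980-12-17\t800.0\t\t20\n' > "$EMP_FILE"
printf '7499\tALLEN\tSALESMAN\t7698\t1981-02-20\t1600.0\t300.0\t30\n' >> "$EMP_FILE"

# Count the lines that do not have exactly 8 tab-separated fields
BAD_LINES="$(awk -F'\t' 'NF != 8 {n++} END {print n+0}' "$EMP_FILE")"
echo "lines with a wrong field count: $BAD_LINES"
```

Lines with a wrong field count do not fail the load; the missing columns simply come back as NULL, so a check like this is worth running first.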

create table dept(
deptno int,
dname string,
loc string
)
row format delimited fields terminated by '\t';

Upload the file to the HDFS root directory

-- Loading data from HDFS moves the file from its source directory into the table's directory

[rdedu@bigdata-training01 hadoop-2.7.3]$ bin/hdfs dfs -put /opt/datas/dept.txt  /
load data inpath '/dept.txt' into table dept;
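
Unlike `load data local`, which copies the local file, loading from an HDFS path moves it. A rough way to observe this is sketched below. `TABLE_DIR` is an assumption: it presumes the default warehouse root /user/hive/warehouse and that dept was created in the empmanager database, so adjust it to your setup.

```shell
#!/bin/sh
# Sketch: check that "load data inpath" moved /dept.txt into the table directory.
SRC_FILE="/dept.txt"
TABLE_DIR="/user/hive/warehouse/empmanager.db/dept"   # assumed default location

if command -v hdfs >/dev/null 2>&1; then
  if hdfs dfs -test -e "$SRC_FILE"; then
    echo "$SRC_FILE is still at the source path (not loaded yet?)"
  else
    echo "$SRC_FILE is gone from the source path"
  fi
  # The moved file should now be listed under the table directory
  hdfs dfs -ls "$TABLE_DIR"
else
  echo "hdfs command not found; skipping the check"
fi
```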

Common SQL
Creating tables
Three ways

1. The regular way

drop table if exists tbname;
create table tbname (
col1 type,
col2 type,
……
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

2. From a subquery: as

create table tbname as select ……
-- saves the result of the subquery as a table
create table join_result as
select 
    a.empno,a.ename,a.sal,a.deptno,b.dname 
from 
    emp a
join
    dept b
on
    a.deptno = b.deptno;

3. like

create table tb1 like tb2
Copies the table structure of tb2 directly to tb1
create table join_result2 like join_result;

-- Location: this keyword specifies the HDFS path for a database or a table
create table dept_location(
deptno int,
dname string,
loc string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/dept_location';
load data local inpath '/opt/datas/dept.txt' into table dept_location;

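Common functions (interview question: which functions are commonly used in Hive?)

 rdbms: count, sum, avg
 hive: show functions;
        concat: concatenates strings
        substring: extracts part of a string
        split: splits a string
        explode: turns one row into multiple rows
        cast: converts a value to another data type
            cast(sal as string)

View a function's usage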

desc function cast;
desc function extended concat;
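
To try the functions listed above, a sketch like the following writes a few sample queries to a file and runs them only if a `hive` command is on the PATH. The file location, the column aliases, and the literal string passed to `split` are illustrative assumptions; the emp table is the one created earlier.

```shell
#!/bin/sh
# Sketch: sample queries exercising concat, substring, cast, split and explode.
WORK_DIR="$(mktemp -d)"                 # hypothetical scratch directory
FUNC_SQL="$WORK_DIR/functions.sql"

cat > "$FUNC_SQL" <<'EOF'
-- string functions and a type conversion on the emp table
select concat(ename, '_', job) as name_job,
       substring(ename, 1, 3)  as name_prefix,
       cast(sal as string)     as sal_text
from empmanager.emp;

-- split builds an array, explode emits one row per element
select explode(split('hadoop,hive,yarn', ',')) as word;
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f "$FUNC_SQL"
else
  echo "hive not found; queries saved in $FUNC_SQL"
fi
```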

Common tool configuration
Test: use Hive's local mode
# Jobs with small enough input are no longer submitted to YARN; MapReduce runs locally

hive> set hive.exec.mode.local.auto=true;

 select 
       a.empno,a.ename,a.sal,a.deptno,b.dname 
 from 
       emp a
 join
       dept b
 on
       a.deptno = b.deptno;

Show the current database and the column header

Edit hive-site.xml

        <!-- show the current database and the column names -->
        <property>
            <name>hive.cli.print.current.db</name>
            <value>true</value>
            <description>Whether to include the current database in the Hive prompt.</description>
        </property>
        <property>
            <name>hive.cli.print.header</name>
            <value>true</value>
            <description>Whether to print the names of the columns in query output.</description>
        </property>

Result after the change:

 hive (empmanager)> select * from emp;
 OK
 emp.empno       emp.ename       emp.job emp.mgr emp.hiredate    emp.sal emp.comm        emp.deptno
 7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20

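set
Used to view or temporarily change the value of a configuration property

set hive.cli.print.current.db;
set hive.cli.print.current.db=false;
The change takes effect immediately but is only temporary; it is lost when the client restarts

dfs
Used to access the HDFS file system

hive>dfs -ls /;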

Found 3 items
-rw-r--r--   1 rdedu supergroup         79 2019-09-04 22:03 /dept.txt
drwx-wx-wx   - rdedu supergroup          0 2019-09-04 22:15 /tmp
drwxr-xr-x   - rdedu supergroup          0 2019-09-03 22:34 /user

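How do you wrap Hive in a script?

 [rdedu@bigdata-training01 hive-1.2.1-bin]$ bin/hive -help

-e: execute a single SQL statement with hive

 bin/hive -e "select * from empManager.emp"
 # Append the result to the file hive.result
 bin/hive -e "select * from empManager.emp" >> /opt/datas/hive.result

-f: put the SQL statements in a file and run that file; the statements are executed one by one, in order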

 sudo vim /opt/datas/hive.sql
 select * from empmanager.emp;
 select count(1) from empmanager.emp;
 bin/hive -f /opt/datas/hive.sql

Note: if you do not use the db.tb form, statements run against the default database

create table master.user
(
    id bigint auto_increment comment 'primary key'
        primary key,
    age int null comment 'age',
    password varchar(32) null comment 'password',
    sex int null comment 'sex',
    username varchar(32) null comment 'user name'
)
engine=MyISAM collate=utf8mb4_bin
;

 sudo vim /opt/datas/hive.sql
 select * from  emp;
 select count(1) from emp;
 bin/hive --database empmanager    # manually specify the database to connect to
 bin/hive --database empmanager -f /opt/datas/hive.sql
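
Putting `-f` and `--database` together, a wrapper script might look like the sketch below. It is one possible arrangement, not the author's script: `run_hive_sql`, `WORK_DIR`, and the result file name are hypothetical, the default `HIVE_HOME` is inferred from the log directory configured earlier, and `--hiveconf` (a Hive CLI option for passing a property on the command line) is used to enable the automatic local mode from the previous section.

```shell
#!/bin/sh
# Sketch: wrap "hive --database ... -f ..." in a reusable function.
HIVE_HOME="${HIVE_HOME:-/opt/modules/hive-1.2.1-bin}"   # assumed install path
WORK_DIR="$(mktemp -d)"                                  # hypothetical scratch directory
SQL_FILE="$WORK_DIR/hive.sql"
RESULT_FILE="$WORK_DIR/hive.result"

# No db prefix is needed here because --database selects the database
cat > "$SQL_FILE" <<'EOF'
select * from emp;
select count(1) from emp;
EOF

run_hive_sql() {
  # $1 = database, $2 = SQL file, $3 = result file
  if [ -x "$HIVE_HOME/bin/hive" ]; then
    "$HIVE_HOME/bin/hive" --database "$1" \
      --hiveconf hive.exec.mode.local.auto=true \
      -f "$2" > "$3"
  else
    echo "hive not found under $HIVE_HOME; skipping" >&2
    return 1
  fi
}

run_hive_sql empmanager "$SQL_FILE" "$RESULT_FILE" || echo "queries were not executed"
```

Passing the database as an argument keeps the SQL file reusable across databases that share the same table names.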
