Hive Environment Setup and Table Creation

fengchengwu2012

Last modified: 2022-11-04 15:43:49

Category: Big Data; Tags: hive

First published: 2020-06-30 17:16:57

Article link: https://blog.csdn.net/fengchengwu2012/article/details/107046495

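1. Install Hive

2. Modify the configuration

(1) Create the environment file and set the HADOOP_HOME path


 # In Hive's conf directory, create hive-env.sh from the template
 mv hive-env.sh.template hive-env.sh

 vi hive-env.sh

 # Add the following environment variables and save the file
 export HADOOP_HOME=/home/master/hadoop-2.9.2

 export HIVE_CONF_DIR=/home/master/hive-3.1.2/conf

(2) Configure hive-site.xml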

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<!-- Store the metadata in the MySQL database metastore_db -->
  <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://server200:3306/metastore_db?useSSL=false&amp;useUnicode=true&amp;characterEncoding=utf8</value>
  </property>

    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
  </property>

  <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>root</value>
  </property>


  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/wucf/hive/warehouse</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>

  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>server115</value>
  </property>

  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
  </property>

   <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>

   <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>

</configuration>

(3) Initialize the metadata

a> Create the metastore_db database in MySQL

b> Go to Hive's bin directory and run the initialization command

schematool -initSchema -dbType mysql -verbose

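3. Start Hive and connect with a client

(1) Start the Hive services

# Start the metastore service, which provides access to Hive's metadata
# (each service runs in the foreground, so use a separate terminal for each)
hive --service metastore
# Start HiveServer2 to expose a JDBC endpoint
hive --service hiveserver2

(2) Start a Hive client

# Method 1: start the local interactive CLI, which connects to the metastore service directly (officially no longer recommended)
   bin/hive
# Method 2: connect remotely over JDBC to HiveServer2, which forwards requests to the metastore service
# (use the host HiveServer2 is bound to; the configuration above sets hive.server2.thrift.bind.host to server115)
   beeline -u jdbc:hive2://server116:10000  -n wucf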

4. Common Hive statements

(1) Create a table

The source data file /root/love.txt contains:

1;liuping;henanxinxian
2;liucui;henanlankao
3;gejiali;henanyuzhou
4;chenyingying;henanxinxian
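
Before creating the table, it can help to confirm that every line of the file really has three `;`-separated fields, because a row with missing fields shows up with NULL columns after loading. A minimal shell sketch, assuming a scratch copy of the data at ./love.txt (the DATA_FILE variable and that path are illustrative; the article keeps the file at /root/love.txt):

```shell
# Illustrative path; point DATA_FILE at /root/love.txt to check the real file.
DATA_FILE="${DATA_FILE:-./love.txt}"

# Recreate the sample data from the article if the file does not exist yet.
if [ ! -f "$DATA_FILE" ]; then
cat > "$DATA_FILE" <<'EOF'
1;liuping;henanxinxian
2;liucui;henanlankao
3;gejiali;henanyuzhou
4;chenyingying;henanxinxian
EOF
fi

# Count lines that do not have exactly 3 fields when split on ';'.
bad_lines=$(awk -F';' 'NF != 3 { n++ } END { print n + 0 }' "$DATA_FILE")
total_lines=$(awk 'END { print NR }' "$DATA_FILE")
echo "checked $total_lines lines, $bad_lines malformed"
```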

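Create the t_user table with a delimiter mapping; this example uses ; as the field separator

create  table  t_user(id int,name string,address string)  row format delimited  fields terminated by ';';

View the created table

(2) Load the data

Map the contents of love.txt onto the t_user table

Query the table data: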
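
The statements for viewing, loading, and querying the table are not shown in the text. A minimal sketch of what they might look like, assuming the t_user table from step (1) already exists and HiveServer2 is reachable at the beeline URL used earlier. The script name t_user_load.hql and the USER_HQL, HS2_URL, and HS2_USER variables are illustrative. With beeline, a LOCAL path is resolved on the HiveServer2 host, so the file has to be present there.

```shell
# Illustrative names; adjust to your environment.
USER_HQL="${USER_HQL:-./t_user_load.hql}"
HS2_URL="${HS2_URL:-jdbc:hive2://server116:10000}"
HS2_USER="${HS2_USER:-wucf}"

# Write the HiveQL for inspecting, loading, and querying t_user.
cat > "$USER_HQL" <<'EOF'
SHOW TABLES;
DESCRIBE t_user;
LOAD DATA LOCAL INPATH '/root/love.txt' INTO TABLE t_user;
SELECT * FROM t_user;
EOF

# Run the script only if beeline is installed on this machine.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$HS2_URL" -n "$HS2_USER" -f "$USER_HQL"
else
  echo "beeline not found; statements written to $USER_HQL"
fi
```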

(3) Create a partitioned table

Suppose the source data files contain the following

henan.txt
1,郑州
2,洛阳

jiangsu.txt
3,南京
4,苏州
5,无锡

guangdong.txt
6,广州
7,深圳

zhejiang.txt
8,杭州
9,宁波

Create a table that partitions the cities above by province:

create  table  t_city(id int,city string) partitioned by (province string) row format delimited  fields terminated by ',';

Load the data, one file per partition:

LOAD  DATA  local  INPATH  '/home/master/plugin/henan.txt'  INTO  TABLE   t_city partition(province='henan');


LOAD  DATA  local  INPATH  '/home/master/plugin/jiangsu.txt'  INTO  TABLE   t_city partition(province='jiangsu');


LOAD  DATA  local  INPATH  '/home/master/plugin/guangdong.txt'  INTO  TABLE   t_city partition(province='guangdong');


LOAD  DATA  local  INPATH  '/home/master/plugin/zhejiang.txt'  INTO  TABLE   t_city partition(province='zhejiang');
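
Since the four LOAD statements differ only in the province name, they can also be generated from a list, which keeps the file name and the partition value from drifting apart. A minimal sketch that writes the statements to a script file; the file name load_t_city.hql and the DATA_DIR and CITY_HQL variables are illustrative, and the data directory is the one used above:

```shell
# Illustrative names; DATA_DIR is the directory holding the province files.
DATA_DIR="${DATA_DIR:-/home/master/plugin}"
CITY_HQL="${CITY_HQL:-./load_t_city.hql}"

# Start from an empty script, then append one LOAD statement per province.
: > "$CITY_HQL"
for province in henan jiangsu guangdong zhejiang; do
  printf "LOAD DATA LOCAL INPATH '%s/%s.txt' INTO TABLE t_city PARTITION (province='%s');\n" \
    "$DATA_DIR" "$province" "$province" >> "$CITY_HQL"
done

cat "$CITY_HQL"
```

The generated file can then be run with `beeline -f` or `hive -f`.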

(4) Create a table with two partition levels

For example, partition district and county data by province and city:

zhengzhou.txt
1,金水区
2,管城区
3,二七区

luoyang.txt
3,西工区
4,洛龙区

nanjing.txt
1,江宁区
2,雨花台

Run the create statement, using province and city as the partition columns

create  table  t_county(id int,county string) partitioned by (province string,city string) row format delimited  fields terminated by ',';

Load the data

 
LOAD  DATA  local  INPATH  '/home/master/plugin/city/zhengzhou.txt'  INTO  TABLE   t_county partition(province='henan',city='zhengzhou');

LOAD  DATA  local  INPATH  '/home/master/plugin/city/luoyang.txt'  INTO  TABLE   t_county partition(province='henan',city='luoyang');

LOAD  DATA  local  INPATH  '/home/master/plugin/city/nanjing.txt'  INTO  TABLE   t_county partition(province='jiangsu',city='nanjing');
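
Each combination of partition values becomes a nested directory under the table's directory, which is why a query that filters on province and city only has to read one directory. A minimal sketch that prints the expected locations, assuming the table lives in the default database under the hive.metastore.warehouse.dir configured above (the WAREHOUSE_DIR variable and the partition_path helper are illustrative):

```shell
# Assumed warehouse root, taken from the hive-site.xml shown earlier.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/home/wucf/hive/warehouse}"

# Build the directory for one (province, city) partition of t_county.
partition_path() {
  printf '%s/t_county/province=%s/city=%s\n' "$WAREHOUSE_DIR" "$1" "$2"
}

partition_path henan zhengzhou
partition_path henan luoyang
partition_path jiangsu nanjing
```

On a running cluster the actual layout can be listed with `hadoop fs -ls -R` on the table directory.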

Query the data:

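(5) Create a bucketed table

Enable bucketing and set the number of reducers to match the number of buckets

-- Enable bucketing (Hive 2.0 and later removed this property and always enforce bucketing, so the line should be unnecessary on the Hive 3.1.2 used here)
set  hive.enforce.bucketing=true;
-- Use 4 reducers, one per bucket
set  mapreduce.job.reduces=4;

Source data: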

1,郑州,90
2,洛阳,80
3,开封,85
4,信阳,89
5,南阳,88
6,焦作,86
7,新乡,87
8,安阳,84
9,鹤壁,81
10,商丘,96
11,平顶山,92
12,许昌,79
13,周口,78
14,驻马店,77
15,三门峡,76
16,漯河,75
17,濮阳,74

First create a staging table, t_temp_cities, and load the source data into it

create table t_temp_cities(id int,city string,sorce int) row format delimited fields terminated by ',';
LOAD DATA local INPATH '/home/master/plugin/city/cities.txt' INTO TABLE t_temp_cities;

Create the bucketed table

create table  t_cities(id int,city string,sorce int) clustered by (id) into  4  buckets  row format delimited  fields terminated by ',';

Load the bucketed table by inserting the rows of the staging table into t_cities

insert  overwrite  table  t_cities   select  * from  t_temp_cities cluster by(id);
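
To get a feel for how rows spread over the buckets, the assignment can be simulated outside Hive. A minimal sketch, assuming the simple rule "bucket = id mod number of buckets", which matches Hive's legacy hashing of integer keys. Hive 3 tables may use a different hash function by default, so the actual contents of each bucket file can differ from this illustration.

```shell
NUM_BUCKETS=4

# Assign each id from 1 to 17 to bucket (id mod NUM_BUCKETS) and count rows per bucket.
# Assumption: legacy integer hashing; Hive 3 may use a different hash by default.
bucket_counts=$(seq 1 17 | awk -v n="$NUM_BUCKETS" '
  { count[$1 % n]++ }
  END { for (b = 0; b < n; b++) printf "bucket %d: %d rows\n", b, count[b] }
')
echo "$bucket_counts"
```

Under this rule the 17 sample rows split almost evenly: bucket 1 receives five ids (1, 5, 9, 13, 17) and the other three buckets receive four each.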

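(6) Map complex data types onto a table

Source data

1,河南,郑州;洛阳;开封
2,江苏,南京;苏州;无锡

Fields are separated by commas and the elements of the collection by semicolons. The create statement: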


create  table  t_compe(id int,pro string,list  array<string>)  row format delimited  fields terminated by ',' collection items terminated by ';';
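
To see how the two delimiters divide a raw line, the same split can be reproduced outside Hive. A minimal awk sketch (an illustration of the parsing rule, not Hive's own code): the comma separates the three columns, and the semicolon separates the elements of the third column.

```shell
line='1,河南,郑州;洛阳;开封'

# Split on ',' into columns, then split column 3 on ';' into array elements.
parsed=$(printf '%s\n' "$line" | awk -F',' '{
  n = split($3, items, ";")
  printf "id=%s pro=%s list_size=%d first=%s\n", $1, $2, n, items[1]
}')
echo "$parsed"
```

In Hive the elements are then addressed with zero-based indexes such as list[0], and size(list) returns the number of elements.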

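Load the data by copying the file into the table's directory (the path below is Hive's default warehouse location; with the hive.metastore.warehouse.dir configured above, the table directory would be /home/wucf/hive/warehouse/t_compe)
hadoop fs -put /home/master/plugin/city/complex.txt /user/hive/warehouse/t_compe

Query the data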
