Hive: A Beginner's Summary

Latest recommended article published on 2022-05-25 16:12:13

u:boom

Categories: HIVE, SQL. Tags: hive, mysql, big data, database

Article link: https://blog.csdn.net/newas3/article/details/105834068


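Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like queries; under the hood, Hive translates SQL statements into MapReduce jobs. Compared with writing MapReduce jobs in Java, Hive has clear advantages: fast development, low staffing cost, scalability (the cluster size can be extended freely), and extensibility (user-defined functions are supported).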
Hive architecture:
[Figure: Hive architecture diagram]
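Hive offers three user interfaces: the CLI, HWI, and clients. A client uses the JDBC driver to operate Hive remotely through Thrift. HWI provides a web interface for remote access to Hive. The most common approach, however, is still the CLI (operating Hive from a Linux terminal).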
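Hive can be installed in three modes:
1. Embedded mode (metadata is stored in the embedded Derby database; only one session may connect at a time and a second concurrent session fails with an error, so it is not suitable for a development environment)
2. Local mode (MySQL installed locally replaces Derby as the metadata store)
3. Remote mode (MySQL installed on a remote host replaces Derby as the metadata store)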
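Installing Hive (local mode):
Hive must be installed on top of a correctly installed Hadoop cluster, and the cluster must be running.
Before installing Hive, install MySQL first.
Check whether MySQL is already installed: rpm -qa | grep mysql
Check which packages are available: yum list 'mysql*'
Install the MySQL client: yum install -y mysql
Install the server: yum install -y mysql-server
yum install -y mysql-devel
Start the database: service mysqld start or /etc/init.d/mysqld start
Create the hadoop user and grant it privileges: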

  mysql>grant all on *.* to hadoop@'%' identified by 'hadoop';
  mysql>grant all on *.* to hadoop@'localhost' identified by 'hadoop';
  mysql>grant all on *.* to hadoop@'master' identified by 'hadoop';
  mysql>flush privileges;

Then download the version you need from the Hive website (hive.apache.org) or the Apache archive (archive.apache.org).
Extract it: tar -zxvf apache-hive-1.2.1-bin.tar.gz
Configure it:

cd /apache-hive-1.2.1-bin/conf/
vim hive-site.xml
Contents of hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
         <value>jdbc:mysql://master:3306/hive?characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
</configuration>
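As a quick sanity check of a configuration like the one above, here is a minimal Python sketch (it is not part of Hive) that parses a hive-site.xml-style document and collects its properties into a dict. The helper name `read_hive_site` and the inline `SAMPLE_HIVE_SITE` string are assumptions made for illustration; in practice the text would be read from the real file in the conf directory.

```python
import xml.etree.ElementTree as ET

# Inline sample mirroring part of the hive-site.xml above (illustrative only).
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
    </property>
</configuration>
"""


def read_hive_site(xml_text):
    """Return {property name: value} for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    # Print each setting so a typo in a name or value is easy to spot.
    for key, val in read_hive_site(SAMPLE_HIVE_SITE).items():
        print(f"{key} = {val}")
```

Note that the XML declaration has to be the very first thing in the document; leading whitespace before `<?xml` makes a strict parser reject the file.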

Copy the MySQL driver JAR: cp mysql-connector-java-5.1.43-bin.jar apache-hive-1.2.1-bin/lib/
Set the environment variables:

export HIVE_HOME=$PWD/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin

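Start Hive: hive
Shell commands can be run inside Hive: ! <shell command>;

Hadoop commands can also be run inside Hive with dfs, for example: dfs -ls /;
Data types in Hive:
Primitive types: TINYINT SMALLINT INT BIGINT FLOAT DOUBLE BOOLEAN STRING
Complex types: STRUCT MAP ARRAY
Using Hive:
Table creation statements:
DDL:
Create a managed (internal) table: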

create table mytable(
id int, 
name string) 
row format delimited fields terminated by '\t' stored as textfile;
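To make the `row format delimited fields terminated by '\t'` clause concrete, the following Python sketch shows roughly how one line of a text file maps onto the `(id int, name string)` schema. It is only an approximation of the behaviour, not Hive's actual reader: the function name `parse_row` is hypothetical, and the rule that a missing field or an unparsable integer is read as NULL (`None` here) is stated as an assumption about the default text format.

```python
def parse_row(line):
    """Split one tab-delimited text line into (id, name) per mytable's schema."""
    fields = line.rstrip("\n").split("\t")
    raw_id = fields[0] if len(fields) > 0 else None
    name = fields[1] if len(fields) > 1 else None  # missing field -> NULL
    try:
        row_id = int(raw_id)
    except (TypeError, ValueError):
        row_id = None  # a value that is not an integer is read as NULL
    return row_id, name


if __name__ == "__main__":
    for sample in ["1\tzhangsan\n", "abc\tlisi\n", "2\n"]:
        print(repr(sample), "->", parse_row(sample))
```

The practical consequence is that a file using a different delimiter than the one declared in the table definition tends to show up as rows full of NULLs rather than as a load error.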

External table: uses the keyword external

create external table mytable2(
	id int, 
	name string)
row format delimited fields terminated by '\t' location '/user/hive/warehouse/mytable2';

Create a partitioned table: the partition columns are declared in partitioned by (...)

create table mytable3(
	id int, 
	name string)
partitioned by (sex string) row format delimited fields terminated by '\t' stored as textfile;

Load data into a static partition:

load data local inpath '/root/hivedata/boy.txt' overwrite into table mytable3 partition(sex='boy');

Add a partition:

alter table mytable3 add partition (sex='unknown') location '/user/hive/warehouse/mytable3/sex=unknown';
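The location in the statement above follows the usual layout in which each partition is a subdirectory named `column=value` under the table directory. A small Python sketch of that naming rule is given below; the function `partition_path` is hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
def partition_path(table_location, spec):
    """Build the directory of a partition, e.g. .../mytable3/sex=unknown.

    spec maps partition column names to values, in partition-column order.
    """
    parts = [f"{col}={val}" for col, val in spec.items()]
    return "/".join([table_location.rstrip("/")] + parts)


if __name__ == "__main__":
    print(partition_path("/user/hive/warehouse/mytable3", {"sex": "unknown"}))
    print(partition_path("/user/hive/warehouse/mytable3", {"sex": "boy"}))
```

Because a partition is just a directory, a query that filters on the partition column only has to read the matching subdirectories.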

Drop a partition: alter table mytable3 drop if exists partition(sex='unknown');
Partitioned tables use static partitioning by default; dynamic partitioning can be enabled with:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Insert data into the partitioned table (dynamic partition insert):

insert into table mytable3 partition (sex) select id,name,'boy' from student_mdf;
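In a dynamic partition insert such as the one above, the partition column is not given a fixed value; the trailing column of the select list decides which partition each row goes to. The Python sketch below imitates that routing step in memory. The function name `route_dynamic_partitions` and the sample rows are made up for illustration and do not come from the original tables.

```python
from collections import defaultdict


def route_dynamic_partitions(rows, num_partition_cols=1):
    """Group rows by their trailing partition value(s).

    Each row is a tuple whose last num_partition_cols entries are the
    partition values; the remaining entries are the ordinary columns.
    """
    partitions = defaultdict(list)
    for row in rows:
        data = row[:-num_partition_cols]
        key = row[-num_partition_cols:]
        partitions[key].append(data)
    return dict(partitions)


# Hypothetical result of: select id, name, sex from some_source_table
SAMPLE_ROWS = [(1, "tom", "boy"), (2, "amy", "girl"), (3, "bob", "boy")]

if __name__ == "__main__":
    for key, data in route_dynamic_partitions(SAMPLE_ROWS).items():
        print(f"sex={key[0]}: {data}")
```

In the original statement the last select expression is the constant 'boy', so every selected row lands in the single partition sex=boy; selecting a real column instead would create one partition per distinct value.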

List the table's partitions: show partitions mytable3;
Query the partitioned table's data: select * from mytable3;
Show the table structure: desc mytable3;
Altering tables (these are also DDL statements):
Rename a table: alter table student rename to student_mdf;
Add a column: alter table student_mdf add columns (sex string);
Rename a column: alter table student_mdf change sex gender string;
Replace the column list: alter table student_mdf replace columns (id string, name string);
DML:
Load data: (local file) load data local inpath '/home/lym/zs.txt' overwrite into table student_mdf;
(HDFS file) load data inpath '/zs.txt' into table student_mdf;
Insert a single row: insert into table student_mdf values('1','zhangsan');
Create a table from a query result: create table mytable5 as select id, name from mytable3;
Export data: (to the local file system; the path names a directory that Hive fills with output files) insert overwrite local directory '/root/hivedata/mytable5.txt' select * from mytable5;
(to HDFS)
insert overwrite directory 'hdfs://master:9000/user/hive/warehouse/mytable5_load' select * from mytable5;
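Querying data:
select * from mytable3; queries the whole table
select uid,uname from student; queries the student ID and student name columns of the student table
select uname,count(*) from student group by uname; counts how many times each name occurs in the student table
Other commonly used clauses include having, order by, sort by, distribute by, and cluster by.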
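Join queries:
Inner join: returns the rows that satisfy the join condition on both sides.
select * from t_a a inner join t_b b on a.id=b.id;
Left outer join: keeps every row of the left table; where the right table has no match, its columns are null.
select * from t_a a left join t_b b on a.id=b.id;
Right outer join: the mirror image of the left outer join.
select * from t_a a right join t_b b on a.id=b.id;
Left semi join: returns rows of the left table, provided that they satisfy the condition in the on clause against the right table.
select * from t_a a left semi join t_b b on a.id=b.id;
Full outer join: keeps the unmatched rows of both tables.
select * from t_a a full join t_b b on a.id=b.id;
in/exists subqueries (supported in the where clause since Hive 0.13, so available in 1.2.1): the effect is the same as left semi join.
select * from t_a a where a.id in (select id from t_b);
select * from t_a a where exists (select * from t_b b where a.id = b.id);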
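The difference between these joins is easiest to see on a few rows. The following Python sketch reimplements inner join, left outer join, and left semi join over small in-memory lists; it models the semantics only and says nothing about how Hive executes them. The sample contents of `T_A` and `T_B` and all function names are assumptions for illustration, and the sketch does not model SQL's rule that null keys never match.

```python
# Hypothetical sample data standing in for tables t_a and t_b.
T_A = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
T_B = [{"id": 2, "score": 90}, {"id": 2, "score": 80}, {"id": 4, "score": 70}]


def inner_join(left, right):
    """Every (left, right) pair whose ids match."""
    return [(l, r) for l in left for r in right if l["id"] == r["id"]]


def left_join(left, right):
    """Like inner_join, but unmatched left rows are kept with None on the right."""
    out = []
    for l in left:
        matches = [r for r in right if r["id"] == l["id"]]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


def left_semi_join(left, right):
    """Left rows that have at least one match; each appears once, left columns only."""
    right_ids = {r["id"] for r in right}
    return [l for l in left if l["id"] in right_ids]


if __name__ == "__main__":
    print("inner:", inner_join(T_A, T_B))
    print("left:", left_join(T_A, T_B))
    print("semi:", left_semi_join(T_A, T_B))
```

With this data the inner join returns the row with id 2 twice, once per matching row of `T_B`, while the left semi join returns it once. That is the same result an in or exists subquery would give, which is why the two forms are interchangeable.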