Hive Basics 2: Notes

dou_dou_shuai

Article link: https://blog.csdn.net/dou_dou_shuai/article/details/51443723

=================================================================================
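Review
1. What is Hive?
Hive is a data warehouse framework that lets you work with data in HDFS through a SQL-like language.
Hive is a SQL parsing engine: it translates HQL into MapReduce (MR) jobs that run on Hadoop.
2. Hive system architecture
User interfaces
Shell CLI
Java API
Web GUI
metastore
The Hive metadata; it mainly stores information about Hive databases and tables.
driver: translates, compiles, schedules, and executes HQL.
3. Hive data structures
Hive stores its data on Hadoop HDFS.
It has no storage structure of its own.
Hive has its own data structures: databases, tables, views, and indexes.
4. Hive metadata
The metastore's default built-in storage engine is Derby; MySQL is also supported.
Derby supports only one session at a time, so MySQL is usually used as the external storage engine.
5. Installing Hive
Installing MySQL on Linux
Installing Hive
6. Accessing HWI
Build the war package
Edit the configuration file
Open it in a browser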
=================================================================================
Hive logging
1. How to get rid of the log messages printed when Hive starts
At startup you see:
SLF4J: Found binding in [jar:file:/usr/local/hive-0.14.0/lib/hive-jdbc-0.14.0-standalone.jar
Fix:
hive> !mv /usr/local/hive-0.14.0/lib/hive-jdbc-0.14.0-standalone.jar /usr/local/hive-0.14.0/lib/hive-jdbc-0.14.0-standalone.jar.bak;
2. Hive logs
1°. Create the log configuration files from the templates
[root@hive conf]$ cp hive-exec-log4j.properties.template hive-exec-log4j.properties
[root@hive conf]$ cp hive-log4j.properties.template hive-log4j.properties
2°. Inspect the log configuration file
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log
SystemInfo.java shows that
${java.io.tmpdir}=/tmp
${user.name}=root
so on this machine the log is written to /tmp/root/hive.log.
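To make the placeholder expansion concrete, here is a minimal Python sketch of how hive.log.dir and hive.log.file combine into a path. The resolve helper is hypothetical and only mimics the ${...} substitution; the property values are the ones reported in the notes and will differ on other machines.

```python
import re

def resolve(value, props):
    """Expand ${name} placeholders using the given property map."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: props[m.group(1)], value)

# Values reported in the notes above; they differ per machine and user.
system_props = {"java.io.tmpdir": "/tmp", "user.name": "root"}

log_dir = resolve("${java.io.tmpdir}/${user.name}", system_props)  # hive.log.dir
log_file = log_dir + "/" + "hive.log"                              # hive.log.file
print(log_file)  # /tmp/root/hive.log
```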
=================================================================================
Hive data types
int, boolean, date, array, map, struct, and so on.
Hive databases and tables, and how they map to HDFS and the metastore
1. Hive databases (DDL)
1°. List databases
hive> show databases;
2°. Switch to a database
hive> use dbName;
e.g.
hive> use default;
3°. Create a database
hive> create database dbName;
e.g.
hive> create database mydb1;
4°. Drop a database
hive> drop database dbName;
e.g.
hive> drop database mydb1;
5°. Where databases live in HDFS
Location of the default database in HDFS:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
You can also read it with hive> set hive.metastore.warehouse.dir;
Checking the metastore:
The DBS table shows the hdfs_uri of default: hdfs://hive.teach.crxy.cn:9000/user/hive/warehouse
Ordinary databases are placed under /user/hive/warehouse. Checking the metastore again:
The mydb1 database we created has hdfs_uri: hdfs://hive.teach.crxy.cn:9000/user/hive/warehouse/mydb1.db
Creating a database at a specified location:
hive> create database mydb2 location 'hdfs_uri';
===>
When we drop a database or change its definition, the corresponding HDFS directory and metastore records change with it; the two stay in sync.
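The naming rule behind those hdfs_uri values can be sketched in a few lines of Python. This is only an illustration of the path layout seen above, not Hive code; database_dir and table_dir are hypothetical helper names, and the scheme and host part of the URI is left out.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # value of hive.metastore.warehouse.dir

def database_dir(db_name, location=None):
    """Return the HDFS directory of a database, following the layout in the notes."""
    if location is not None:        # create database ... location '...'
        return location
    if db_name == "default":        # the default database is the warehouse root itself
        return WAREHOUSE_DIR
    return posixpath.join(WAREHOUSE_DIR, db_name + ".db")

def table_dir(db_name, table_name):
    """A managed table gets a subdirectory inside its database directory."""
    return posixpath.join(database_dir(db_name), table_name)

print(database_dir("default"))   # /user/hive/warehouse
print(database_dir("mydb1"))     # /user/hive/warehouse/mydb1.db
print(table_dir("default", "t1"))  # /user/hive/warehouse/t1
```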
2. Tables in Hive
Show column headers in query output
hive> set hive.cli.print.header;
hive.cli.print.header=false
hive> set hive.cli.print.header=true;
1°. Table DDL
List tables
hive> show tables;
Create a table
hive> create table t1(id int);
Show the table structure
hive> desc [extended] t1;
extended is optional; it prints extended information about the table
Show the statement that created the table
hive> show create table t1;
LOCATION
'hdfs://hive.teach.crxy.cn:9000/user/hive/warehouse/t1'
This shows where the table is stored in HDFS.
Checking the metastore:
The TBLS table holds the information about the tables that have been created
Drop a table
hive> drop table t1;
Rename a table
hive> alter table t1 rename to t1_ddl;
Change a column of a table
hive> alter table t1_ddl change id t_id int;
Add columns
mysql: alter table tblName add column colname coltype;
hive> alter table tblName add columns (colname coltype, ...); several columns can be added at once
hive> alter table t1_ddl add columns(name string comment 'name', age int);
Replace all columns of a table
hive> alter table t1_ddl replace columns(name string comment 'name', age int);
Move a column
Put a column first
hive> alter table t1_ddl add columns(id int); (re-adds the original id column)
hive> alter table t1_ddl change id id int first;
Move a column after (or before) another column
hive> alter table t1_ddl change age age int after id; (puts age after id, that is, before name)
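The effect of the last few statements on the column order can be traced with a small Python sketch. It only models the list of column names kept in the table definition; the three helper functions are hypothetical stand-ins for the alter table forms above. Note that these statements change table metadata only and do not rewrite the data files.

```python
def add_columns(cols, *new):
    """alter table ... add columns (...): new columns are appended at the end."""
    return cols + list(new)

def move_first(cols, name):
    """alter table ... change name name type first"""
    return [name] + [c for c in cols if c != name]

def move_after(cols, name, anchor):
    """alter table ... change name name type after anchor"""
    rest = [c for c in cols if c != name]
    i = rest.index(anchor) + 1
    return rest[:i] + [name] + rest[i:]

cols = ["name", "age"]                 # after replace columns(name ..., age int)
cols = add_columns(cols, "id")         # ['name', 'age', 'id']
cols = move_first(cols, "id")          # ['id', 'name', 'age']
cols = move_after(cols, "age", "id")   # ['id', 'age', 'name']
print(cols)
```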
2°. Loading data into Hive
a) Using the Hive load data command
hive> alter table t1_ddl replace columns(id int);
hive> alter table t1_ddl rename to t1;
hive> load data local inpath 'data/t1' into table t1;
View the data in the table
hive> select * from t1;
b) Using the hadoop fs command
If we put data directly into the table's directory in HDFS, will the Hive table notice it?
hive> dfs -put data/t1 /user/hive/warehouse/t1/t1_1;
Yes: Hive picks up data loaded this way as well.
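3°. Data loading modes and Hive table delimiters
create table t2(
id int comment "ID",
name string comment "name",
birthday date comment 'birthday',
online boolean comment "is online"
);
load data local inpath 'data/t2' into table t2;
Hive has default row and column delimiters.
The row delimiter is the same as the Linux line delimiter: '\n'.
The column delimiter is octal \001, a non-printable ASCII character. To type it in a terminal editor, press ctrl+v then ctrl+a.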
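Instead of typing the control character by hand, a data file can be generated by a script. The following Python sketch writes a file in the default format for t2; the sample rows are made up for illustration, and the file is written to a temporary directory rather than the data/t2 path used in the notes.

```python
import os
import tempfile

FIELD_DELIM = "\x01"   # octal \001, the character ctrl+v ctrl+a inserts

# Hypothetical sample rows for t2(id, name, birthday, online).
rows = [
    (1, "tom", "1990-01-15", "true"),
    (2, "jerry", "1992-07-30", "false"),
]

# One line per row, columns joined by \001, rows ended by '\n'.
lines = [FIELD_DELIM.join(str(v) for v in row) for row in rows]
content = "\n".join(lines) + "\n"

path = os.path.join(tempfile.mkdtemp(), "t2")
with open(path, "w", newline="\n") as f:
    f.write(content)

# Read the first line back; repr() makes the invisible delimiter visible.
with open(path, newline="") as f:
    first_line = f.readline()
print(repr(first_line))
```

A file produced this way could then be loaded with load data local inpath, pointing at wherever the file was written.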
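How do you specify the row and column delimiters when creating a table?
create table t2_1(
id int comment "ID",
name string comment "name",
birthday date comment 'birthday',
online boolean comment "is online"
) comment "test table's separator"
row format delimited
fields terminated by '\t'
lines terminated by '\n';
load data local inpath 'data/t2_1' into table t2;
====> Problem: t2 still uses the default \001 column delimiter, so the data that cannot be parsed reads back as NULL
====> Two data loading modes
Schema on read
The database does not validate data when loading it; invalid data is shown as NULL at query time.
Advantage: loading is fast, which suits loading large volumes of data.
Schema on write
The database validates data when loading it, so everything stored in the database is valid.
Advantage: well suited to querying; there is no need to worry about invalid data.
Hive uses schema on read: data is not validated when it is loaded, and invalid data is shown as NULL at query time.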
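Schema on read can be imitated in a few lines of Python. This is a simplified sketch, not Hive's actual parsing code: read_row and the cast helpers are hypothetical, and the sample line is made up. It shows why a tab-delimited line read through a table that expects \001 comes back as all NULLs (None here).

```python
from datetime import datetime

def to_int(s):
    return int(s)

def to_date(s):
    return datetime.strptime(s, "%Y-%m-%d").date()

def to_bool(s):
    return {"true": True, "false": False}[s.lower()]

# Column names and casts for t2(id int, name string, birthday date, online boolean).
T2_SCHEMA = [("id", to_int), ("name", str), ("birthday", to_date), ("online", to_bool)]

def read_row(line, delim, schema=T2_SCHEMA):
    """Parse one stored line at query time; whatever does not fit becomes None (NULL)."""
    fields = line.rstrip("\n").split(delim)
    row = []
    for i, (_, cast) in enumerate(schema):
        try:
            row.append(cast(fields[i]))
        except (IndexError, ValueError, KeyError):
            row.append(None)
    return row

tab_line = "1\ttom\t1990-01-15\ttrue\n"   # hypothetical tab-delimited line
print(read_row(tab_line, "\x01"))  # [None, None, None, None]
print(read_row(tab_line, "\t"))    # parsed values
```

Nothing is rejected at load time in either case; the mismatch only shows up when the line is read.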

=================================================================================

Hive complex data types
array, map, struct
array
A group of students, each with id, name, and hobby (several values)
create table t3_arr(
id int,
name string,
hobby array<string>
) row format delimited
fields terminated by '\t';
hive> load data local inpath 'data/t3' into table t3_arr;
Read one element of the array
hive> select id, hobby[1] from t3_arr;
Setting the delimiter between array elements
The default delimiter is \002; to type it, press ctrl+v then ctrl+b
create table t3_arr_1(
id int,
name string,
hobby array<string>
) row format delimited
fields terminated by '\t'
collection items terminated by ',';
hive> load data local inpath 'data/t3' into table t3_arr_1;
hive> select id, name, hobby[0],hobby[1] from t3_arr_1;
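A short Python sketch shows why the collection delimiter matters. The sample line is hypothetical (the contents of data/t3 are not given in the notes), and parse_t3_line and element are illustrative helpers rather than Hive functions. With the default \002 item delimiter, comma-separated hobbies stay a single element; with ',' they split as intended.

```python
def parse_t3_line(line, field_delim="\t", item_delim=","):
    """Split one data line into (id, name, hobby list)."""
    id_, name, hobby = line.rstrip("\n").split(field_delim)
    return int(id_), name, hobby.split(item_delim)

def element(arr, i):
    """hobby[i]: Hive returns NULL (None here) when i is out of range."""
    return arr[i] if 0 <= i < len(arr) else None

line = "1\ttom\tswimming,reading\n"   # hypothetical line from data/t3

# t3_arr_1: collection items terminated by ','
_, _, hobby = parse_t3_line(line, item_delim=",")
print(element(hobby, 0), element(hobby, 1))   # swimming reading

# t3_arr: default item delimiter \002, so the commas are not split
_, _, hobby_default = parse_t3_line(line, item_delim="\x02")
print(element(hobby_default, 0), element(hobby_default, 1))   # swimming,reading None
```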
map
A group of students, each with id, name, and scores (chinese, math, english)
create table t4_map (
id int,
name string,
scores map<string, int>
) row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';
load data local inpath 'data/t4_map' into table t4_map;
Look up a specific value in the map
hive> select id, name, scores["chinese"] from t4_map;
A map also has a default key-value delimiter: \003, typed as ctrl+v then ctrl+c
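The two delimiter levels of a map column can be sketched as follows. The sample line is hypothetical, since the contents of data/t4_map are not shown in the notes, and parse_map is an illustrative helper, not a Hive function.

```python
def parse_map(text, item_delim=",", key_delim=":"):
    """Turn 'chinese:90,math:85' into a dict with int values, like map<string, int>."""
    result = {}
    for item in text.split(item_delim):          # collection items terminated by ','
        key, value = item.split(key_delim, 1)    # map keys terminated by ':'
        result[key] = int(value)
    return result

line = "1\ttom\tchinese:90,math:85,english:77"   # hypothetical line from data/t4_map
id_, name, scores_text = line.split("\t")        # fields terminated by '\t'
scores = parse_map(scores_text)

print(scores.get("chinese"))   # 90
print(scores.get("physics"))   # None; Hive shows a missing key as NULL
```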
struct
Somewhat like an object in Java or a struct in C: it can hold values of several different data types
create table t5_struct(
id int,
name string,
info struct<addr:string,subway:int>
) row format delimited
fields terminated by '\t'
collection items terminated by ',';
Read one field of the struct
hive> select id, name, info.subway from t5_struct;
Combined example (using only the default delimiters):
Chaoren Academy (超人学院) has many employees; each employee has a salary, a level, an address, and so on
create table t5_employee (
id int,
name string,
salary float,
subordinate array<string>,
tax map<string, float>,
jiguan struct<province:string, city:string, zip:int>
);
Note: when giving a column an alias, do not wrap the alias in single or double quotes

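Summary:
Default delimiters in Hive
Row delimiter: \n
Column delimiter: \001 --> ctrl+v ctrl+a
Delimiter between collection elements: \002 --> ctrl+v ctrl+b
Delimiter between a map key and its value: \003 --> ctrl+v ctrl+c
Custom delimiters
row format delimited
Column delimiter: fields terminated by ...
Collection element delimiter: collection items terminated by ...
Map key-value delimiter: map keys terminated by ...
Row delimiter: lines terminated by ... (usually omitted)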
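The default delimiters summarized above can be seen working together in a sketch that encodes and decodes one row of t5_employee. The employee values are invented for illustration, and encode_employee and decode_employee are hypothetical helpers that only mimic the text layout; they are not part of Hive.

```python
FIELD, ITEM, KEY = "\x01", "\x02", "\x03"   # \001, \002, \003

def encode_employee(id_, name, salary, subordinate, tax, jiguan):
    """Build one text line for t5_employee using only the default delimiters."""
    tax_text = ITEM.join(k + KEY + str(v) for k, v in tax.items())
    return FIELD.join([
        str(id_), name, str(salary),
        ITEM.join(subordinate),               # array elements separated by \002
        tax_text,                             # map entries by \002, key and value by \003
        ITEM.join(str(v) for v in jiguan),    # struct members by \002, in declared order
    ])

def decode_employee(line):
    """Split the line back into typed values, column by column."""
    id_, name, salary, subordinate, tax, jiguan = line.split(FIELD)
    province, city, zip_code = jiguan.split(ITEM)
    return {
        "id": int(id_), "name": name, "salary": float(salary),
        "subordinate": subordinate.split(ITEM),
        "tax": {k: float(v) for k, v in (kv.split(KEY) for kv in tax.split(ITEM))},
        "jiguan": {"province": province, "city": city, "zip": int(zip_code)},
    }

# Hypothetical employee record.
line = encode_employee(1, "tom", 8000.0, ["jack", "rose"],
                       {"income": 0.1, "fund": 0.05}, ("Beijing", "Beijing", 100000))
emp = decode_employee(line)
print(emp["subordinate"][1], emp["tax"]["income"], emp["jiguan"]["city"])
```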