2020.9.16 Class Notes (Hive Database)

The difference between the hive CLI (hiveserver) and beeline (hiveserver2)

The hive CLI does not need a separately started server: running the hive command starts the service and connects in one step.
Beeline requires the server side (hiveserver2) to be started manually first, and then the client connects to it.
Beeline talks to a dedicated service that handles only beeline requests, so query handling is somewhat more efficient than the hive CLI; logically there is not much difference between the two.
Beeline does not support update and delete, while the hive CLI supports both, although they are slow.
How to use beeline:

First, start hiveserver2:
nohup hive --service hiveserver2 &   # start hiveserver2 in the background without flooding the console with logs
Type beeline and press Enter to get the beeline command line.
Enter the connection command: !connect  jdbc:hive2://192.168.237.31:10000
beeline can also take the [-e] and [-f] options on the command line:
beeline -u jdbc:hive2://192.168.237.31:10000 -e "show databases;"
Or put the statement into a file first: echo "show databases;" > hive1.sql
beeline -u jdbc:hive2://192.168.237.31:10000 -f hive1.sql

hive:

1. Connection

hive -h hostname -p port

2. Loading files

Skipped for now.

3. Database-level operations: the same as in MySQL; Hive can operate on databases.

create database hivetest;  -- create the database
show databases;
drop database hivetest;
use hivetest;

4. Table-level operations:

Hive's column types (figure omitted):
Complex data types (figure omitted):
Hive metadata structure (figure omitted):
A sketch listing the common types follows below.
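Since the figures are not reproduced here, the following sketch packs the commonly used primitive and complex column types into one illustrative DDL statement (the table name type_demo is made up for the example):

create table type_demo(
  c_tinyint   tinyint,
  c_smallint  smallint,
  c_int       int,
  c_bigint    bigint,
  c_float     float,
  c_double    double,
  c_decimal   decimal(10,2),
  c_boolean   boolean,
  c_string    string,
  c_varchar   varchar(64),
  c_char      char(8),
  c_date      date,
  c_timestamp timestamp,
  c_binary    binary,
  c_array     array<string>,                  -- complex type: ordered list of elements
  c_map       map<string,int>,                -- complex type: key/value pairs
  c_struct    struct<gender:string,age:int>   -- complex type: named fields
);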

-- enter the beeline command line
beeline
!connect  jdbc:hive2://192.168.237.31:10000
-- create a database
create database if not exists test1;
-- show the database currently in use
select current_database();
-- switch to another database
use test1;
-- show database information
describe database test1;
-- change the database owner
alter database test1 set owner user root;
-- drop the database
drop database if exists test1;
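The same database-level DDL can also carry a comment, a custom HDFS location, and properties; a minimal sketch (the location and the property value are illustrative):

create database if not exists test1
comment 'demo database for these notes'
location '/hive/warehouse/test1.db'
with dbproperties ('creator'='root');
-- describe database extended also shows the comment, location and properties
describe database extended test1;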

[Extension]
When using the Hive beeline client, a lot of INFO messages are printed, which clutters the output. To suppress them:
Method 1: run the following setting inside beeline (effective for the current beeline session only):

set hive.server2.logging.operation.level=NONE;

Method 2: permanent setting: add the following configuration to hive-site.xml to disable the extra output in beeline; restart the hiveserver2 service for it to take effect.

<property>
    <name>hive.server2.logging.operation.enabled</name>
    <value>true</value>
</property>
<property>
    <name>hive.server2.logging.operation.level</name>
    <value>NONE</value>
</property>
<property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/opt/log/hive/operation_logs</value>
</property>

Problems that may occur:
Column length too big for column 'PARAM_VALUE' (max = 21845)
This is a problem on the MySQL side:

-- it is caused by the character set of the MySQL metastore database; log in to MySQL and run these three commands:
show variables like "char%";
use hive;
alter database hive character set latin1;

The CREATE TABLE statements so far have been quite crude and have not touched on any of the more advanced usage.
By default a table is a managed (internal) table, and Hive creates a matching folder for it in the configured storage location.
As soon as a file is dropped into that folder the table can read its data (the file has to match the table structure, otherwise you will read back a lot of NULLs).
A partitioned table (PARTITIONED BY) creates sub-folders under the table folder, one per partition, and the data lives in the corresponding partition folders.

First, look at the data file employee.txt:

employee.txt
Michael|Montreal,Toronto|Male,30|DB:80|Product:DeveloperLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

Design the CREATE TABLE statement:

create table employee(
name string,
address array<string>,
personalInfo array<string>,
technol map<string,int>,
jobs map<string,string>)
row format delimited
fields terminated by '|'   -- a Hive table is a mapping over text data in HDFS; if delimiters are not specified, defaults are used
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

From ROW FORMAT DELIMITED onward the clauses are not separated by commas.
(Holding Alt while selecting lets you copy the script out as a column block.) The table has to be designed around the structure of the data because of how the data will be loaded into it.
What the new table looks like on Hadoop:
(figure omitted) In HDFS, a folder hivetest.db is created under /hive/warehouse for the database, and the table is likewise just a folder, employee, inside that database folder; it contains no data yet.
Upload the data file into /hive/warehouse/hivetest.db/employee:

hdfs dfs -put /root/employee.txt  /hive/warehouse/hivetest.db/employee
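To confirm that the file actually landed in the table directory, it can be listed from the shell (same path as above):

hdfs dfs -ls /hive/warehouse/hivetest.db/employee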

Then query the employee table:

hive> select * from employee;
OK
Michael ["Montreal","Toronto"]  ["Male","30"]   {"DB":80}       {"Product":"Developer\u0004Lead"}
Will    ["Montreal"]    ["Male","35"]   {"Perl":85}     {"Product":"Lead","Test":"Lead"}
Shelley ["New York"]    ["Female","27"] {"Python":80}   {"Test":"Lead","COE":"Architect"}
Lucy    ["Vancouver"]   ["Female","57"] {"Sales":89}    {"Sales":"Lead"}
Time taken: 0.349 seconds, Fetched: 4 row(s)

Now look at the data file employee_id.txt:

employee_id.txt
Michael|100|Montreal,Toronto|Male,30|DB:80|Product:DeveloperLead
Will|101|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Steven|102|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|103|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Mike|104|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|105|New York|Female,27|Python:80|Test:Lead,COE:Architect
Luly|106|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Lily|107|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shell|108|New York|Female,27|Python:80|Test:Lead,COE:Architect
Mich|109|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Dayong|110|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Sara|111|New York|Female,27|Python:80|Test:Lead,COE:Architect
Roman|112|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Christine|113|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Eman|114|New York|Female,27|Python:80|Test:Lead,COE:Architect
Alex|115|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Alan|116|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Andy|117|New York|Female,27|Python:80|Test:Lead,COE:Architect
Ryan|118|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Rome|119|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Lym|120|New York|Female,27|Python:80|Test:Lead,COE:Architect
Linm|121|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Dach|122|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Ilon|123|New York|Female,27|Python:80|Test:Lead,COE:Architect
Elaine|124|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

Design the CREATE TABLE statement:

create table employee_id(
name string,
id int,
address array<string>,
genderAndAge struct<gender:string,age:int>,
workAndSal map<string,int>,
jobAndRole map<string,string>)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

Upload the data file into /hive/warehouse/hivetest.db/employee_id:

hdfs dfs -put /root/employee_id.txt /hive/warehouse/hivetest.db/employee_id

Check the result:

hive> select * from employee_id;
OK
Michael 100     ["Montreal","Toronto"]  {"gender":"Male","age":30}      {"DB":80}       {"Product":"DeveloperLead"}
Will    101     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Steven  102     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Lucy    103     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Mike    104     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Shelley 105     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Luly    106     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Lily    107     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Shell   108     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Mich    109     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Dayong  110     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Sara    111     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Roman   112     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Christine       113     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}     {"Product":"Lead","Test":"Lead"}
Eman    114     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Alex    115     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Alan    116     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Andy    117     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Ryan    118     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Rome    119     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Lym     120     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Linm    121     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Dach    122     ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}    {"Product":"Lead","Test":"Lead"}
Ilon    123     ["New York"]    {"gender":"Female","age":27}    {"Python":80}  {"Test":"Lead","COE":"Architect"}
Elaine  124     ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89,"HR":94}    {"Sales":"Lead"}
Time taken: 0.072 seconds, Fetched: 25 row(s)

Query the information of employees whose work includes Sales:

hive> select name,id,address[0],genderAndAge.gender,genderAndAge.age,workAndSal["Sales"] from employee_id where workAndSal["Sales"] is not null;
OK
Lucy    103     Vancouver       Female  57      89
Luly    106     Vancouver       Female  57      89
Mich    109     Vancouver       Female  57      89
Roman   112     Vancouver       Female  57      89
Alex    115     Vancouver       Female  57      89
Ryan    118     Vancouver       Female  57      89
Linm    121     Vancouver       Female  57      89
Elaine  124     Vancouver       Female  57      89
Time taken: 0.13 seconds, Fetched: 8 row(s)
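Besides indexing into arrays and maps and dotting into structs, as above, Hive has built-in functions for inspecting complex columns; a small sketch against the same table (size, map_keys and map_values are standard Hive functions):

select name,
       size(address)          as num_addresses,
       map_keys(workAndSal)   as skills,
       map_values(workAndSal) as salaries
from employee_id
limit 3;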
hive> desc employee;
OK
name                    string
address                 array<string>
personalinfo            array<string>
technol                 map<string,int>
jobs                    map<string,string>
hive> alter table employee change personalinfo info struct<gender:string,age:int>;
hive> desc employee;
OK
name                    string
address                 array<string>
info                    struct<gender:string,age:int>
technol                 map<string,int>
jobs                    map<string,string>
hive> select * from employee;
OK
Michael ["Montreal","Toronto"]  {"gender":"Male","age":30}      {"DB":80}      {"Product":"Developer\u0004Lead"}
Will    ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}     {"Product":"Lead","Test":"Lead"}
Shelley ["New York"]    {"gender":"Female","age":27}    {"Python":80}   {"Test":"Lead","COE":"Architect"}
Lucy    ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89}    {"Sales":"Lead"}
Time taken: 0.054 seconds, Fetched: 4 row(s)
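CHANGE (renaming/retyping a column, as above) is only one of the ALTER TABLE forms; renaming a whole table or appending columns follows the same pattern. A sketch with made-up table and column names, not run against the tables in these notes:

-- rename a table
alter table some_table rename to some_table_new;
-- append a column to the end of the schema
alter table some_table_new add columns (remark string comment 'extra note');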

-- You cannot use CREATE TABLE ... AS SELECT * FROM employee here: CTAS cannot create a partitioned table

create table employee_partition(
name string,
address array<string>,
info struct<gender:string,age:int>,
technol map<string,int>,
jobs map<string,string>)
partitioned by (country string,add string)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
create table tmp_employee as select * from employee_partition;
hive> desc tmp_employee;
OK
name                    string
address                 array<string>
info                    struct<gender:string,age:int>
technol                 map<string,int>
jobs                    map<string,string>
country                 string
add                     string
Time taken: 0.05 seconds, Fetched: 7 row(s)
hive> desc employee_partition;
OK
name                    string
address                 array<string>
info                    struct<gender:string,age:int>
technol                 map<string,int>
jobs                    map<string,string>
country                 string
add                     string

# Partition Information
# col_name              data_type               comment

country                 string
add                     string
Time taken: 4.824 seconds, Fetched: 13 row(s)
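In contrast to the CTAS restriction noted above (CTAS flattens the partition columns into ordinary columns, as the tmp_employee output shows), CREATE TABLE ... LIKE copies the full definition including the partitioning, but no data; a minimal sketch:

create table employee_partition_copy like employee_partition;
-- desc employee_partition_copy shows the same Partition Information block; the new table starts out empty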

LOAD is used to move data into Hive:

LOAD DATA LOCAL INPATH '/home/dayongd/Downloads/employee.txt' 
OVERWRITE INTO TABLE employee;
-- With the LOCAL keyword, the source file is on the local Linux filesystem; the load copies the data.
LOAD DATA LOCAL INPATH '/home/dayongd/Downloads/employee.txt' 
OVERWRITE INTO TABLE employee_partitioned  PARTITION (year=2014, month=12);
-- Without the LOCAL keyword, the file is in HDFS; the load moves the data directly.
LOAD DATA INPATH '/tmp/employee.txt'  
OVERWRITE INTO TABLE employee_partitioned PARTITION (year=2017, month=12);

LOCAL: the file is on the local filesystem, and loading copies the data.
OVERWRITE: overwrite the existing data in the table.

load data local inpath '/root/employee.txt'
into table employee_partition
partition(country="China",add="LiaoNing");
hive> select * from employee_partition;
OK
Michael ["Montreal","Toronto"]  {"gender":"Male","age":30}      {"DB":80}      {"Product":"Developer\u0004Lead"}        China   LiaoNing
Will    ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}     {"Product":"Lead","Test":"Lead"}        China   LiaoNing
Shelley ["New York"]    {"gender":"Female","age":27}    {"Python":80}   {"Test":"Lead","COE":"Architect"}       China   LiaoNing
Lucy    ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89}    {"Sales":"Lead"}        China   LiaoNing
Time taken: 0.299 seconds, Fetched: 4 row(s)
load data inpath '/opt/hive/warehouse/employee/employee.txt'
into table employee_partition
partition (country="China",add="NanJing");

hive> select * from employee_partition;
OK
Michael ["Montreal","Toronto"]  {"gender":"Male","age":30}      {"DB":80}      {"Product":"Developer\u0004Lead"}        China   LiaoNing
Will    ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}     {"Product":"Lead","Test":"Lead"}        China   LiaoNing
Shelley ["New York"]    {"gender":"Female","age":27}    {"Python":80}   {"Test":"Lead","COE":"Architect"}       China   LiaoNing
Lucy    ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89}    {"Sales":"Lead"}        China   LiaoNing
Michael ["Montreal","Toronto"]  {"gender":"Male","age":30}      {"DB":80}      {"Product":"Developer\u0004Lead"}        China   NanJing
Will    ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}     {"Product":"Lead","Test":"Lead"}        China   NanJing
Shelley ["New York"]    {"gender":"Female","age":27}    {"Python":80}   {"Test":"Lead","COE":"Architect"}       China   NanJing
Lucy    ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89}    {"Sales":"Lead"}        China   NanJing
Time taken: 0.063 seconds, Fetched: 8 row(s)

Query the rows whose add partition is NanJing (note that the value is case-sensitive):

hive> select * from employee_partition where add="NanJing";
OK
Michael ["Montreal","Toronto"]  {"gender":"Male","age":30}      {"DB":80}      {"Product":"Developer\u0004Lead"}        China   NanJing
Will    ["Montreal"]    {"gender":"Male","age":35}      {"Perl":85}     {"Product":"Lead","Test":"Lead"}        China   NanJing
Shelley ["New York"]    {"gender":"Female","age":27}    {"Python":80}   {"Test":"Lead","COE":"Architect"}       China   NanJing
Lucy    ["Vancouver"]   {"gender":"Female","age":57}    {"Sales":89}    {"Sales":"Lead"}        China   NanJing
Time taken: 0.048 seconds, Fetched: 4 row(s)
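The partitions created so far can be listed directly:

show partitions employee_partition;
-- expected to show country=China/add=LiaoNing and country=China/add=NanJing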

Create another table to demonstrate INSERT:

create table p_test(
pid int,
pname string)
partitioned by (person string)
row format delimited
fields terminated by ','
lines terminated by '\n';
insert into p_test partition (person='sam') values(1,'a'),(2,'b'),(3,'c');
hive> select * from p_test;
OK
1       a       sam
2       b       sam
3       c       sam
Time taken: 0.067 seconds, Fetched: 3 row(s)
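The INSERT above uses a static partition value (person='sam'). Hive can also derive the partition value from the query itself (dynamic partitioning) once it is enabled; a minimal sketch that copies the sam rows into a new partition tom:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- the last column of the SELECT feeds the partition column person
insert into table p_test partition (person)
select pid, pname, 'tom' from p_test where person = 'sam';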

Creating an external table:

Creating an external table requires the EXTERNAL keyword; dropping the table does not delete the underlying data.
A LOCATION is usually specified, because in most external-table scenarios the data is already sitting at some HDFS path;
if the format matches, the data can be queried as soon as the table is created.
stored as textfile
location '/usr/test/employee'; -- this is an HDFS path; you can define it yourself

Take one row of data and analyze it (Hive, like MySQL, is not case-sensitive):
Michael → name string
Montreal,Toronto → array<string>
Male,30 → these are not key/value pairs, so a map is not a good fit; use struct<sex:string,age:int>
DB:80 → map<string,int>
Product:DeveloperEOTLead
EOT here is the ^D (\004) separator (when editing with vi on Linux it is typed as Ctrl+V followed by Ctrl+A/B/C and so on); it brings in nested data types: map<string,array<string>>
Product:Lead,COE:Architect
A map can hold many key/value pairs, e.g. "zhangsan"→12, "lisi"→20, so this field can simply be declared as map<string,string>

create external table if not exists emp_id(
name string,
id int,
address array<string>,
info struct<gender:string,age:int>,
workAndSal map<string,int>,
jobAndRole map<string,array<string>>)   -- with the default delimiters the statement could end here without error
row format delimited        -- specify the delimiters (in practice you rarely need this many; well-formed data is normally comma-separated)
fields terminated by '|'    -- field delimiter, normally a single char, written in single quotes
collection items terminated by ','  -- delimiter between collection items
map keys terminated by ':'          -- map key delimiter; the nested delimiters keep their defaults and are not specified, and they must not reuse the outer collection delimiter
lines terminated by '\n'     -- line terminator
stored as textfile           -- storage format
location '/usr/test/employee';

In Notepad++ you can select Language -> SQL so that standard SQL keywords are highlighted, which makes the script easier to read.

Because the emp_id table's storage location is set to /usr/test/employee, creating the table effectively loads metadata that points at that directory. If another file, employee_id, is then uploaded into /usr/test/employee, a SELECT will return its rows as well; it is the same mechanism as uploading a file into a managed table's folder.

load data local inpath '/root/employee.txt' into table emp_id;

You can inspect the table definition with these commands:

-- show the full CREATE TABLE statement
show create table emp_id;
-- show the detailed metadata
desc formatted emp_id;

The metadata can also be seen in MySQL (the metastore database):
DBS shows the database information
FUNCS shows user-defined functions
TBLS shows table information; rows are created in order, and TBL_ID is auto_increment

mysql> desc TBLS;
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| TBL_ID             | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME        | int(11)      | NO   |     | NULL    |       |
| DB_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| LAST_ACCESS_TIME   | int(11)      | NO   |     | NULL    |       |
| OWNER              | varchar(767) | YES  |     | NULL    |       |
| RETENTION          | int(11)      | NO   |     | NULL    |       |
| SD_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| TBL_NAME           | varchar(128) | YES  | MUL | NULL    |       |
| TBL_TYPE           | varchar(128) | YES  |     | NULL    |       |
| VIEW_EXPANDED_TEXT | mediumtext   | YES  |     | NULL    |       |
| VIEW_ORIGINAL_TEXT | mediumtext   | YES  |     | NULL    |       |
| LINK_TARGET_ID     | bigint(20)   | YES  | MUL | NULL    |       |
+--------------------+--------------+------+-----+---------+-------+
12 rows in set (0.00 sec)

Insert data:

insert into table employee2 select * from employee_id;

Bucketed tables:

Here an external table is used for bucketing.
clustered by(id) into 3 buckets
You have to specify which column the bucketing is based on.
set hive.enforce.bucketing=true; enforces bucketing.

create external table emp_bucket(
name string,
id int,
address array<string>,
info struct<gender:string,age:int>,
workAndSal map<string,int>,
jobAndRole map<string,string>)
clustered by(id) into 3 buckets
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile
location '/usr/test/bucket';

The files do not look any different at all...
That is because bucketing is not applied to files that are placed there by hand; try again with enforced bucketing and an INSERT (see the sketch after the SET statement below):

set hive.enforce.bucketing=true;
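Setting hive.enforce.bucketing only takes effect when rows are written through Hive; a minimal sketch that fills the bucketed table from employee_id (whose column types line up positionally with emp_bucket):

-- with the SET above in effect, Hive hashes id into 3 buckets and writes 3 files under /usr/test/bucket
insert overwrite table emp_bucket select * from employee_id;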

5. Row-level operations (data-level operations)

On the question of which database a table belongs to:
looking at the hive metastore database in MySQL, the DBS table shows each database's DB_ID, and TBLS shows the relationship between tables and databases.
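A quick way to see that relationship is to join the two metastore tables (a sketch run inside MySQL; the metastore database is assumed to be named hive, as above):

use hive;
-- list every Hive table together with the database it belongs to
select d.NAME as db_name, t.TBL_NAME, t.TBL_TYPE
from TBLS t join DBS d on t.DB_ID = d.DB_ID
order by d.NAME, t.TBL_ID;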

Supplement:

Connecting to hiveserver2 from outside with DBeaver

(figure omitted)
Create a new connection and configure the Hive driver: (figure omitted)
Add the files and select the Hive driver jar: (figure omitted)
Test the connection; it passes: (figure omitted)
Note that the hiveserver2 service must be running before you can connect; to start it in the background:

hiveserver2 &