Hive Basic Syntax Notes

一大溜英文字母

Tags: data warehouse

Article link: https://blog.csdn.net/weixin_40003829/article/details/103604172

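Hive is a data warehouse.
Difference between a database and a data warehouse: a database serves online transactions (frequent small reads and writes), while a data warehouse stores large volumes of historical data for analytical queries.

HQL --> Hive SQL, Hive's SQL-like query language.

Hive's default execution engine is MapReduce, but it can be changed, for example to Spark.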
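
As a rough sketch of switching the engine for one session, the statements below use the hive.execution.engine property. They assume that Hive on Spark is already installed and configured on the cluster, which these notes do not cover.

```sql
-- Print the engine currently in effect for this session.
set hive.execution.engine;

-- Switch this session to Spark (assumption: Hive on Spark is set up).
set hive.execution.engine=spark;

-- Switch back to the default MapReduce engine.
set hive.execution.engine=mr;
```

A set statement like this only affects the current session; a permanent change would normally go into hive-site.xml.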

Ways to start Hive:
1. hive CLI

Type hive to enter the Hive client.
To exit: quit;
2. beeline
01. Start the hiveserver2 service.
02. !connect jdbc:hive2://jh001:10000
03. Then enter the username and password.
```
 To exit: !q
```

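Installing Hive involved MySQL:
1. Created a database named metastore in MySQL.
2. Created hive-site.xml in Hive's conf directory, specifying the MySQL database and the MySQL username and password.

Why use MySQL:
Hive actually ships with a built-in database called Derby, which is lightweight and compact. Its job is to manage Hive's metadata.
It has one drawback: only one client can connect to it at a time.

So we replace it with MySQL.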

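Hive operations:
DDL: operations on tables or databases
Create a database
When a database is created, it shows up on HDFS as a directory (by default /user/hive/warehouse/<database name>.db).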

Create a table
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], …)]   -- columns, with optional comments
[COMMENT table_comment]   -- table comment
[PARTITIONED BY (col_name data_type [COMMENT col_comment], …)]   -- partitioned table
[CLUSTERED BY (col_name, col_name, …)   -- bucketed table
[SORTED BY (col_name [ASC|DESC], …)] INTO num_buckets BUCKETS]   -- sort order within each bucket
[ROW FORMAT row_format]   -- field delimiter and row delimiter
[STORED AS file_format]   -- file type: textfile, rcfile, compressed formats, and so on
[LOCATION hdfs_path]   -- path of the data on HDFS

Hive's default field delimiter is \001 (Ctrl+A); the default row delimiter is the newline \n.

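Hive has two kinds of tables:
Internal tables, also called managed tables
External tables, created with the external keyword
Differences:
1. They are created differently: an external table needs the external keyword.
2. Dropping them has different results. Dropping an internal table deletes the actual data as well; dropping an external table keeps the actual data.
3. Loaded source data usually goes into external tables, while the result data of an analysis usually goes into internal (managed) tables.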
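
The difference in drop behaviour can be sketched as follows. The table names stu_managed and stu_ext and the directory /data/stu_ext are hypothetical, chosen only for illustration.

```sql
-- Managed (internal) table: Hive owns the data under its warehouse directory.
create table if not exists stu_managed (id int, name string, age int)
row format delimited fields terminated by '\t';

-- External table: Hive records only the metadata; the files stay at the given
-- HDFS path. '/data/stu_ext' is a hypothetical directory.
create external table if not exists stu_ext (id int, name string, age int)
row format delimited fields terminated by '\t'
location '/data/stu_ext';

-- Dropping the managed table removes the metadata and the data files.
drop table stu_managed;

-- Dropping the external table removes the metadata only;
-- the files under /data/stu_ext remain on HDFS.
drop table stu_ext;
```

Before dropping, desc formatted on each table shows the table type as MANAGED_TABLE or EXTERNAL_TABLE, which is a quick way to check what a drop will do.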

View table details:
desc tablename;
desc formatted tablename;

Add/remove columns

DML: operations on data
insert into: insert data
load data: load data
load data local inpath "/opt/rh/stu.txt" into table stu;

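Database statements:
View databases
Create a database; a path can also be specified: create database abc location "/aaa"; (an HDFS path)
Use a database
Database details
Filter databases

Table statements:
Create internal and external tables, specifying the field delimiter
Drop a table
Create a table with the same structure: like
Two ways to view table details
Load data: load
Introduction to Hive
Difference between a database and a data warehouse
Hive architecture diagram --> driver
Why Hive does not use its built-in Derby database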

Database statements:
View databases: show databases;
Create a database: create database databaseName location 'hdfs path'; the location is optional and is an HDFS path, e.g. create database abc location "/aaa";
Use a database: use databaseName;
Database details: desc database databaseName;
Filter databases: show databases like 'a*';
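
Put together, the database statements above might be run as the short session below. The database name demo_db and the path /demo_db are hypothetical.

```sql
-- Create a database at an explicit HDFS path (hypothetical name and path).
create database if not exists demo_db location '/demo_db';

-- List only the databases whose names start with "demo".
show databases like 'demo*';

-- Make it the current database and inspect its details (location, owner).
use demo_db;
desc database demo_db;

-- Clean up: leave the database before dropping it.
-- Add the cascade keyword if it still contains tables.
use default;
drop database if exists demo_db;
```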

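Table statements:
Create internal and external tables, specifying the field delimiter

Where Hive's metadata lives: we put it in MySQL
Where Hive's actual data lives: HDFS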

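Managed (internal) table; without a row format clause the field delimiter defaults to \001:
create table tablename(col data_type…) row format delimited fields terminated by "\t" stored as textfile;
Other file formats such as rcfile or sequencefile can be given after stored as in place of textfile.

External table:
create external table tablename(col data_type…) row format delimited fields terminated by "\t" stored as textfile;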

Drop a table
drop table tablename;

Create a table with the same structure: like
create table tablename2 like tablename;

Two ways to view table details
desc tablename;
desc formatted tablename;

Load data: load
load data local inpath 'path' into table tablename;

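Something specific to Hive, partitioned tables:
The key point: partitioning groups the records that satisfy a given condition and labels them, which makes queries more efficient.
It is like sorting files into folders, with the folder name playing the role of the partition column. The partition column formally exists in the table
and is shown to the client in query results, but it is not actually stored in the table's data files; it is a so-called pseudo-column.
select * from tablename;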

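Whether a table is partitioned is fixed when the table is created.
Partition creation statement: the partition column must not be one of the table's regular columns, and partition columns always come last.
create table tablename(col data_type…) partitioned by (beginTime string) row format delimited fields terminated by "\t";

create table stu_par(id int,name string,age int)
partitioned by(class string)
row format delimited fields terminated by " ";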

View a table's partitions:
show partitions tablename;
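
As an illustration of how the partition column behaves, the sketch below loads a file into one partition of stu_par and then filters on the partition column. The file /opt/datas/class1.txt and the partition value class1 are assumptions; the file is expected to hold space-separated lines such as "1 Tom 20".

```sql
-- Load a local file into the partition class=class1 (hypothetical file path).
-- The file holds only id, name and age; the class value comes from the
-- partition clause, not from the file.
load data local inpath '/opt/datas/class1.txt'
into table stu_par partition (class='class1');

-- The partition column is queried like an ordinary column. Filtering on it
-- lets Hive read only the class=class1 directory.
select id, name, age, class from stu_par where class = 'class1';

-- List the partitions that the metastore knows about.
show partitions stu_par;
```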

create table stu_2par(id int,name string,age int) partitioned by(province string,town string) row format delimited fields terminated by ",";

alter table stu_2par
add partition(province="HeNan",town="JIAOZUO")
partition(province="HeNan",town="XINXIANG")
partition(province="HB",town="SJZ");

Three-level partitioning (this reuses the name stu_2par, so drop the earlier table first or choose another name):
create table stu_2par(id int,name string,age int)
partitioned by(dataTime string,province string,town string)
row format delimited fields terminated by ",";

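Role of Hadoop:
Role of Hive: data analysis

Dropping partitions: when should a partition be dropped? When the partition is no longer needed, or when the server is running out of disk space.

For example: the server has 100 GB of space in total, each day's partition takes 1 GB, and I only need the most recent month of data.

Repair command: msck repair table tablename;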
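
Dropping an expired daily partition could look like the sketch below. The table access_log and its partition column dt are hypothetical, introduced only to mirror the one-partition-per-day scenario.

```sql
-- Hypothetical log table with one partition per day.
create table if not exists access_log (line string)
partitioned by (dt string);

-- Register a day's partition, then drop it once it falls outside the
-- retention window.
alter table access_log add if not exists partition (dt='2019-11-01');
alter table access_log drop if exists partition (dt='2019-11-01');

-- Confirm which partitions remain.
show partitions access_log;
```

For a managed table, dropping the partition also deletes its data directory, which is what frees the disk space.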

DML operations: operations on data
load data
load data [local] inpath '/opt/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

insert into/overwrite
Requirement: write the people older than 20 into a new table // editor tip: Ctrl+U converts uppercase to lowercase
create table stu_age20(id int,name string,age int) row format delimited fields
terminated by "\t" stored as sequencefile;

insert into table stu_age20
select sid,name,age from stu_partition where age > 20;

Requirement: load the data of stu_age20 into a new partition of stu_partition (class=2019060121).

insert into table stu_partition partition(class="2019060121")
select id,name,age from stu_age20;

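Multi-insert mode:
Requirement: overwrite stu_age20 with the rows of stu_partition where age is greater than 20, and write the rows where age is 20 or less to stu_age18

create table stu_age18(id int,name string,age int) row format delimited fields
terminated by "\t" stored as sequencefile;

Two separate statements, which scan stu_partition twice. The columns are listed explicitly because select * would also return the partition column class, which the three-column target tables cannot accept:
insert overwrite table stu_age20
select sid,name,age from stu_partition where age>20;
insert overwrite table stu_age18
select sid,name,age from stu_partition where age<=20;

Multi-insert form, which scans stu_partition only once:
from stu_partition
insert overwrite table stu_age20
select sid,name,age where age>20
insert overwrite table stu_age18
select sid,name,age where age<=20;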

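hive -e: run the quoted HiveQL string given on the command line, without entering the client
hive -f: run the HiveQL statements in the given script file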

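1. Create a multi-level partitioned table
create table tablename(col data_type,…) partitioned by(part_col data_type,…) row format delimited fields terminated by "delimiter";
alter table tablename add partition(part_col=value,…) partition(part_col=value,…)…;

2. Drop a partition
alter table tablename drop partition(part_col=value);

3. Add or remove columns
alter table tablename replace columns(col data_type,…);
alter table tablename add columns(col data_type,…);

4. Rename a table
alter table old_tablename rename to new_tablename;

5. Repair command: msck
msck repair table tablename;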
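
A small sketch of the two column statements follows; the table stu_cols is hypothetical. Columns are removed by restating the full list of columns to keep in replace columns.

```sql
-- Hypothetical two-column table.
create table if not exists stu_cols (id int, name string);

-- Append a new column at the end of the column list.
alter table stu_cols add columns (age int comment 'age in years');

-- "Remove" the age column by replacing the list with the columns to keep.
alter table stu_cols replace columns (id int, name string);

-- Check the resulting schema.
desc stu_cols;
```

Both statements change only the table metadata; the data files already on HDFS are not rewritten.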
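
The repair command is typically needed when a partition directory was created on HDFS directly rather than through Hive. The sketch below assumes a directory such as class=class9 was uploaded by hand under the stu_par table directory; that directory name is hypothetical.

```sql
-- Assumption: .../stu_par/class=class9 was created directly on HDFS,
-- so the metastore does not know about this partition yet.

-- Scan the table directory and register any partitions missing from the metastore.
msck repair table stu_par;

-- The hand-made partition should now be listed.
show partitions stu_par;
```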

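DML:
1. Loading data from the local file system or the HDFS cluster
load data [local] inpath "path" [overwrite] into table tablename [partition(part_col=value,…)];
With the local keyword the file is loaded from the local file system; without it the file is loaded from HDFS. The table must already exist; if it does not, create the table first and then load.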

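2. Inserting data, append and overwrite; for a partitioned table the partition must be specified
insert into table tablename partition(class="partition value") select cols from source_table;
insert overwrite table tablename partition(class="partition value") select cols from source_table;
Nothing much to add: into appends new rows, overwrite replaces the existing ones.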

3. Multi-table insert
from source_table
insert overwrite|into table table1 select…
insert overwrite|into table table2 select…
insert overwrite|into table table3 partition(…) select…;

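4. Export and import statements, local and HDFS cluster
export table tablename partition(…) to "path";
import table tablename partition(…) from "path";

5. Exporting query results with insert, to the local file system or the HDFS cluster
insert overwrite local directory "path" select…;
insert overwrite directory "path" select…;
Only overwrite is valid here; there is no insert into directory form.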
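
As a sketch of exporting query results to the local file system, the statement below writes the rows of stu_age20 as tab-separated text. The target directory /tmp/stu_age20_export is hypothetical.

```sql
-- Write the query result as tab-separated text files into a local directory.
-- Caution: overwrite replaces whatever the directory already contains.
insert overwrite local directory '/tmp/stu_age20_export'
row format delimited fields terminated by '\t'
select id, name, age from stu_age20;
```

Without the row format clause, the output files use the default \001 field delimiter, which is hard to read in a text editor.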

Hive queries:
SELECT [ALL | DISTINCT] select_expr, select_expr, …   -- DISTINCT removes duplicates
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]

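group by: grouping; several columns can be grouped on
After grouping, the select list may contain only the grouped columns or aggregate functions

order by: sorting, global ordering
Drawback: no matter how large the data is, only one reducer is used, so it is slow
Ascending by default
For descending order, append desc
select ename,sal from emp order by sal desc;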
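
A minimal illustration of the grouping rule, using the emp table with a department column deptno; the column deptno is an assumption, since these notes only show ename and sal.

```sql
-- Valid: the select list holds only the grouping column and aggregates.
select deptno, count(*) as emp_cnt, avg(sal) as avg_sal
from emp
group by deptno;

-- Invalid (left commented out): ename is neither grouped nor aggregated.
-- select deptno, ename, avg(sal) from emp group by deptno;
```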

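sort by: sorting within each reducer (each reducer's output is ordered, the overall result is not)
distribute by: decides which reducer each row goes to, usually combined with sort by

cluster by: distributes and sorts on the same column
Ascending only; the order cannot be changed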
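
The three clauses can be compared with the sketch below, again on the emp table. The column deptno and the reducer count of 3 are assumptions made for illustration.

```sql
-- Use several reducers so that the per-reducer ordering is visible.
set mapreduce.job.reduces=3;

-- Rows with the same deptno go to the same reducer; within each reducer
-- the rows are sorted by salary, highest first.
select ename, deptno, sal
from emp
distribute by deptno
sort by sal desc;

-- Shorthand for "distribute by deptno sort by deptno"; ascending only.
select ename, deptno, sal
from emp
cluster by deptno;
```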
