Hive Basic Operations

键盘 | 书生

Article link: https://blog.csdn.net/weixin_43976998/article/details/90233035

Create

Create a database

create database if not exists database_name
comment 'database_comment'
location 'hdfs_path'
with dbproperties('property_name'='property_value', ...);
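
As a concrete instance of the template above, here is a minimal sketch. The database name `sales_db`, the HDFS path, and the `creator` property are hypothetical and only serve as illustration.

```sql
-- Hypothetical names: sales_db, the warehouse path, and the 'creator' property.
create database if not exists sales_db
comment 'Sales data for reporting'
location '/user/hive/warehouse/sales_db.db'
with dbproperties ('creator' = 'etl_user');

-- Show the comment, location, and properties that were just set.
desc database extended sales_db;
```

Note that the comment, the location, and the property keys and values are all quoted string literals.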

Create a table

Direct creation

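set hive.enforce.bucketing=true; -- needed before inserting into a bucketed table on Hive 1.x; Hive 2.0+ always enforces bucketing and no longer has this setting
create external table table_name(
col_name data_type comment 'col_comment', ...)
comment 'table_comment'
partitioned by (partition_name type comment 'partition_comment', ...) -- partition columns; Hive has no range ("values less than") clause, partition values are supplied when data is loaded or a partition is added
clustered by (col_name, ...) -- bucketing columns
sorted by (col_name asc|desc, ...) -- sort columns within each bucket (rarely used)
into num buckets -- number of buckets (files)
row format delimited
	fields terminated by ',' -- field delimiter within a row
	collection items terminated by ',' -- delimiter between elements of a map, array, or struct; in practice choose a character different from the field delimiter
	map keys terminated by ':' -- delimiter between a map key and its value
	lines terminated by '\n' -- row delimiter; must come after the fields, collection items, and map keys clauses
stored as
	textfile -- plain text, the default
	sequencefile -- binary sequence file
	rcfile -- columnar storage format
	orc -- rows are grouped into stripes (250 MB per the ORC documentation; the actual size is configurable) and each stripe is stored column-wise; higher compression ratio and read/write efficiency than RCFile (commonly used)
	parquet -- binary format that cannot be read directly as text; like ORC, it combines row grouping with column-wise storage
location 'hdfs_path' -- HDFS path of the table; data already at that path is used
[tblproperties ('<property_name>'='<property_value>', ...)]; -- user-defined table properties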
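
To show how the clauses fit together in one statement, here is a minimal sketch of a partitioned, bucketed external table. The table `page_views`, its columns, the path `/data/page_views`, and the `created_by` property are all hypothetical.

```sql
-- Hypothetical table: page views partitioned by month and bucketed by user_id.
create external table if not exists page_views (
  user_id bigint comment 'User identifier',
  url string comment 'Visited URL',
  tags array<string> comment 'Free-form tags',
  props map<string,string> comment 'Extra attributes'
)
comment 'Raw page views'
partitioned by (month string comment 'Month as yyyyMM')
clustered by (user_id) sorted by (user_id asc) into 4 buckets
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
  lines terminated by '\n'
stored as textfile
location '/data/page_views'
tblproperties ('created_by' = 'etl_user');
```

The sketch uses a tab as the field delimiter so that it cannot be confused with the comma separating collection items.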

Create a table from a query result
- create table table_name1 as select * from table_name;
Copy a table's structure
- create table table_name1 like table_name;

Partition
- alter table table_name add partition(month="") partition(month="") …;
Column
- alter table table_name add columns(col_name data_type [comment col_comment], …);
Append data to a table
- insert into table table_name partition(month="") values("","");
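
The three statements above can be tried together as in the following sketch. The table `orders` and its columns are hypothetical, and `insert ... values` assumes Hive 0.14 or later.

```sql
-- Hypothetical table used only for this illustration.
create table if not exists orders (id int, amount double)
partitioned by (month string);

-- Add two partitions in one statement (no comma between partition specs).
alter table orders add if not exists
  partition (month = '201905')
  partition (month = '201906');

-- Append a column to the end of the column list.
alter table orders add columns (note string comment 'Free-text remark');

-- Append one row to a partition; the values cover the non-partition columns only.
insert into table orders partition (month = '201905')
values (1, 9.99, 'first order');
```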

Delete

- Table
	- drop table if exists table_name;
- Database
	- drop database if exists db_name cascade;
- Table
	- Clear the data of a managed (internal) table
		- truncate table table_name;
- Partition
	- alter table table_name drop if exists partition(month=""), partition(month=""), ...;
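
As an illustration of the partition-level variants, here is a small sketch against a hypothetical managed table `orders` partitioned by `month`.

```sql
-- Hypothetical table: orders, partitioned by month.
-- Remove the rows of one partition but keep the partition itself.
truncate table orders partition (month = '201905');

-- Drop several partitions at once (specs are separated by commas here).
alter table orders drop if exists
  partition (month = '201905'),
  partition (month = '201906');
```

For an external table, `drop table` and `drop partition` by default remove only the metadata and leave the files on HDFS in place.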

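Modify

Database
- Properties
  - alter database db_name set dbproperties(''='');
  - In older Hive releases no other database metadata can be changed; newer releases also allow changing the owner and the location
- Switch the current database
  - use dbname;
Table
- Properties
  - alter table table_name set tblproperties('EXTERNAL'='FALSE');
  - The key and value inside the parentheses must be uppercase and in single quotes
- Rename
  - alter table table_name rename to new_table_name;
- Column (not recommended; for a table that already holds data, recreate it and reload the data)
  - alter table table_name change column col_old_name col_new_name column_type comment 'col_comment';
- Replace all columns
  - alter table table_name replace columns(col_name data_type [comment col_comment], …);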
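
A common use of the `EXTERNAL` property is switching a table between managed and external. The sketch below assumes a hypothetical table `orders` with a string column `note`; behavior can differ on clusters that enforce strict managed tables.

```sql
-- Hypothetical table: orders.
-- Turn the table into an external table.
alter table orders set tblproperties ('EXTERNAL' = 'TRUE');

-- The "Table Type" field in the output reflects the change.
desc formatted orders;

-- Turn it back into a managed table.
alter table orders set tblproperties ('EXTERNAL' = 'FALSE');

-- Rename a column and update its comment; the type stays string.
alter table orders change column note remark string comment 'Renamed from note';
```

`change column` only rewrites metadata, which is why the original advises recreating and reloading a table that already holds data.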

Query

Database
- show databases like '*';
- Structure: desc database extended db_name;
Table
- show tables like '*';
- Structure: desc extended table_name;
- Details: desc formatted table_name;
- Size:
  - desc formatted table_name; -- get file_path (the Location field)
  - dfs -du -s -h file_path; -- show the table size
Functions
- show functions;
- Usage: desc function fun_name;
- Details: desc function extended fun_name;
Data
- https://mp.csdn.net/mdeditor/90274116#
Data files
- hadoop fs -ls /user/hive/warehouse
Execution plan
- explain ( extended | dependency | authorization ) select …
All command history
- cat /root/.hivehistory
HDFS file system
- dfs -ls /user/hive/warehouse;
Local file system
- ! ls /;
All configuration settings
- set;
- View a parameter: set mapred.reduce.tasks;
- Set a parameter: set mapred.reduce.tasks=100;
- Note: the setting lasts only for the current Hive session
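
A short session combining several of these inspection commands might look like the sketch below. The table `orders` and the warehouse path are hypothetical; the real path should be taken from the Location field of `desc formatted`.

```sql
-- Hypothetical table: orders, partitioned by month.
-- Show the detailed execution plan without running the query.
explain extended
select month, count(*) from orders group by month;

-- Print the current value, then override it for this session only.
set mapred.reduce.tasks;
set mapred.reduce.tasks=100;

-- Summarize the size of the table directory on HDFS.
dfs -du -s -h /user/hive/warehouse/orders;
```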

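Other

Import data
- Load into a partition
  - load data [local] inpath '/path/file' [overwrite] into table table_name partition(month='201905',day='22');
  - Alternatively, create the partition directory on HDFS and upload the file into it; for a partitioned table, then run the repair command (msck repair table table_name) to add the partition metadata
- Load into a partitioned, bucketed table
  - set hive.enforce.bucketing=true;
  - set mapreduce.job.reduces=-1;
  - insert ( into | overwrite ) table buc_table select * from table_name;
- Load from a query result
  - Into a single table
    - insert ( into | overwrite ) table table_name1 select * from table_name;
  - Into multiple tables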
```
from student
  insert overwrite table table_name partition(month='201707')
  select id where month=''
  insert overwrite table table_name1 partition(month='201706')
  select id where month='';
```
- Import data produced by Hive export
  - import table table_name partition(month='') from '/path/file';
  - The table is created automatically if it does not exist
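
The loading paths listed above can be sketched as follows. All names and paths are hypothetical: `orders` is assumed to be a table partitioned by `month`, and `orders_bucketed` an existing bucketed table with the same columns.

```sql
-- Load a local file into one partition, replacing its current contents.
load data local inpath '/tmp/orders_201905.csv'
overwrite into table orders partition (month = '201905');

-- After copying files straight into partition directories on HDFS,
-- register the new partitions in the metastore.
msck repair table orders;

-- Populate the bucketed table from the plain one.
-- The first setting is only needed (and only exists) before Hive 2.0.
set hive.enforce.bucketing=true;
insert overwrite table orders_bucketed partition (month = '201905')
select id, amount from orders where month = '201905';
```

`load data` only moves or copies files and does not bucket them, which is why a bucketed table is filled through `insert ... select` instead.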
Export data
- insert overwrite [local] directory '/path' [row format delimited fields terminated by '\t'] select userid from table_name;
- export table table_name to '/path';
- ]# hive -e "select * from table…" > /path/file
- ]# hive -f create_table.sql > /path/file
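
The two HiveQL export paths can be sketched as below, together with the matching import. The tables `orders` and `orders_copy` and both directories are hypothetical.

```sql
-- Write query results as tab-separated text to a local directory.
-- Existing contents of the target directory are replaced.
insert overwrite local directory '/tmp/orders_out'
row format delimited fields terminated by '\t'
select id, amount from orders where month = '201905';

-- Export data plus metadata to an HDFS directory.
export table orders to '/tmp/export/orders';

-- Re-create the table under a new name from that export.
import table orders_copy from '/tmp/export/orders';
```

`insert overwrite directory` produces plain data files, whereas `export` also writes the table metadata, so only the latter can be read back with `import`.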
Command line
- Run an HQL file
  - ]# hive -f create_table.sql
- Run an HQL statement
  - ]# hive -e "select * from table…"
- Set parameters at startup
  - hive --hiveconf mapred.reduce.tasks=10
  - Note: the setting lasts only for this Hive session
Connect to Hive
- CLI
  - hive
- HiveServer2/beeline
  - ./bin/hiveserver2
  - ./bin/beeline -u jdbc:hive2://ip:10000 -n root
Exit Hive
- quit;