Common Hive Operations

ZcDong1992

Tags: hive hadoop

Article link: https://blog.csdn.net/m0_38132068/article/details/109617167

------------All About Hive----------------
Stopping
Use ps -ef | grep hive to find the process IDs (PIDs) of the Hive processes, then kill those processes.
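
The stop step above can be sketched as a small filter. The helper name hive_pids is hypothetical, and the patterns HiveMetaStore and HiveServer2 are assumptions about how the two Java services show up in the ps output; adjust them if your process list looks different.

```shell
#!/usr/bin/env bash
# Print the PIDs (2nd column of `ps -ef`) of Hive service processes.
# Reads ps output from stdin, so the filter can be checked without a live cluster.
# Lines that belong to grep/awk themselves are skipped.
hive_pids() {
  awk '/HiveMetaStore|HiveServer2/ && !/awk|grep/ { print $2 }'
}

# Typical use (sends SIGTERM; -r is the GNU xargs flag for "skip if no input"):
#   ps -ef | hive_pids | xargs -r kill
```

Sending a plain kill first gives the services a chance to shut down cleanly; kill -9 is only worth trying if a process does not exit.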
Starting
nohup hive --service metastore >> ~/metastore.log 2>&1 &
nohup hive --service hiveserver2 >> ~/hiveserver2.log 2>&1 &
Note: the hiveserver2 command is not used here; Spark's Thrift Server is used instead.

vim hadoop-env.sh   # set HADOOP_CLIENT_OPTS=-Xmx12g in this file

ps aux | grep -v grep | grep --color "HiveServer2" | grep --color Xmx   # check the -Xmx setting of the running HiveServer2

/opt/hive/apache-hive-3.1.2-bin/lib/   # jars that Hive depends on at runtime

1. Entering Hive
hive    (then press Enter to start the Hive CLI)
beeline -u jdbc:hive2://10.170.1.72:10000 -n root
http://10.170.1.8:8686/hue    (Hue web UI, user/password: root/root)

2. Basic operations. Hive is schema-on-read: data is not validated when it is loaded.
show databases;
use databaseName;
create table …
insert into …
select …

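show tables;                  -- list all tables in the current database
show create table tbName;     -- show the full CREATE TABLE statement of a table (including its delimiter settings)

show tables in dbName;        -- list all tables in the given database
show tables like 'stu*';      -- list tables whose names match a pattern

desc tbName;                  -- show the columns of a table
desc extended tbName;         -- show table details in compact form
desc formatted tbName;        -- show table details in readable form (managed and external tables): location, table type, row count, size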

3. Exiting
Ctrl+C
exit;
quit;

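Hive managed (internal) tables vs. external tables:
1. Hive manages the data of a managed table itself; for an external table Hive only tracks the metadata, and the files on HDFS are managed by the user.
2. Managed table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the location of external table data is specified by the user.
3. Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are kept.
4. Changes to a managed table are synced to the metadata directly; after changing the structure or partitions of an external table, the metadata has to be repaired (MSCK REPAIR TABLE tbName;).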
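
To make point 3 concrete, here is a minimal sketch that writes a small HiveQL script and runs it only when a hive client is on the PATH. The table names demo_managed and demo_external, the HDFS path /tmp/demo_external and the script file name are all hypothetical.

```shell
#!/usr/bin/env bash
# Write a HiveQL script that contrasts a managed and an external table.
# Table names and the HDFS path are illustrative assumptions.
hql_file="${TMPDIR:-/tmp}/managed_vs_external.hql"
cat > "$hql_file" <<'EOF'
-- Managed table: data lives under hive.metastore.warehouse.dir
CREATE TABLE IF NOT EXISTS demo_managed (id INT, name STRING);

-- External table: Hive only records metadata for this location
CREATE EXTERNAL TABLE IF NOT EXISTS demo_external (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/demo_external';

-- The "Table Type" row shows MANAGED_TABLE or EXTERNAL_TABLE
DESC FORMATTED demo_managed;
DESC FORMATTED demo_external;

-- Dropping the managed table removes metadata and data files
DROP TABLE demo_managed;
-- Dropping the external table removes metadata only
DROP TABLE demo_external;
EOF

# Run the script only where a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$hql_file"
fi
```

After the script runs, listing /tmp/demo_external with hadoop fs -ls should still show the directory, while the managed table's warehouse directory is gone.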

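4.1. Deleting a managed table
drop table tbName;        -- removes the table: metadata and data
truncate table tbName;    -- removes only the data; the table definition stays

Note: replace the MySQL driver jar with a different version.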

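4.2. Deleting an external table
Method 1: convert it to a managed table, then drop it (metadata and data are both removed).
alter table tbName set tblproperties('EXTERNAL'='FALSE');
drop table tbName;

Method 2: delete the data files directly on HDFS (use the table's actual location).
    hadoop fs -rm -r -f /user/hive/warehouse/tbName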

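5. Full CREATE TABLE syntax:
create [external] table [if not exists] tbName
(col definition [comment column description] … … …)
[comment table description]

[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]   -- partitioned table
[CLUSTERED BY (col_name, col_name, ...) INTO num_buckets BUCKETS]  -- bucketed table
[row format ... ... ...]   -- delimiters used by the table
[location path]            -- storage location of the table

[AS select_statement]                 -- create table as select (CTAS)
[LIKE existing_table_or_view_name]    -- create table like an existing table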

create external table if not exists mysql2hive(
id int,
username string,
sex string,
age int
) row format delimited
fields terminated by ','
lines terminated by '\n'
location '/data1/test/dzc';

create table mysql2hive2(
id int,
username string
) PARTITIONED BY (
dt string)
row format delimited
fields terminated by ','
lines terminated by '\n'
location '/data1/test';

The default field delimiter within a row is Ctrl-A (\001).
The default line delimiter is \n.

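6.1 Importing data
1) load data [local] inpath 'source path' [overwrite] into table tbName [partition(k=v)]

2) insert into table tbName [partition(k=v)] values(v,v,v,v)   -- appends, so repeated runs may produce duplicate rows
insert into table mysql2hive2 values ('33','ddd','t41')   is equivalent to
insert into table mysql2hive2 partition (dt='t41') values ('33','ddd')
(the first form relies on dynamic partitioning, which may need set hive.exec.dynamic.partition.mode=nonstrict;)
insert into table mysql2hive2 select ...

3) Point the table at data that is already on HDFS when creating it: location specifies the path the table reads its data from.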
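
As a hedged sketch of the first two import methods, the script below loads a local file into one partition of mysql2hive2 and then appends a row. The file /tmp/users.csv is a hypothetical comma-separated file with the columns id,username, and the script name is an assumption as well.

```shell
#!/usr/bin/env bash
# Sketch: load a local file into one partition, then append a row.
# /tmp/users.csv is a hypothetical comma-separated file (id,username).
import_hql="${TMPDIR:-/tmp}/import_demo.hql"
cat > "$import_hql" <<'EOF'
-- Replace the contents of partition dt='t41' with the local file
LOAD DATA LOCAL INPATH '/tmp/users.csv'
OVERWRITE INTO TABLE mysql2hive2 PARTITION (dt='t41');

-- Append one more row to the same partition
INSERT INTO TABLE mysql2hive2 PARTITION (dt='t41') VALUES (33, 'ddd');

-- Check the result
SELECT * FROM mysql2hive2 WHERE dt='t41';
EOF

# Run the script only where a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$import_hql"
fi
```

Leaving out OVERWRITE makes the load append to the partition instead of replacing it.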

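6.2 Exporting data
Export a table from the warehouse to HDFS:            export table tbName to 'hdfs path';
Copy a table's data files to the local file system:   dfs -get hiveDataPath localPath;
Write query results to a directory, replacing its contents:   insert overwrite [local] directory 'path' selectExpr
insert overwrite replaces all existing data in its target (a directory, table or partition), for example:
insert overwrite table mysql2hive2 partition (dt='t41') values ('33','ddd')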

6. Partition operations
alter table mysql2hive2 add if not exists partition (dt='test');
alter table mysql2hive2 drop if exists partition (dt='test');

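7. Running SQL without entering Hive
hive -e "select * from teacher;"
-e runs a SQL statement given directly on the command line, without entering the Hive shell. Quote the statement so the shell does not expand * or split it into separate arguments.

hive -f fileName
-f runs the SQL statements stored in a file, without entering the Hive shell.

hive -f fileName > resultPath
Redirects the output of the hive command to a file.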
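
The -e form and the redirection can be combined in a small wrapper. This is only a sketch: run_query and the HIVE_BIN override are hypothetical names introduced here, and -S is the Hive CLI's silent flag, used to keep progress logs out of the way.

```shell
#!/usr/bin/env bash
# Run one query non-interactively and save its result to a file.
# HIVE_BIN lets you substitute another client binary; it defaults to `hive`.
run_query() {
  local sql="$1" out="$2"
  # The query is passed as ONE quoted argument, so the shell neither
  # expands `*` nor splits the statement into several words.
  "${HIVE_BIN:-hive}" -S -e "$sql" > "$out"
}

# Example (assumes the `teacher` table from the text exists):
#   run_query 'select * from teacher;' /tmp/teacher.tsv
```

Single quotes around the statement are the safer choice when it contains $ or backticks, since double quotes would let the shell interpret them.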

8. Browsing HDFS from within Hive
dfs -ls /;

9. Viewing the Hive command history
cat ~/.hivehistory

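Changing configuration (precedence)
hadoop-site.xml --> default hive-default.xml --> user-written hive-site.xml --> hive --hiveconf --> SET command
Precedence increases from left to right: a later source overrides an earlier one.
set hive.fetch.task.conversion=more;   -- important: simple queries are served by a fetch task, without running MapReduce
set mapred.max.split.size=256000000;   -- maximum size of one split, in bytes
set mapred.min.split.size.per.node=100000000;   -- minimum split size on one node (decides whether files on different DataNodes are merged)
set mapred.min.split.size.per.rack=100000000;   -- minimum split size within one rack (decides whether files on different racks are merged)
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;   -- important: combine small files before the map phase, to deal with too many small files
set mapreduce.job.reduces=10;   -- number of reducers in a job, which determines the number of output files; choose it based on the data volume and distribution in your environment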
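
The precedence chain can be tried out with one key set at two levels. This is a sketch only; the values 10 and 5 and the script name are arbitrary assumptions. SET with a key and no value prints the setting that is in effect.

```shell
#!/usr/bin/env bash
# Sketch: the same key is set on the command line and inside the session.
# The values 10 and 5 are arbitrary illustrations.
conf_demo="${TMPDIR:-/tmp}/conf_precedence.hql"
cat > "$conf_demo" <<'EOF'
-- SET inside the session comes last in the chain
SET mapreduce.job.reduces=5;
-- SET with a key and no value prints the effective setting
SET mapreduce.job.reduces;
EOF

# Run the script only where a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
  hive --hiveconf mapreduce.job.reduces=10 -f "$conf_demo"
fi
```

Removing the first SET line from the script should make the printed value fall back to the one passed with --hiveconf.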

11. Storing Hive metadata in MySQL (initialization)
schematool -dbType mysql -initSchema

select * from hive_pm;   -- project data
