Learning Hive: Startup Methods, Common Commands, and Data Types


倔强的耗子


Tags: hive hadoop big data

Article link: https://blog.csdn.net/weixin_44178366/article/details/120576931


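Starting Hive

There are two ways to start a Hive client: the ordinary CLI client, and a client that connects over the JDBC protocol.

Ordinary client

# Prerequisite: the Hadoop cluster must already be running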
[atguigu@hadoop102 bin]$ hive
xxx
hive> show databases;
OK
default
Time taken: 0.844 seconds, Fetched: 1 row(s)

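JDBC client

beeline ---> hiveserver2 ---> hive

Start the hiveserver2 service, which accepts JDBC connections; beeline connects to it and operates Hive indirectly.

# Start the hiveserver2 service
hive --service hiveserver2

# This runs in the foreground and blocks the window; open another window and connect with beeline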
beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
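hiveserver2 can take a while after launch before it accepts connections on port 10000, so running beeline right away may fail to connect. The sketch below polls the port until it opens or a timeout expires. It assumes bash with its /dev/tcp redirection enabled (usually the case on Linux distributions); the function name wait_for_port is hypothetical and not part of Hive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp redirection (present when bash is built with network support).
# Returns 0 once the port is reachable, 1 after roughly $3 seconds (default 60).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} waited=0
  # The subshell opens a connection on fd 3 and closes it again when it exits.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage (assumes hiveserver2 was just started on hadoop102:10000):
# wait_for_port hadoop102 10000 120 && beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
```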


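Accessing Hive through the metastore service

This service is optional. Hive's metadata is stored in MySQL, and the metastore service exposes operations on that metadata; clients read and modify the metadata through this service rather than directly.

Add the following to hive-site.xml: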

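 <!-- Address of the metastore service used to access the metadata -->
<property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
</property>

 <!-- Once this property is set, the metastore service must be running, otherwise clients report an error -->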

Start the metastore

[atguigu@hadoop102 hive]$ hive --service metastore
2020-04-24 16:58:08: Starting Hive Metastore Server
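Note: the window is blocked once the service starts; open a new shell window for anything else.

Running hive in another window now works, which means the metastore service is configured.

The remaining problem: with the metastore configured, connecting through beeline requires two blocking commands,

hive --service metastore

hive --service hiveserver2

which is tedious. A script solves this.

Writing a start script for metastore and hiveserver2

The script must handle two things: run the commands in the background, and keep the processes alive when the window is closed.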

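# & runs the command in the background
hive --service hiveserver2 &
hive --service metastore &
# nohup keeps the process running after the window is closed; the drawback is that it still prints some output
nohup hive --service hiveserver2 &
nohup hive --service metastore &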


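# Improving on the commands above
# Idea: create two files, metastore.log and hiveserver2.log, and send each service's output to its own file
# File descriptor 1 is standard output, 2 is standard error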
[atguigu@hadoop102 logs]$ pwd
/opt/module/hive-3.1.2/logs
[atguigu@hadoop102 logs]$ ls
hiveserver2.log  metastore.log
[atguigu@hadoop102 logs]$

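# Command
nohup hive --service hiveserver2 1>/opt/module/hive-3.1.2/logs/hiveserver2.log 2>/opt/module/hive-3.1.2/logs/hiveserver2.log &
# Preferred form: 2>&1 makes stderr share stdout's already-open file. The line above opens the file twice, so the two streams can overwrite each other
nohup hive --service hiveserver2 1>/opt/module/hive-3.1.2/logs/hiveserver2.log 2>&1 &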
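The difference between the two forms above can be checked without Hive. This sketch uses a stand-in function, noisy (a hypothetical name), that writes one line to each stream, and redirects it three ways: the same file opened twice, the 2>&1 form, and 2>&1 placed before the file redirection. Bash applies redirections left to right, so their order matters.

```bash
#!/usr/bin/env bash
# Stand-in for a service such as hiveserver2 (hypothetical name):
# one line on stdout, one line on stderr, both exactly 10 bytes long.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

dir=$(mktemp -d)

# 1) The same file opened twice: each descriptor has its own write offset,
#    so the stderr line overwrites the stdout line written at offset 0.
noisy 1>"$dir/twice.log" 2>"$dir/twice.log"

# 2) Recommended: send stdout to the file, then make stderr a copy of stdout.
noisy 1>"$dir/dup.log" 2>&1

# 3) Wrong order: stderr is copied from the original stdout before stdout
#    is pointed at the file, so only the stdout line reaches the file.
noisy 2>&1 1>"$dir/wrong.log"

cat "$dir/twice.log"   # to stderr
cat "$dir/dup.log"     # to stdout, then to stderr (two lines)
cat "$dir/wrong.log"   # to stdout
```

With the file opened twice, the stderr line lands on top of the stdout line; with 2>&1 both lines are kept in order.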


# Start both services in the background, logging to files
nohup hive --service hiveserver2 1>/opt/module/hive-3.1.2/logs/hiveserver2.log 2>&1 &
nohup hive --service metastore 1>/opt/module/hive-3.1.2/logs/metastore.log 2>&1 &
# Stop (kills every RunJar process)
jps|grep RunJar|awk '{print $1}' |xargs -n1 kill -9
# Stop only the RunJar whose command line mentions metastore
ps -ef|grep metastore |grep -v grep |awk '{print $2}'|xargs -n1 kill -9
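In jps output both the metastore and hiveserver2 appear under the main class RunJar, which is why the first stop command kills every RunJar. A slightly stricter way to pick PIDs is to compare the name column exactly with awk instead of grep; the sketch below uses hard-coded sample lines in place of live jps output, and pids_by_name is a hypothetical helper. Plain kill sends SIGTERM, which lets the JVM shut down cleanly, so it is usually worth trying before kill -9. pgrep -f metastore is another way to select a process by its full command line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read jps-style "PID MainClass" lines on stdin and print
# the PIDs whose main class is exactly $1. awk compares whole fields, so a
# class that merely contains the name is not matched (grep RunJar would match it).
pids_by_name() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Sample input instead of live jps output, so the example runs anywhere.
printf '2101 RunJar\n2202 NameNode\n2303 RunJar\n2404 Jps\n' | pids_by_name RunJar
# Prints 2101 and 2303, one per line.

# On a real node this would become (xargs -r is a GNU option that skips
# running kill when no PIDs were found):
# jps | pids_by_name RunJar | xargs -r kill
```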

Script contents

Create the script at /home/atguigu/bin/hiveservices.sh and make it executable: sudo chmod 744 hiveservices.sh

#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d "$HIVE_LOG_DIR" ]
then
	mkdir -p "$HIVE_LOG_DIR"
fi
# Check whether a process is running normally: $1 = process name, $2 = process port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore is running normally" || echo "Metastore is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 is running normally" || echo "HiveServer2 is not running properly"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
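In check_process, a service counts as healthy when the PID that owns the port (taken from netstat) appears in the PID list from ps. Because the right-hand side of [[ "$pid" =~ "$ppid" ]] is quoted, bash treats it as a literal substring, so a port owner with PID 123 would also match a list containing 1234. A stricter whole-PID comparison could look like the sketch below; pid_owns_port is a hypothetical name, and swapping it into the script (for example pid_owns_port "$pid" "$ppid" && return 0 || return 1) has not been tried against a live cluster.

```bash
#!/usr/bin/env bash
# Hypothetical replacement for the [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] test
# in check_process: succeed only if $2 (the PID owning the port, from netstat)
# is exactly one of the whitespace-separated PIDs in $1 (from ps).
pid_owns_port() {
  local pids=$1 port_pid=$2 p
  [ -n "$port_pid" ] || return 1     # nothing is listening on the port
  for p in $pids; do                 # unquoted on purpose: split the PID list
    if [ "$p" = "$port_pid" ]; then
      return 0
    fi
  done
  return 1
}

pid_owns_port "4321 5678" 5678 && echo "match"   # match
pid_owns_port "1234" 123 || echo "no match"      # no match (the =~ test would match)
```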

Testing the script

Common Hive commands

bin/hive -help

1. "-e": run SQL without entering the interactive Hive shell

[atguigu@hadoop102 hive]$ bin/hive -e "select id from student;"

2. "-f": run the SQL statements in a script file

(1) Create a datas directory under /opt/module/hive/ and create hivef.sql in it
[atguigu@hadoop102 datas]$ touch hivef.sql
(2) Write valid SQL into the file
select * from student;
(3) Run the SQL in the file
[atguigu@hadoop102 hive]$ bin/hive -f /opt/module/hive/datas/hivef.sql
(4) Run the SQL in the file and write the result to a file
[atguigu@hadoop102 hive]$ bin/hive -f /opt/module/hive/datas/hivef.sql  > /opt/module/datas/hive_result.txt

3. Exiting

Hive CLI: exit; or quit; both work
beeline: !quit

4. Viewing the HDFS file system from the Hive CLI

hive> dfs -ls /;
Found 2 items
drwx-wx-wx   - atguigu supergroup          0 2021-09-30 23:30 /tmp
drwxr-xr-x   - atguigu supergroup          0 2021-10-01 10:11 /user
hive>

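5. Viewing the history of all commands entered in Hive

(1) Go to the current user's home directory, /root or /home/atguigu
(2) View the .hivehistory file
[atguigu@hadoop102 ~]$ cat .hivehistory

Common Hive property settings

Printing the current database name and column headers in the Hive CLI

Add the following to hive-site.xml: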

<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>

Hive runtime log configuration

By default Hive writes its log to /tmp/atguigu/hive.log (the directory is named after the current user).

To store the log under /opt/module/hive/logs instead:

(1) Rename /opt/module/hive/conf/hive-log4j2.properties.template to
hive-log4j2.properties
[atguigu@hadoop102 conf]$ pwd
/opt/module/hive/conf
[atguigu@hadoop102 conf]$ mv hive-log4j2.properties.template hive-log4j2.properties
(2) In hive-log4j2.properties, change the log location
property.hive.log.dir=/opt/module/hive/logs
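Step (2) can also be scripted instead of done in an editor. The sketch below is a hypothetical set_property helper written with awk: it rewrites the first line whose key matches exactly and appends the key if it is absent. It assumes the value contains no backslashes, since awk -v interprets escape sequences.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a Java-style .properties file ($1).
# Rewrites the first line whose key ($2) matches exactly (blanks around "="
# are allowed) and appends "key=value" if no such line exists.
# Assumes the value ($3) contains no backslashes: awk -v expands escapes.
set_property() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    {
      name = $0
      sub(/=.*/, "", name)                # keep the text before the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding blanks
      if (!found && index($0, "=") > 0 && name == k) {
        print k "=" v
        found = 1
        next
      }
      print
    }
    END { if (!found) print k "=" v }
  ' "$file" >"$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back instead of mv so the original file keeps its permissions.
  cat "$tmp" >"$file" && rm -f "$tmp"
}

# Usage on the file renamed in step (1):
# set_property /opt/module/hive/conf/hive-log4j2.properties property.hive.log.dir /opt/module/hive/logs
```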


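Ways to set parameters

# Method 1 (permanent)
Edit hive-site.xml
# Method 2 (for testing; temporary, overrides the value in hive-site.xml, affects only the current client)
hive --hiveconf property=value
eg: hive --hiveconf hive.cli.print.current.db=false
# Method 3 (affects only the current client)
Set it directly inside the Hive shell.
eg: hive> set hive.cli.print.current.db=true;

Hive data types

Hive data types fall into two groups: primitive types and collection types.

Primitive types

The most commonly used are int, bigint, double, and string.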

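Hive type	Java type	Length / description	Example
TINYINT	byte	1-byte signed integer	20
SMALLINT	short	2-byte signed integer	20
INT	int	4-byte signed integer	20
BIGINT	long	8-byte signed integer	20
BOOLEAN	boolean	Boolean, true or false	TRUE FALSE
FLOAT	float	Single-precision floating point	3.14159
DOUBLE	double	Double-precision floating point	3.14159
STRING	String	Character sequence; a character set can be specified; single or double quotes may be used	'now is the time' "for all good men"
TIMESTAMP		Time type
BINARY		Byte array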

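Hive's STRING type corresponds to VARCHAR in a relational database: it is a variable-length string, but you cannot declare a maximum number of characters, and in theory it can hold up to 2 GB of characters.

-- Note: do not indent the lines below when typing them into the Hive shell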
create table employee(
id int,
name string,
phone bigint,
salary double
);
insert into employee values(1001,"pihao",15083609372,19000.12);
-- Running this insert failed with a permission-denied error (the directory permissions can be changed directly in the web UI)
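The integer types in the table differ only in range: TINYINT holds -128 to 127, SMALLINT -32768 to 32767, INT -2147483648 to 2147483647, and BIGINT -2^63 to 2^63-1. The phone value 15083609372 inserted above exceeds INT's maximum, which is why the column is declared bigint. The helper below (hypothetical name smallest_int_type) picks the narrowest type for a value; it relies on bash's 64-bit arithmetic, so it assumes plain decimal inputs within that range.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the narrowest Hive integer type whose signed
# range can hold $1. Assumes a plain decimal integer (no leading zeros,
# which bash would read as octal) inside bash's 64-bit arithmetic range.
smallest_int_type() {
  local v=$1
  if   (( v >= -128 && v <= 127 )); then echo TINYINT            # 1 byte
  elif (( v >= -32768 && v <= 32767 )); then echo SMALLINT       # 2 bytes
  elif (( v >= -2147483648 && v <= 2147483647 )); then echo INT  # 4 bytes
  else echo BIGINT                                               # 8 bytes
  fi
}

smallest_int_type 20            # TINYINT
smallest_int_type 1001          # SMALLINT
smallest_int_type 15083609372   # BIGINT (too large for INT)
```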


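Collection types

Hive has three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to Java's Array and Map, while STRUCT is similar to a C struct: it wraps a collection of named fields. Complex types can be nested to any depth.

Data type	Description	Syntax example
STRUCT	Like a C struct, elements are accessed with "dot" notation. For example, if a column has type STRUCT{first STRING, last STRING}, the first element is referenced as column.first.	struct(), e.g. struct<street:string, city:string>
MAP	A collection of key-value tuples, accessed with array notation. For example, if a column of type MAP holds the pairs 'first'->'John' and 'last'->'Doe', the last element is retrieved with column['last'].	map(), e.g. map<string, int>
ARRAY	A collection of variables of the same type and name, called the array's elements; each element has an index starting from zero. For example, if the array value is ['John', 'Doe'], the second element is referenced as array_name[1].	array(), e.g. array<string>

-- Create the table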
create table test2(
name string,
friends array<string>,
childrens map<string,int>,
address struct<street:string,city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

-- Matching data (the contents of person.txt)
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

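# Load the data into the table with Hive's own syntax
load data local inpath '/opt/module/hive-3.1.2/datas/person.txt' into table test2;
# Method 2: put the file directly into the table's HDFS directory
hadoop fs -put '/opt/module/hive-3.1.2/datas/person.txt' /user/hive/warehouse/test2
# Method 3
upload the file directly through the web UI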
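To see how the ROW FORMAT clause of test2 carves up a line of person.txt, the sketch below splits the first data row with the same three delimiters in bash (version 4 or later, for the associative array): ',' between fields, '_' between collection items, and ':' between a map key and its value. The variable names are illustrative only; the struct fields are filled by position, street first and city second, as in the table definition.

```bash
#!/usr/bin/env bash
# Split one line of person.txt the way test2's ROW FORMAT describes:
#   fields ','   collection items '_'   map keys ':'
line='songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing'

IFS=',' read -r name friends_raw children_raw address_raw <<< "$line"

IFS='_' read -r -a friends <<< "$friends_raw"      # array<string>

declare -A childrens=()                            # map<string,int>
IFS='_' read -r -a pairs <<< "$children_raw"
for kv in "${pairs[@]}"; do
  childrens["${kv%%:*}"]=${kv#*:}                  # key before ':', value after
done

IFS='_' read -r street city <<< "$address_raw"     # struct<street,city>, by position

key='xiao song'
echo "$name ${friends[0]} ${childrens[$key]} $street"
# songsong bingbing 18 hui long guan
```

The printed values line up with the first row of the query result that follows.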

Query test

Querying specific elements

# array element: [index]
# map value: ['key']
# struct field: .
0: jdbc:hive2://hadoop102:10000> select name,friends[0],childrens['xiao song'], address.street from test2;

INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+-----------+-------+----------------+
|   name    |    _c1    |  _c2  |     street     |
+-----------+-----------+-------+----------------+
| songsong  | bingbing  | 18    | hui long guan  |
| yangyang  | caicai    | NULL  | chao yang      |
+-----------+-----------+-------+----------------+
2 rows selected (1.195 seconds)
0: jdbc:hive2://hadoop102:10000>

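Note the difference between MAP and STRUCT: in a MAP all keys share one type and all values share one type, whereas each field of a STRUCT can have its own type.

Hive type conversion

Hive's primitive types can be converted implicitly, much like Java's type conversions. For example, if an expression expects INT, a TINYINT is converted to INT automatically. Hive does not convert in the other direction: if an expression expects TINYINT, an INT is not converted to TINYINT automatically, and Hive returns an error unless CAST is used.

The implicit conversion rules are:

A narrower type converts to a wider one automatically; a wider type never converts to a narrower one automatically.

Any integer type can be implicitly converted to a wider type, e.g. TINYINT to INT, or INT to BIGINT.
All integer types, FLOAT, and numeric STRING values can be implicitly converted to DOUBLE.
TINYINT, SMALLINT, and INT can all be converted to FLOAT.
BOOLEAN cannot be converted to any other type.

Use CAST for explicit conversion

0: jdbc:hive2://hadoop102:10000> select cast('1' as int ) + 1;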

+------+
| _c0  |
+------+
| 2    |
+------+
1 row selected (0.245 seconds)
0: jdbc:hive2://hadoop102:10000>
