Hive learning notes

久香182

Tags: hive, big data

Article link: https://blog.csdn.net/qq_62553456/article/details/127926923

1. // check port 22
2. netstat -lnp // list listening ports and the process that owns each one
3. ip addr or ifconfig -a // show IP addresses
4. Fix for "Failed to start LSB: Bring up/down networking":
(1) Stop and disable NetworkManager (systemctl stop NetworkManager; systemctl disable NetworkManager)
(2) Restart the network service (systemctl restart network, or reboot)
5. Hive startup sequence:
zkServer.sh start ----> start-dfs.sh ----> start-yarn.sh ----> hive --service metastore & ----> hiveserver2 & ----> start the client: beeline -u jdbc:hive2://node01:10000 -n root -p
6. Dropping a database
drop database if exists db_name; // works only when the database contains no tables
drop database if exists db_name cascade; // also drops every table in the database

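Basic operations
1. Create a table:
create [external] table table_name
(column definitions)
row format delimited fields terminated by '\t' // field delimiter of the text data
stored as textfile // file storage format
[location hdfs_path] // storage location of an external table
2. Load data
load data: load a local file (or a file already in HDFS)
insert statements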

3. Start the services: node02: hive --service metastore &
node01: hiveserver2 &
node03: beeline -u jdbc:hive2://node01:10000 -n root -p 123456
4. Create a table:
Switch to a database: use db_name;
List databases: show databases;
Create a table:
create [external] table table_name
(column definitions)
row format delimited
fields terminated by 'delimiter'
stored as textfile
[location hdfs_path]
Create a managed (internal) table:
create table UserName(id int, name string) row format delimited fields terminated by '\t' stored as textfile;
Create an external table:
create external table Age(id int, age int) row format delimited fields terminated by '\t' stored as textfile location 'hdfs_path';
5. Queries
List databases: show databases;
Show the table's data: select * from Age;
6. Importing data:
LOAD:
load data local inpath "file:///opt/software/text/test01.txt" overwrite into table Age;
INSERT:
insert into table Age (id, age) values (4, 18);
Insert a query result:
insert into table Age (id, age) select * from Age;
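INSERT OVERWRITE (replaces the table's existing data; it takes a SELECT, since VALUES is only accepted by INSERT INTO):
insert overwrite table Age select id, age from Age where age >= 18;
Import through HDFS (copy the file straight into the table's directory; Hive stores table names in lowercase, so the directory is age):
hdfs dfs -put test.txt /user/hive/warehouse/testdb2.db/age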

6. Exporting data
Export to the local file system with insert:
insert overwrite local directory '/opt/testdata/Age' row format delimited fields terminated by '\t' select * from Age;

Export to HDFS with insert:
insert overwrite directory '/opt/testdata/Age' row format delimited fields terminated by '\t' select * from Age;

7. Creating a table with complex types
create table student(
id int,
name string,
age int,
fav array<string>,
addr map<string,string>,
contacts struct<mobile:string,mail:string,qq:string>)
row format delimited
fields terminated by '\t'
collection items terminated by '-'
map keys terminated by ':'
stored as textfile;
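To make the three delimiters concrete, here is a minimal Java sketch (not Hive's own SerDe) that splits one line of the student file the way the DDL describes. It assumes, as Hive's default text format does, that the collection-items delimiter '-' separates array elements, map entries and struct fields alike; the class name and the sample line are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ComplexRowDemo {
    // Delimiters matching the student table DDL
    static final String FIELD = "\t";   // fields terminated by '\t'
    static final String ITEM = "-";     // collection items terminated by '-'
    static final String MAP_KEY = ":";  // map keys terminated by ':'

    // Splits one text line into the six columns of the student table
    static List<Object> parse(String line) {
        String[] cols = line.split(FIELD, -1);
        List<Object> row = new ArrayList<>();
        row.add(Integer.parseInt(cols[0]));               // id int
        row.add(cols[1]);                                 // name string
        row.add(Integer.parseInt(cols[2]));               // age int
        row.add(Arrays.asList(cols[3].split(ITEM, -1)));  // fav array<string>

        Map<String, String> addr = new LinkedHashMap<>(); // addr map<string,string>
        for (String entry : cols[4].split(ITEM, -1)) {
            String[] kv = entry.split(MAP_KEY, 2);
            addr.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        row.add(addr);

        // contacts struct<mobile,mail,qq>: its fields also use the collection delimiter
        String[] c = cols[5].split(ITEM, -1);
        Map<String, String> contacts = new LinkedHashMap<>();
        contacts.put("mobile", c[0]);
        contacts.put("mail", c.length > 1 ? c[1] : null);
        contacts.put("qq", c.length > 2 ? c[2] : null);
        row.add(contacts);
        return row;
    }

    public static void main(String[] args) {
        String line = "1\tzhangsan\t20\tbasketball-music\tprovince:guizhou-city:guiyang\t13800000000-zs@example.com-10001";
        System.out.println(parse(line));
    }
}
```

One consequence of this delimiter choice: a value that itself contains '-' (an e-mail address with a hyphen, for example) would be split in the wrong place.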

Basic Hive operations, part 2
1. Altering tables
Rename:
alter table table_name rename to new_table_name;
Change a column:
alter table table_name change old_col_name new_col_name column_type;
Add or replace columns:
alter table table_name add|replace columns (col_name data_type [comment col_comment], ...);

Problem:
Caused by: InvalidOperationException(message:The following columns have types incompatible with the existing columns in their respective positions :
Solution:
This depends on the Hive version: some versions no longer force-convert column types, so the change has to go through change column instead: alter table table_name change column sex nid int;
then alter table table_name change column nid sex int;
because this is rejected: alter table table_name replace columns (sex int);

Creating an external table:
1. Prepare the HDFS directory: hdfs dfs -mkdir /testdata
2. Create the external table:
create external table student(id int, age int, name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/testdata';

4. Partitions: use province as the partition column when creating the table
create table person01(id int, name string) partitioned by (province string) row format delimited fields terminated by ',' stored as textfile;

导入数据：load data local inpath 'file:///opt/test01.txt' into table person01 partition (province="贵州")

Query:
Query a single partition: select * from person01 where province='贵州';

Multi-level partitions:
create table person01(id int,name string)
partitioned by(province string,city int)
row format delimited
fields terminated by ','
stored as textfile;
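Each partition value becomes one subdirectory named column=value under the table directory, nested in the order of partitioned by. The hypothetical Java helper below only builds that path; it assumes the default warehouse directory /user/hive/warehouse and the database testdb2, and it skips the escaping Hive applies to special characters in partition values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class PartitionPathDemo {
    // Table directory plus one "col=value" level per partition column, in declaration order
    static String partitionPath(String warehouse, String db, String table, Map<String, String> spec) {
        StringJoiner path = new StringJoiner("/");
        path.add(warehouse).add(db + ".db").add(table.toLowerCase());
        for (Map.Entry<String, String> e : spec.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>(); // keeps the partitioned by order
        spec.put("province", "guizhou");
        spec.put("city", "1");
        // prints /user/hive/warehouse/testdb2.db/person01/province=guizhou/city=1
        System.out.println(partitionPath("/user/hive/warehouse", "testdb2", "person01", spec));
    }
}
```

This layout is why a filter such as where province='guizhou' lets Hive skip the other provinces' directories entirely.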

6. Bucketing
Bucketing clause: clustered by (bucket_col, ...) into num_buckets buckets
Partitioning clause: partitioned by (partition_col data_type, ...)
Create the table:
create table person02(id int,name string) clustered by(id) into 4 buckets row format delimited fields terminated by ',' stored as textfile;

Load data:
load data local inpath "" into table person02;

Sampling query:
select * from person03 tablesample(bucket 2 out of 4 on id);
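Which rows land in "bucket 2 out of 4"? A minimal Java sketch, assuming the legacy bucketing scheme (bucketing_version 1), where an int column hashes to its own value and the bucket is that hash modulo the number of buckets. In Hive 3 new tables use a Murmur3-based hash (bucketing_version 2), so the actual bucket numbers can differ there. When the table is bucketed on id into 4 buckets, as person02 is, Hive can read just the second bucket file instead of scanning everything.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSampleDemo {
    // Legacy bucketing for an int column: hash(id) == id, masked to be non-negative
    static int bucketOf(int id, int numBuckets) {
        return (Integer.hashCode(id) & Integer.MAX_VALUE) % numBuckets;
    }

    // Rows returned by TABLESAMPLE(BUCKET x OUT OF y ON id): those whose 0-based bucket is x - 1
    static List<Integer> sample(List<Integer> ids, int x, int y) {
        List<Integer> kept = new ArrayList<>();
        for (int id : ids) {
            if (bucketOf(id, y) == x - 1) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sample(ids, 2, 4)); // ids with id % 4 == 1: [1, 5, 9]
    }
}
```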

Joining tables:
select * from person03 join person01 on (person01.id = person03.pid);
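Conceptually, the inner join keeps every pair of rows whose person01.id equals person03.pid. The Java sketch below expresses that as a hash join (similar in spirit to a Hive map join, which loads the smaller table into an in-memory hash table). The columns {id, name} for person01 and {id, pid} for person03, and the sample rows, are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinDemo {
    // person01 rows are {id, name}; person03 rows are {id, pid} (hypothetical columns)
    static List<String> innerJoin(List<String[]> person01, List<String[]> person03) {
        // Build phase: index person01 by its join key (a key may occur more than once)
        Map<String, List<String[]>> byId = new HashMap<>();
        for (String[] p : person01) {
            byId.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p);
        }
        // Probe phase: look up each person03.pid; rows without a match are dropped
        List<String> result = new ArrayList<>();
        for (String[] q : person03) {
            for (String[] p : byId.getOrDefault(q[1], List.of())) {
                result.add(String.join(",", q[0], q[1], p[0], p[1]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> person01 = List.of(new String[]{"1", "zhangsan"}, new String[]{"2", "lisi"});
        List<String[]> person03 = List.of(new String[]{"10", "1"}, new String[]{"11", "3"}, new String[]{"12", "2"});
        System.out.println(innerJoin(person01, person03)); // pid 3 has no match and disappears
    }
}
```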

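Hive user-defined functions (UDF)
1. Workflow: create a class ---> add the dependency ---> write the function ---> package it as a jar ---> upload the jar ---> create the function
2. Code:
Create: create temporary function fun_hello as 'org.hadoop.hive.demo.hellUDF';
Use the database: use intership;
Query: 1) select fun_hello(name) from person02;
2) select fun_hello('kkk');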
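A legacy simple UDF is a class extending org.apache.hadoop.hive.ql.exec.UDF that defines one or more public evaluate methods, which Hive calls once per row. The notes do not show the body of hellUDF, so the greeting below is only a guess at what fun_hello does, and the class name HelloUdfSketch is hypothetical. The base class is left out so the sketch compiles without the hive-exec dependency.

```java
// In a real project this would be declared as
//   public class HelloUDF extends org.apache.hadoop.hive.ql.exec.UDF { ... }
// and compiled against hive-exec; the base class is omitted so this file stands alone.
class HelloUdfSketch {
    // Called once per row; returning null for null input is the usual convention
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return "hello " + input; // hypothetical behaviour of fun_hello
    }

    public static void main(String[] args) {
        HelloUdfSketch udf = new HelloUdfSketch();
        System.out.println(udf.evaluate("kkk")); // prints hello kkk
    }
}
```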
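User-defined aggregate functions (UDAF):
1. Definition: combine many records into one record, as the built-in count, max, min and sum do
2. Classes involved: AbstractGenericUDAFResolver ----> the entry class of a UDAF
GenericUDAFEvaluator // does the actual work; an implementation needs the following methods (plus getNewAggregationBuffer() and reset()):
init()
iterate()
terminatePartial()
merge()
terminate()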
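The five methods follow the stages of a MapReduce aggregation: iterate consumes raw rows on the map side, terminatePartial hands a partial result to the shuffle, merge combines partial results on the reduce side, and terminate produces the final value. The plain-Java model below mirrors that flow for an average, with (sum, count) as the partial result. It does not use Hive's GenericUDAFEvaluator, ObjectInspector or AggregationBuffer types; all names are illustrative.

```java
import java.util.List;

class AvgUdafModel {
    // Stands in for an AggregationBuffer: the running state of one group
    static class Buffer {
        long sum;
        long count;
    }

    // iterate(): map side, consume one original row
    static void iterate(Buffer buf, Integer value) {
        if (value != null) {
            buf.sum += value;
            buf.count++;
        }
    }

    // terminatePartial(): map side, emit the partial result sent to the reducer
    static long[] terminatePartial(Buffer buf) {
        return new long[]{buf.sum, buf.count};
    }

    // merge(): reduce side, fold one partial result into the buffer
    static void merge(Buffer buf, long[] partial) {
        buf.sum += partial[0];
        buf.count += partial[1];
    }

    // terminate(): reduce side, produce the final value (null when there were no rows)
    static Double terminate(Buffer buf) {
        return buf.count == 0 ? null : (double) buf.sum / buf.count;
    }

    // Each split plays one mapper; a single reducer merges all partial results
    static Double run(List<List<Integer>> splits) {
        Buffer reducer = new Buffer();
        for (List<Integer> split : splits) {
            Buffer mapper = new Buffer(); // roughly where Hive calls getNewAggregationBuffer()
            for (Integer v : split) {
                iterate(mapper, v);
            }
            merge(reducer, terminatePartial(mapper));
        }
        return terminate(reducer);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(List.of(1, 2, 3), List.of(4, 5)))); // prints 3.0
    }
}
```

Passing (sum, count) rather than a per-mapper average is what makes merge correct: averaging the two mapper averages (2.0 and 4.5) would give 3.25 instead of 3.0.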

// Problem when importing the jar dependencies:
Cannot resolve io.airlift:slice:0.29
Solution: this is a Maven configuration (path) problem;
fix it under Settings in IDEA

Working with Hive from Java
Add the dependency:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>2.3.7</version>
</dependency>
Write the program:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        String url = "jdbc:hive2://node01:10000/testdb2";
        try {
            // Register the Hive JDBC driver
            Class.forName(driverName);
            // try-with-resources closes the statement and the connection even if execute() fails
            try (Connection con = DriverManager.getConnection(url);
                 Statement stmt = con.createStatement()) {
                String sql = "create database hive_jdbc_test";
                System.out.println("Running: " + sql);
                stmt.execute(sql);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

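Data warehouse modeling
Characteristics:
Subject-oriented

Integrated

Relatively stable (non-volatile)

Reflects historical changes (time-variant)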

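Three schemas in dimensional modeling:
Star schema ---- all dimension tables surround one fact table
Constellation (galaxy) schema ---- several fact tables share the same dimension tables
Snowflake schema ---- dimensions are normalized into further dimension tables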
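A small Java sketch of the difference between the star and snowflake layouts, using a hypothetical sales fact table {productKey, amount}. In the star schema the product dimension holds the category name directly; in the snowflake schema the category is normalized into its own dimension, so one more lookup (one more join in SQL) is needed. Both produce the same totals.

```java
import java.util.Map;
import java.util.TreeMap;

class StarSchemaDemo {
    // Star: the product dimension is denormalized and carries the category name itself
    static Map<String, Integer> starTotals(int[][] sales, Map<Integer, String> productCategory) {
        Map<String, Integer> totals = new TreeMap<>();
        for (int[] fact : sales) { // fact = {productKey, amount}
            String category = productCategory.getOrDefault(fact[0], "unknown");
            totals.merge(category, fact[1], Integer::sum);
        }
        return totals;
    }

    // Snowflake: product -> categoryKey, and category is a dimension table of its own
    static Map<String, Integer> snowflakeTotals(int[][] sales, Map<Integer, Integer> productToCategoryKey,
                                                Map<Integer, String> categoryName) {
        Map<String, Integer> totals = new TreeMap<>();
        for (int[] fact : sales) {
            Integer categoryKey = productToCategoryKey.get(fact[0]);    // hop 1: fact -> product
            String category = categoryKey == null ? "unknown"
                    : categoryName.getOrDefault(categoryKey, "unknown"); // hop 2: product -> category
            totals.merge(category, fact[1], Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] sales = {{1, 30}, {2, 20}, {1, 10}, {3, 5}};
        System.out.println(starTotals(sales, Map.of(1, "fruit", 2, "fruit", 3, "drink")));
        System.out.println(snowflakeTotals(sales, Map.of(1, 100, 2, 100, 3, 200),
                Map.of(100, "fruit", 200, "drink")));
    }
}
```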

Show a user's privileges: show grant user root;

Hive permission configuration
Storage permissions (HDFS)
SQL permissions
Data permissions

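Window functions
Definition: a window function computes an aggregate over a set of rows related to the current row but, unlike group by, still returns every row, so the aggregate appears next to the detail data
Window clause:
- preceding: rows before the current row
- following: rows after the current row
- current row: the current row itself
- unbounded: no limit, e.g. unbounded preceding starts at the first row of the partition
Ranking (sequence) functions do not support the window clause: e.g. ntile, row_number, rank, dense_rank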
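The three most common ranking functions differ only in how they treat ties. This Java sketch assumes the rows of one partition are already sorted as by order by pv desc (the sample values are made up) and computes the number each function would assign.

```java
import java.util.Arrays;

class RankingDemo {
    // Input must already be sorted, as by ORDER BY pv DESC within one partition
    static int[][] rankings(int[] sortedPv) {
        int n = sortedPv.length;
        int[] rowNumber = new int[n], rank = new int[n], denseRank = new int[n];
        for (int i = 0; i < n; i++) {
            boolean tie = i > 0 && sortedPv[i] == sortedPv[i - 1];
            rowNumber[i] = i + 1;                  // always 1, 2, 3, ... even for ties
            rank[i] = tie ? rank[i - 1] : i + 1;   // ties share a rank, then a gap follows
            denseRank[i] = tie ? denseRank[i - 1]  // ties share a rank, no gap
                    : (i == 0 ? 1 : denseRank[i - 1] + 1);
        }
        return new int[][]{rowNumber, rank, denseRank};
    }

    public static void main(String[] args) {
        int[][] r = rankings(new int[]{10, 8, 8, 5});
        System.out.println("row_number " + Arrays.toString(r[0])); // [1, 2, 3, 4]
        System.out.println("rank       " + Arrays.toString(r[1])); // [1, 2, 2, 4]
        System.out.println("dense_rank " + Arrays.toString(r[2])); // [1, 2, 2, 3]
    }
}
```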
Example:
Create the table:
create table cookie_t(cookietid string,creatime string,pv int) row format delimited fields terminated by ',' stored as textfile;
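Window aggregation:
aggregate + over(): every row gets the total over the whole table
select cookietid, creatime, sum(pv) over() from cookie_t;

aggregate + over(partition by cookietid): every row gets the total for its cookie (group by is not allowed inside over)
select cookietid, creatime, sum(pv) over(partition by cookietid) from cookie_t;

aggregate + over(partition by cookietid order by creatime): running total within each cookie
select cookietid, creatime, sum(pv) over(partition by cookietid order by creatime) from cookie_t;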

With an explicit frame:
select cookietid, creatime, pv, sum(pv) over(partition by cookietid order by creatime rows
between unbounded preceding and current row) from cookie_t;
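The sketch below reproduces two of these queries in plain Java on made-up cookie_t rows: partitionTotals matches sum(pv) over(partition by cookietid), and runningSums matches the explicit rows between unbounded preceding and current row frame. The order by form without an explicit frame defaults to range between unbounded preceding and current row, which gives the same result unless two rows of one cookie share the same creatime.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WindowSumDemo {
    // One row of cookie_t
    static class Row {
        final String cookietid;
        final String creatime;
        final int pv;

        Row(String cookietid, String creatime, int pv) {
            this.cookietid = cookietid;
            this.creatime = creatime;
            this.pv = pv;
        }
    }

    // sum(pv) over(partition by cookietid): every row gets its partition's total, input order kept
    static List<Integer> partitionTotals(List<Row> rows) {
        Map<String, Integer> totals = new HashMap<>();
        for (Row r : rows) {
            totals.merge(r.cookietid, r.pv, Integer::sum);
        }
        List<Integer> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(totals.get(r.cookietid));
        }
        return out;
    }

    // sum(pv) over(partition by cookietid order by creatime
    //              rows between unbounded preceding and current row)
    static List<String> runningSums(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.cookietid).thenComparing(r -> r.creatime));
        Map<String, Integer> running = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Row r : sorted) {
            // The frame grows by one row per step: previous total of this cookie plus this pv
            int sum = running.merge(r.cookietid, r.pv, Integer::sum);
            out.add(r.cookietid + "," + r.creatime + "," + r.pv + "," + sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("cookie1", "2015-04-10", 1),
                new Row("cookie1", "2015-04-11", 5),
                new Row("cookie1", "2015-04-12", 7),
                new Row("cookie2", "2015-04-10", 2),
                new Row("cookie2", "2015-04-11", 3));
        runningSums(rows).forEach(System.out::println); // running sums 1, 6, 13 and 2, 5
        System.out.println(partitionTotals(rows));      // [13, 13, 13, 5, 5]
    }
}
```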