Hive Basics Explained (Part 1)

XiuL

Category: Big Data Component Commands

Article link: https://blog.csdn.net/qq_35310348/article/details/107859344

I. Installing Hive (with MySQL storing the shared Hive metastore)

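1. Install MySQL and start it.
2. Extract hive-1.1.0-cdh5.14.0.tar.gz.
3. In the conf directory of the extracted folder, run: cp hive-env.sh.template hive-env.sh
  Edit hive-env.sh and set:
  HADOOP_HOME=/xxx/hadoop   (the Hadoop installation directory)
  export HIVE_CONF_DIR=xxx/hive-1.1.0-cdh5.14.0/conf   (the conf directory under the Hive extraction directory)
4. In the conf directory, create hive-site.xml.
  Its main job is to configure the MySQL connection in which Hive stores its metadata.
  Note: the MySQL JDBC driver jar must be uploaded to Hive's lib directory.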
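
Step 4 leaves the contents of hive-site.xml open. The four javax.jdo.option.* properties below are the standard keys for pointing the metastore at a JDBC database, and this Python sketch renders them into the usual <configuration>/<property> layout. The host, the database name hive, the user and the password are hypothetical placeholders, and com.mysql.jdbc.Driver assumes a MySQL Connector/J 5.x jar in Hive's lib directory (Connector/J 8.x uses com.mysql.cj.jdbc.Driver).

```python
import xml.etree.ElementTree as ET

# Hypothetical connection values: replace the host, database name, user and
# password with your own. Only the property names are standard metastore keys.
METASTORE_PROPERTIES = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "123456",
}


def build_hive_site(properties):
    """Render a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.tostring escapes special characters such as '&' in a JDBC URL.
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Redirect the output into conf/hive-site.xml.
    print(build_hive_site(METASTORE_PROPERTIES), end="")
```

Running it as python gen_site.py > conf/hive-site.xml (the script name is hypothetical) produces the file; generating it this way avoids hand-escaping characters in the JDBC URL.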

II. How to run Hive commands:

1. Interactive shell:
  bin/hive 
2. JDBC (through HiveServer2):
  Start hiveserver2:
	nohup bin/hive --service hiveserver2 &
  Start bin/beeline, then connect to hiveserver2:
	!connect jdbc:hive2://192.168.30.172:10000   (the IP is the address of the server running Hive)
3. Command-line options:
  Use -e to execute HQL statements given on the command line:
	bin/hive -e "use test;select * from student;"
  Use -f to execute the HQL statements stored in a file:
	bin/hive -f hive.sql
	hive.sql: use test;select * from student;
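
When Hive is driven from scripts, these command lines can be assembled programmatically. The Python sketch below only builds argument lists for the hive -e, hive -f and beeline invocations shown above (beeline's -u option connects directly instead of typing !connect), plus the text of a file like hive.sql. Only run() actually executes anything, so it needs a working Hive installation. The bin/ prefix and all function names are assumptions for illustration.

```python
import shlex
import subprocess

HIVE_BIN_DIR = "bin"  # assumption: commands are run from the Hive install directory


def hive_e(statements):
    """argv for `hive -e`: statements joined into one ';'-terminated string."""
    hql = "".join(s.strip().rstrip(";") + ";" for s in statements)
    return [f"{HIVE_BIN_DIR}/hive", "-e", hql]


def hive_f(script_path):
    """argv for `hive -f <file>`."""
    return [f"{HIVE_BIN_DIR}/hive", "-f", script_path]


def beeline_u(host, port=10000):
    """argv for connecting beeline to HiveServer2 directly with -u."""
    return [f"{HIVE_BIN_DIR}/beeline", "-u", f"jdbc:hive2://{host}:{port}"]


def script_text(statements):
    """Contents for a file such as hive.sql: one ';'-terminated statement per line."""
    return "".join(s.strip().rstrip(";") + ";\n" for s in statements)


def run(argv):
    """Execute the command; requires a working Hive installation."""
    print("running:", shlex.join(argv))
    return subprocess.run(argv, check=True)
```

Passing an argument list (rather than one shell string) keeps the quoted HQL intact, so no extra escaping of ; or * is needed.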

III. Hive commands for describing databases and tables:

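1. List all databases: show databases;
2. Create a database: create database lwg; (no storage path is given, so the data is stored under the directory configured by hive.metastore.warehouse.dir)
   create database lwg01 location '/lwg03'; (specifies the HDFS path explicitly)
3. Show database details: desc database lwg;
4. Drop a database: drop database lwg; (fails with an error if the database still contains tables)
  Force the drop together with all its tables: drop database lwg cascade;
5. Show the table type: desc formatted tablename; the Table Type field shows whether the table is managed (internal, MANAGED_TABLE) or external (EXTERNAL_TABLE)
6. Show partitions: show partitions score;
7. Add multiple partitions: alter table score add partition(month='xx') partition(month='xx1');
8. Drop a partition: alter table score drop partition(month='xx');
9. Rename a table: alter table old_name rename to new_name;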
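
The storage paths behind items 2 and 6 to 8 follow a fixed layout: a database created without a location clause lives in <warehouse>/<name>.db, its managed tables are subdirectories of that, and each partition is a key=value directory (nested when there are several partition columns). This Python sketch computes those paths under the default warehouse directory. It is a simplified model: it omits the hdfs://namenode prefix Hive actually records, and it does not escape special characters in partition values; the function names are hypothetical.

```python
import posixpath

# Default value of hive.metastore.warehouse.dir.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def database_dir(db, location=None, warehouse=DEFAULT_WAREHOUSE):
    """HDFS directory that holds a database's managed tables."""
    if location is not None:  # create database ... location '...'
        return location
    if db.lower() == "default":  # tables of `default` sit directly in the warehouse
        return warehouse
    return posixpath.join(warehouse, db.lower() + ".db")


def table_dir(db_dir, table):
    """Directory of a managed table created without a location clause."""
    return posixpath.join(db_dir, table.lower())


def partition_dir(tbl_dir, spec):
    """One nested key=value directory per partition column, in declared order."""
    return posixpath.join(tbl_dir, *(f"{k}={v}" for k, v in spec.items()))
```

For example, after alter table score add partition(month='xx') in database lwg, the new partition's data would be expected under partition_dir(table_dir(database_dir("lwg"), "score"), {"month": "xx"}).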

IV. Hive CREATE TABLE syntax

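1. Keywords:
   external: creates an external table. Note: for a managed (internal) table, loaded data is moved into the warehouse directory, and dropping the table deletes both the metadata and the data. An external table points to an HDFS path, and dropping it deletes only the metadata, not the data. Example: log data is uploaded to HDFS periodically and analyzed through an external table, while the intermediate and result tables are managed tables that are filled with insert ... select.
   like: copies the table structure, but not the data
   row format delimited fields terminated by: the delimiter between the fields of each row
   stored as textfile|rcfile|sequencefile|...: use textfile if the data files are plain text; use sequencefile if the data should be compressed
   partitioned by: partitioning
   clustered by: bucketing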
2. Create a plain table: create table stu(id int,name string);
3. Create a table with a field delimiter: create table if not exists stu2(id int,name string) row format delimited fields terminated by '\t' stored as textfile location '/user/stu2';
4. Copy the table structure and data: create table stu3 as select * from stu2;
5. Copy the table structure only, without data: create table stu4 like stu2;
6. Create partitioned tables:
   One partition column: create table sc(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';
   Multiple partition columns: create table sc2(s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';
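
The managed versus external distinction described under external only becomes visible when a table is dropped. The toy Python model below keeps metadata (the metastore) and files (HDFS) in two separate dictionaries to make that explicit. The class names, table names and paths are hypothetical and do not correspond to any Hive API.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    location: str
    external: bool = False


@dataclass
class ToyWarehouse:
    """metastore: table name -> Table; fs: directory path -> list of file names."""
    metastore: dict = field(default_factory=dict)
    fs: dict = field(default_factory=dict)

    def create(self, table):
        self.metastore[table.name] = table
        self.fs.setdefault(table.location, [])  # an external path may already exist

    def drop(self, name):
        table = self.metastore.pop(name)  # metadata is always removed
        if not table.external:  # managed table: the data directory goes too
            self.fs.pop(table.location, None)
```

Dropping both an external log table and a managed result table leaves only the log files behind, which is why raw data shared with other tools is usually exposed through external tables.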

V. Hive data loading syntax

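1. insert statements (HDFS and YARN must be running; otherwise the statement hangs and then fails):
	insert into stu values(1,'zs'); under the hood this runs as a MapReduce job
  Insert into a specific partition:
	 insert into table score partition(month='211') values('11','11','11');
  Read from one table and insert into a partition of another table:
	 insert overwrite table score partition(month='22') select id,uid,sc from score3; overwrite replaces only the data in the target partition
  Read from one table and insert into partitions of two other tables (multi-insert, which scans the source table once):
	 from score insert overwrite table score_first partition(month='11') select id,uid,sc insert overwrite table score_second partition(month='22') select id,uid,sc;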
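2. Load a file from the local server into a table (the local file is copied):
  load data local inpath '/server/path/a.csv' overwrite into table teacher;
3. Load data from HDFS into a table (the source file is moved into the table's directory, not copied):
  load data inpath '/hdfs/stu.csv' into table stu;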
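
The difference between insert into and insert overwrite ... partition, and the single-scan multi-insert form, can be modeled with a dictionary that maps a partition spec to its rows. This is a hedged Python illustration of the semantics only: Hive actually runs these statements as MapReduce jobs over HDFS files, and the function names are hypothetical.

```python
def insert_into(table, partition, rows):
    """insert into: append rows to the partition, creating it if needed."""
    table.setdefault(partition, []).extend(rows)


def insert_overwrite(table, partition, rows):
    """insert overwrite ... partition(...): replace only this partition's rows."""
    table[partition] = list(rows)


def multi_insert(source_rows, targets):
    """from src insert overwrite ... insert overwrite ...: read the source once.

    targets is a list of (table, partition, select_fn) tuples. Every target
    partition is emptied first, then filled during one pass over the source.
    """
    for table, partition, _ in targets:
        table[partition] = []
    for row in source_rows:
        for table, partition, select_fn in targets:
            table[partition].append(select_fn(row))
```

Overwriting month=22 leaves month=11 untouched, matching the note that overwrite affects only the named partition.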

VI. Hive data export syntax

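1. Export query results to the local file system (the directory does not have to exist; any existing contents are overwritten):
  insert overwrite local directory '/xx/data' select * from score;
2. Export query results to the local file system with a specific format:
  insert overwrite local directory '/xx/data' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;
3. Export query results to HDFS:
  insert overwrite directory '/hdfs/data' row format delimited fields terminated by '\t' collection items terminated by '#' select * from stu;
4. Copy an exported file from HDFS to the local file system with the hdfs command: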
  hdfs dfs -get /hdfs/xx/data/000000_0 /local/data/stu.txt
5. Export a table (data and metadata) to HDFS:
  export table score to '/hdfs/data/score';
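
The row format clauses in items 2 and 3 decide how each row is written as a line of text. The Python sketch below mimics that layout for one level of nesting: fields are joined by the field terminator, array elements by the collection-item terminator, and NULL is written as \N (the default null marker of Hive's delimited text format). Without a row format clause, Hive would use \001 (Ctrl-A) between fields and \002 between collection items. The function names are hypothetical; this is an illustration, not Hive's SerDe.

```python
def serialize_row(row, field_sep="\t", item_sep="#", null="\\N"):
    """Render one row as delimited text (arrays one level deep only)."""
    out = []
    for value in row:
        if value is None:
            out.append(null)
        elif isinstance(value, (list, tuple)):
            out.append(item_sep.join(str(v) for v in value))
        else:
            out.append(str(value))
    return field_sep.join(out)


def serialize_rows(rows, **separators):
    """One newline-terminated line per row."""
    return "".join(serialize_row(r, **separators) + "\n" for r in rows)
```

A row (1, 'zs', array('math','english')) exported with fields terminated by '\t' and collection items terminated by '#' would therefore appear as the line 1, tab, zs, tab, math#english.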

VII. Hive query syntax

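1. order by: global sort through a single reducer; not suitable for large data sets.
2. sort by: sorts within each reducer, so the output is ordered per reducer but not globally; the difference from order by only shows when mapred.reduce.tasks > 1.
3. distribute by: sends rows to different reducers by hashing the given column.
4. cluster by: distributes rows by the column like distribute by and also sorts on it (equivalent to distribute by plus sort by on the same column, ascending only).
5. count()  max()  min()   sum()  avg()  limit (in Hive 1.x, limit takes a single argument: the number of rows to return)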
6. where clause: where s_score > 60;
7.between 80 and 100     where score in(80,90)
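8. like and rlike: in like patterns, _ matches exactly one character and % matches zero or more characters.
9. rlike accepts a Java regular expression.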
10.and  or  not 
11.group by   having
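12. join supports only equi-joins, not non-equi joins (in older versions such as Hive 1.x)
13. inner join: a row is kept only when both tables have matching data.
	Joining multiple tables: joining n tables requires at least n-1 join conditions. In general Hive launches one MapReduce job per JOIN, although joins on the same key column are merged into a single job.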
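
The differences among order by, sort by, distribute by and cluster by in items 1 to 4 are easiest to see on a small data set. In the Python model below, each inner list stands for the output of one reducer. As an assumption for illustration, rows are routed using the integer key itself as the hash (masked to non-negative, then taken modulo the reducer count); Hive's real hash depends on the column type. sort by without distribute by is not modeled, because Hive then does not route rows by any particular key. The function names are hypothetical.

```python
def distribute_by(rows, key, num_reducers):
    """Route each row to reducer (hash & 0x7FFFFFFF) % num_reducers.

    Stand-in hash: the integer key value itself.
    """
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[(key(row) & 0x7FFFFFFF) % num_reducers].append(row)
    return reducers


def sort_by(reducers, key):
    """Sort inside each reducer only; no ordering across reducers."""
    return [sorted(part, key=key) for part in reducers]


def cluster_by(rows, key, num_reducers):
    """cluster by col behaves like distribute by col sort by col (ascending)."""
    return sort_by(distribute_by(rows, key, num_reducers), key)


def order_by(rows, key):
    """Global sort: every row goes through a single reducer."""
    return [sorted(rows, key=key)]
```

For instance, distribute by s_id sort by s_score corresponds to sort_by(distribute_by(rows, by_id, n), by_score): each reducer's output is ordered by score, but scores are not ordered across reducers the way order_by would produce.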
