Hive Basics: Getting Started

sparkjvm

Article link: https://blog.csdn.net/sparkjvm/article/details/42387775

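1. Hive
Within the Hadoop ecosystem Hive plays the role of a data warehouse: it manages the data stored in Hadoop and lets you query it.
At its core Hive is a SQL parsing engine: it translates SQL queries into MapReduce jobs and runs them.
Hive keeps a set of mappings that make this translation possible. Tables and their fields in SQL map to files (and directories) in HDFS and to the columns inside those files.
This mapping layer is called the metastore. It is usually kept in Derby or MySQL.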
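2. Installing Hive (a client-side tool)
.By default Hive stores its data in HDFS under /user/hive/warehouse/
.The default storage location in HDFS can be changed
Edit hive-site.xml in Hive's conf directory (this file is created by renaming a template in step 2.6 below, so make the change after that step):
<property>
<name>hive.metastore.warehouse.dir</name>
<!-- default warehouse location -->
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>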
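To check which value a property currently has without opening the file in an editor, a small helper can pull it out of the XML. This is a minimal sketch, not part of Hive: the function name get_hive_property is made up for illustration, and it assumes that each <name> and <value> element fits on a single line, as in the snippet above.

```shell
# Print the <value> of one property from a Hadoop-style XML config file.
# Usage: get_hive_property FILE PROPERTY_NAME
# Hypothetical helper; assumes <name>...</name> and <value>...</value>
# each fit on one line.
get_hive_property() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
            found = (name == want)      # remember whether this is our property
        }
        found && /<value>/ {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Example call (path as used in this article):
# get_hive_property /usr/local/hive/conf/hive-site.xml hive.metastore.warehouse.dir
```

The function prints nothing when the property is absent, which makes it easy to use in a shell test.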

2.1 Extract the archive
tar -zxvf hive-0.9.0.tar.gz
2.2 Rename the directory
[root@hadoop0 Downloads]# mv hive-0.9.0 hive
2.3 Move it to /usr/local/
[root@hadoop0 Downloads]# mv hive /usr/local/
2.4 Delete the .gz archive (optional)
[root@hadoop0 local]# rm -rf *.gz //deletes every .gz file in the current directory, so check where you are first
2.5 Add Hive to the environment variables
[root@hadoop0 local]# vi /etc/profile //add the following lines
export HIVE_HOME=/usr/local/hive
export PATH=.:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PIG_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$PATH
[root@hadoop0 local]# source /etc/profile //apply the changes immediately
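After sourcing /etc/profile it can be worth confirming that the variables really took effect. The following is a rough sketch of such a check; check_hive_env is a hypothetical helper name, and the only things it verifies are that the given directory contains an executable bin/hive and that this bin directory appears on PATH.

```shell
# Verify a Hive installation directory and the PATH entry for it.
# Usage: check_hive_env "$HIVE_HOME"
# Hypothetical helper; prints "ok" on success, a reason otherwise.
check_hive_env() {
    hive_home="$1"
    if [ -z "$hive_home" ]; then
        echo "HIVE_HOME is not set"
        return 1
    fi
    if [ ! -x "$hive_home/bin/hive" ]; then
        echo "no executable found at $hive_home/bin/hive"
        return 1
    fi
    # Wrap PATH in colons so the first and last entries match as well.
    case ":$PATH:" in
        *":$hive_home/bin:"*) echo "ok" ;;
        *) echo "$hive_home/bin is not on PATH"; return 1 ;;
    esac
}

# Example call:
# check_hive_env "$HIVE_HOME"
```

A non-zero return status signals a problem, so the call can be used directly in an if statement or a setup script.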
2.6 Rename the configuration templates in Hive's conf directory
[root@hadoop0 conf]# mv hive-default.xml.template hive-site.xml //rename
[root@hadoop0 conf]# mv hive-env.sh.template hive-env.sh //rename
[root@hadoop0 conf]# ls
hive-env.sh hive-log4j.properties.template
hive-exec-log4j.properties.template hive-site.xml
2.7 Edit the configuration files
.Edit Hadoop's hadoop-env.sh
[root@hadoop0 conf]# vi ../../hadoop/conf/hadoop-env.sh
Change:
export HADOOP_CLASSPATH=.:$CLASSPATH:$HADOOP_CLASSPATH:$HADOOP_HOME/bin
.Edit $HIVE_HOME/bin/hive-config.sh
[root@hadoop0 conf]# vi ../bin/hive-config.sh
Add:
export JAVA_HOME=/usr/local/jdk
export HIVE_HOME=/usr/local/hive
export HADOOP_HOME=/usr/local/hadoop
2.8 Start Hive (Hadoop must be running)
[root@hadoop0 bin]# ls //in Hive's bin directory
ext hive hive-config.sh
[root@hadoop0 bin]# hive //opens the Hive shell
2.9 Hive accepts MySQL-style commands
hive> show databases; //list databases
hive> use default; //switch to the default database
hive> show tables; //list tables
hive> create table t1(id string);
//This sometimes fails because Hadoop is still in safe mode. Leave safe mode first, then create the table again:
hadoop dfsadmin -safemode leave
hive> show tables; //verify
hive> select * from t1; //query the table
2.9.1 Hive data is stored in HDFS
.Browse it at http://hadoop1:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/
.The user directory contains Hive's data files
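3.0 Installing MySQL as Hive's metastore
3.1 With the default metastore, a second Hive client accessing the same resources fails
.The embedded Derby metastore lets only one client (one user) open it at a time
derby.log ext hive hive-config.sh metastore_db //metastore_db is the Derby store, generally used only for testing
.Real deployments use MySQL rather than the bundled Derby database
3.2 Remove Hive's default metastore
[root@hadoop0 bin]# rm -rf metastore_db/ //delete the Derby data files
[root@hadoop0 conf]# vi hive-site.xml //change Hive's default storage location
Edit the hive.metastore.warehouse.dir property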
****************************************** Installation ********************
(1) Remove the MySQL packages that ship with Linux
[root@hadoop0 conf]# rpm -qa | grep mysql //check whether MySQL is installed
mysql-libs-5.1.66-2.el6_3.i686
[root@hadoop0 conf]# rpm -e mysql-libs-5.1.66-2.el6_3.i686 //remove it
error: Failed dependencies: //other packages depend on it, so removal is refused
libmysqlclient.so.16 is needed by (installed) postfix-2:2.6.6-2.2.el6_1.i686
libmysqlclient.so.16(libmysqlclient_16) is needed by (installed) postfix-2:2.6.6-2.2.el6_1.i686
mysql-libs is needed by (installed) postfix-2:2.6.6-2.2.el6_1.i686
[root@hadoop0 conf]# rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps //force removal
[root@hadoop0 conf]# rpm -qa | grep mysql //verify that it is gone
rpm -qa | grep -i mysql //the same check, case-insensitive
.Copy the installation files to /root/Downloads
[root@hadoop0 conf]# cd /root/Downloads
[root@hadoop0 Downloads]# ls
MySQL-client-5.5.31-2.el6.i686.rpm //client package
mysql-connector-java-5.1.10.jar //JDBC driver
MySQL-server-5.5.31-2.el6.i686.rpm //server package
(2) Install the MySQL server with rpm -i MySQL-server-5.5.31-2.el6.i686.rpm
[root@hadoop0 Downloads]# rpm -i MySQL-server-5.5.31-2.el6.i686.rpm
(3) Start the MySQL server
mysqld_safe & //run in the background
(4) Install the MySQL client
[root@hadoop0 Downloads]# rpm -i MySQL-client-5.5.31-2.el6.i686.rpm
(5) Set the root password
[root@hadoop0 Downloads]# mysql_secure_installation //interactive security setup

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MySQL
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MySQL to secure it, we'll need the current
password for the root user. If you've just installed MySQL, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none): //current password; it is blank on first use, so just press Enter
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MySQL
root user without the proper authorisation.

Set root password? [Y/n] Y //set a password
New password: //admin
Re-enter new password: //admin
Password updated successfully!
Reloading privilege tables..
... Success!

By default, a MySQL installation has an anonymous user, allowing anyone
to log into MySQL without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] n //whether to remove anonymous users
... skipping.

Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] n //whether to disallow remote root login; n keeps remote login allowed
... skipping.

By default, MySQL comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] n //whether to remove the test database
... skipping.

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] Y //reload the privilege tables
... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MySQL
installation should now be secure.

Thanks for using MySQL!

Log in to MySQL to verify
[root@hadoop0 Downloads]# mysql -uroot -padmin

3.4 Use MySQL as Hive's metastore
(1) Copy the MySQL JDBC driver into Hive's lib directory
[root@hadoop0 Downloads]# cp mysql-connector-java-5.1.10.jar /usr/local/hive/lib/
(2) Edit hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- connection URL -->
<value>jdbc:mysql://hadoop0:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<!-- MySQL driver class -->
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<!-- user name -->
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<!-- password -->
<value>admin</value>
</property>
(3) Go to Hive's bin directory
[root@hadoop0 bin]# ls
derby.log ext hive hive-config.sh
[root@hadoop0 bin]# hive //start Hive
hive> show databases; //list databases
hive> use default; //switch to the default database
(4) Connect to MySQL remotely from Windows
.This fails at first, because MySQL does not allow remote connections by default
[root@hadoop0 bin]# mysql -uroot -padmin //log in to MySQL
mysql> grant all on hive.* to 'root'@'%' identified by 'admin'; //let root log in from any host (%)
Query OK, 0 rows affected (0.09 sec)
mysql> flush privileges; //reload privileges
(5) The connection now succeeds (tested with the SQLyog GUI client)
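The four metastore properties from step (2) differ between environments only in host, database, user and password, so they can be generated rather than typed by hand. This is only a sketch under stated assumptions: metastore_properties is a hypothetical helper, port 3306 and the driver class are taken from the article, and the arguments are assumed to contain no XML special characters such as & or <.

```shell
# Emit the four metastore <property> blocks for hive-site.xml.
# Usage: metastore_properties HOST DATABASE USER PASSWORD
# Hypothetical helper; arguments must not contain XML special characters.
metastore_properties() {
    host="$1"; db="$2"; user="$3"; password="$4"
    cat <<EOF
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://${host}:3306/${db}?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>${user}</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>${password}</value>
</property>
EOF
}

# Reproduces the values used in this article:
metastore_properties hadoop0 hive root admin
```

The output still has to be pasted inside the <configuration> element of hive-site.xml, replacing any existing entries with the same names.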

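4. Working with Hive's table types
4.1 Internal (managed) tables
create table t1(id int); //create an internal table
http://hadoop1:50075/browseDirectory.jsp?dir=%2Fhive&namenodeInfoPort=50070 //view the table
//The local keyword in the next statement makes Hive read the path from the local file system. Without local the path is resolved on HDFS, so a file that exists only locally causes an error
hive>load data local inpath '/root/id' into table t1; //load the file's data into t1
hive>select * from t1; //show the loaded data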

4.1.0 Loading data under Hadoop 2.0 HA mode

hive>load data inpath 'hdfs://cluster1/root/id' into table t4;

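4.1.1 Is a loaded file any different from a file put into HDFS directly?
http://hadoop1:50075/browseDirectory.jsp?dir=%2Fhive%2Ft1&namenodeInfoPort=50070
[root@hadoop0 ~]# hadoop fs -put id /hive/t1/id2 //put the same id file into the table directory under the name id2
***********************************************************
Conclusion: there are two ways to load data.
Whether the file arrives through Hive's load statement or through hadoop fs -put, Hive recognizes it as table data.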
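4.1.2 Create a table t2
Declare the row format: fields separated by tabs
hive>create table t2(id int,name string) row format delimited fields terminated by '\t';
Create a file named stu whose tab-separated values match the table's columns
[root@hadoop0 ~]# hadoop fs -put stu /hive/t2 //upload the file into the t2 table directory in the Hive warehouse
hive>select id from t2; //verify; select * from t2; works as well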
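4.2 Partitioned tables
Create a table partitioned by day
hive>create table t3(id int,name string) partitioned by(day int);
Load data into the partitioned table one day at a time
hive>load data local inpath '/root/id' into table t3 partition (day=22); //load data
hive>select * from t3 where day=22; //verify
//Queries on a partitioned table normally filter on the partition column. Compare with the following:
hive>select * from t3 where id=1; //searches every partition and returns the rows with id=1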
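The reason a filter on day is cheap is that Hive keeps each partition in its own subdirectory of the table directory, named column=value. The sketch below only builds that path as a string; partition_path is a made-up helper, and the warehouse directory /hive is an assumption based on the browse URLs used in this article.

```shell
# Build the HDFS directory that holds one partition of a table.
# Usage: partition_path WAREHOUSE_DIR TABLE PARTITION_COLUMN VALUE
# Hypothetical helper for illustration; it does not touch HDFS.
partition_path() {
    # ${1%/} drops a trailing slash from the warehouse directory, if any.
    printf '%s/%s/%s=%s\n' "${1%/}" "$2" "$3" "$4"
}

partition_path /hive t3 day 22   # prints /hive/t3/day=22
```

A query with where day=22 only needs to read the files under that one directory, while where id=1 has to look inside every day=... directory.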
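4.3 Bucketed tables
A bucketed table hashes a column's value and stores the rows in different files according to the hash.
Create the table
create table bucket_table(id string) clustered by(id) into 4 buckets;
Load data
set hive.enforce.bucketing = true;
insert into table bucket_table select name from stu; //append
insert overwrite table bucket_table select name from stu; //or replace the existing contents
When rows are loaded into a bucketed table, Hive hashes the bucketing column and takes the result modulo the number of buckets. Each row goes into the file for that bucket.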
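The hash-then-modulo rule can be tried out by hand for integer columns. The sketch below assumes the behaviour of older Hive releases, where the hash of an int is the value itself and the bucket index is (hash & 0x7fffffff) % buckets; string columns use a different hash, and the mapping from bucket index to a file name like 000001_0 is likewise an assumption. bucket_of is a hypothetical helper name.

```shell
# Compute the bucket index for an integer column value.
# Usage: bucket_of VALUE NUM_BUCKETS
# Assumption: hash(int) is the int itself, masked to a non-negative number.
bucket_of() {
    echo $(( ($1 & 0x7fffffff) % $2 ))
}

# Show where a few ids would land in a table with 4 buckets.
for id in 1 2 3 4 5 8; do
    printf 'id=%s -> bucket file %06d_0\n' "$id" "$(bucket_of "$id" 4)"
done
```

Ids that differ by a multiple of the bucket count, such as 1 and 5, end up in the same file, which is why equal values are always stored together.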

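Sampling query
select * from bucket_table tablesample(bucket 1 out of 4 on id);
Example:
create table t4(id int) clustered by(id) into 4 buckets; //spread the data over 4 buckets
//load data into the bucketed table
set hive.enforce.bucketing = true;
insert into table t4 select id from t3; //a MapReduce job distributes the rows into the buckets
//Browse the table directory. It looks like the listing below, with each file holding the rows whose hash maps to that bucket
http://hadoop2:50075/browseDirectory.jsp?dir=%2Fhive%2Ft4&namenodeInfoPort=50070
000000_0
000001_0
000002_0
000003_0
.The rows in one file share a common trait, and equal values are very likely to sit together
.Buckets can be used for sampling
.Apart from that, bucketed tables are used only for table joins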
5. External tables
Create the data file external_table.dat

Create the table
hive>create external table external_table1 (key string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/home/external';
Create the directory /home/external on HDFS
#hadoop fs -put /home/external_table.dat /home/external

Load data
LOAD DATA INPATH '/home/external_table.dat' INTO TABLE external_table1; //no LOCAL keyword, so the path is resolved on HDFS

View the data
select * from external_table1;
select count(*) from external_table1;

Drop the table
drop table external_table1;
Example:
[root@hadoop0 ~]# hadoop fs -put id /external/id //put a local file into HDFS
//The location clause tells Hive where the data lives, so no load step is needed and the table can be queried right away
hive>create external table t5(id int) location '/external'; //create an external table
hive>select * from t5; //verify
hive>drop table t5; //drops only the table definition; the data files at the given location are kept

6. Views


Note: limit can be used to cap the number of rows returned
hive>select * from t1 limit 5; //returns 5 rows
order by xxx is also available
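7. Accessing Hive remotely from Java
.The Hive server must be started first; otherwise remote access is not possible
hive --service hiveserver >/dev/null 2>/dev/null &
.Java client code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class App {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con = DriverManager
                .getConnection("jdbc:hive://hadoop0:10000/default", "", "");
        Statement stmt = con.createStatement();
        String querySQL = "SELECT * FROM default.t1";
        ResultSet res = stmt.executeQuery(querySQL);
        // Print the first column (id) of every row
        while (res.next()) {
            System.out.println(res.getInt(1));
        }
        // Release the JDBC resources
        res.close();
        stmt.close();
        con.close();
    }
}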
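8. User-defined functions (UDF operations)
.Pressing Tab in the Hive shell asks whether to display everything; answer y to see all functions
List all functions:
hive> show functions;

Show how a function is used:
hive> describe function substr;
hive> describe function pi; //shows the usage of one specific function
OK
pi() - returns pi
Time taken: 0.052 seconds
Sum:
select sum(id) from t1; //sum with an aggregate function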
9. Common Hive installation errors (including versions such as 0.11)
1. hive> show tables;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
This error has many possible causes, so look at the detailed message. Exit Hive and restart it in debug mode with logging sent to the console:
hive -hiveconf hive.root.logger=DEBUG,console
In this case the log showed that the error was caused by:
Caused by: MetaException(message:Version information not found in metastore. )
Fix: set hive.metastore.schema.verification to false in hive-site.xml. After that the error no longer appeared.
2. [Fatal Error] hive-site.xml:2000:16: The element type "value" must be terminated by the matching end-tag "</value>".
14/04/14 19:34:36 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: The element type "value" must be terminated by the matching end-tag "</value>".
The message points clearly at line 2000 of hive-site.xml. That line turned out to read <value>auth</auth>, which cannot parse. Changing </auth> to </value> fixes the error.
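Mismatched tags like the one in error 2 can be spotted before starting Hive. The following is a rough line-based check, not a real XML validator: find_bad_value_tags is a hypothetical helper that reports lines which open <value> and close some other tag on the same line without ever closing </value>.

```shell
# Report lines where <value> is closed by the wrong end tag.
# Usage: find_bad_value_tags FILE
# Hypothetical helper; a line-based heuristic, not a full XML parser.
find_bad_value_tags() {
    # Match lines with an opening <value>, some closing tag, but no </value>.
    awk '/<value>/ && /<\// && !/<\/value>/ { print NR ": " $0 }' "$1"
}

# Example call (path as used in this article):
# find_bad_value_tags /usr/local/hive/conf/hive-site.xml
```

For the broken file described above this would print the line number followed by <value>auth</auth>; a clean file produces no output. Values that span several lines are not checked by this heuristic.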
