1. MetaStore Overview:
1.1 Where does Hive store its data?
HDFS: the actual data
MySQL: the metadata *****
Hive maintains its own set of metadata. When a user issues a SQL query, Hive translates the SQL into MapReduce jobs for execution.
Q: If the database holding Hive's metadata goes down, can Hive still run normally?
Take an emp table as an example: HDFS knows nothing about the table's columns; to HDFS it is just an ordinary file.
Hive applies a schema on top of the files in the file system.
==> Hive = files on the file system + schema (which tells us what fields exist, what delimiter separates them, and whether the first field is id or name)
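The "files + schema" idea can be sketched in HiveQL (the table name, columns, delimiter, and path below are illustrative, not from these notes):

```sql
-- The data is just a plain text file on HDFS; this statement only
-- records the schema in the metastore, it does not touch the file.
CREATE EXTERNAL TABLE emp (
  empno INT,
  ename STRING,
  sal   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/data/emp';
```

With the schema registered, a query like SELECT ename FROM emp works; without it, HDFS only sees an ordinary file.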
The meta information must be stored in a highly available (HA) database, e.g. MySQL (primary/standby) accessed directly through a VIP (https://github.com/Hackeruncle/MySQL)
Hive, Spark SQL, Impala and the other SQL-on-Hadoop engines share one set of metadata.
As long as one engine creates it (e.g. Hive creating the metadata in MySQL), all the others can share and use it.
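One common way to let several engines share the metadata is to run a metastore service and point each engine at it via hive.metastore.uris (a sketch; the host below is a placeholder, 9083 is the conventional metastore port):

```xml
<!-- hive-site.xml, also placed on the Spark SQL / Impala side;
     replace the host with your metastore service address. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>
```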
1. Open hive-site.xml and check the configuration
[hadoop@hadoop002 conf]$ pwd
/home/hadoop/app/hive/conf
[hadoop@hadoop002 conf]$ ll
-rw-rw-r--. 1 hadoop hadoop 769 Apr 15 13:58 hive-site.xml
[hadoop@hadoop002 conf]$ cat hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/ruoze_d6?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>
...........
...........
</configuration>
2. Which metastore backends does Hive support?
[hadoop@hadoop002 ~]$ cd /home/hadoop/app/hive/scripts/metastore/upgrade
[hadoop@hadoop002 upgrade]$ ll
total 20
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 derby
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 mssql
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 mysql
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 oracle
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 postgres
[hadoop@hadoop002 upgrade]$ cd mysql
[hadoop@hadoop002 mysql]$ more hive-schema-1.1.0.mysql.sql  # our Hive version is 1.1.0
--
-- Table structure for table `TBLS`
--
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE IF NOT EXISTS `TBLS` (
`TBL_ID` bigint(20) NOT NULL,
`CREATE_TIME` int(11) NOT NULL,
`DB_ID` bigint(20) DEFAULT NULL,
`LAST_ACCESS_TIME` int(11) NOT NULL,
`OWNER` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`RETENTION` int(11) NOT NULL,
`SD_ID` bigint(20) DEFAULT NULL,
`TBL_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`TBL_TYPE` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`VIEW_EXPANDED_TEXT` mediumtext,
`VIEW_ORIGINAL_TEXT` mediumtext,
`LINK_TARGET_ID` bigint(20) DEFAULT NULL,
PRIMARY KEY (`TBL_ID`),
UNIQUE KEY `UNIQUETABLE` (`TBL_NAME`,`DB_ID`),
KEY `TBLS_N50` (`SD_ID`),
KEY `TBLS_N49` (`DB_ID`),
KEY `TBLS_N51` (`LINK_TARGET_ID`),
CONSTRAINT `TBLS_FK1` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`),
CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`),
CONSTRAINT `TBLS_FK3` FOREIGN KEY (`LINK_TARGET_ID`) REFERENCES `TBLS` (`TBL_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
One column to note: TBL_TYPE, whose values are MANAGED_TABLE or EXTERNAL_TABLE.
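A quick way to see this column in action (run inside the MySQL database that backs the metastore, as named in the ConnectionURL above):

```sql
-- Count Hive tables per type: MANAGED_TABLE vs. EXTERNAL_TABLE.
SELECT TBL_TYPE, COUNT(*) AS table_count
FROM TBLS
GROUP BY TBL_TYPE;
```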
2. The database-related tables in the Hive metastore:
2.1 The VERSION table
Contents of the VERSION table: the Hive schema version, and the Hadoop user @ the local IP.
Note: the VERSION table must contain exactly one record.
1. Querying the VERSION table
mysql> select * from version;
+--------+----------------+------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+------------------------------------+
| 1 | 1.1.0 | Set by MetaStore hadoop@10.0.0.132 |
+--------+----------------+------------------------------------+
1 row in set (0.00 sec)
Two things worth testing:
add one extra record and try restarting Hive;
what happens if there is no record at all?
1. With an extra record added, starting Hive fails with the error below.
The message tells us quite plainly: Metastore contains multiple versions (2)
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.reflect.InvocationTargetException
Caused by: MetaException(message:Metastore contains multiple versions (2) [ version = 1.1.0, comment = Set by MetaStore hadoop@10.0.0.132 ] [ version = 2.2.0, comment = Set by MetaStore john@10.0.0.132 ])
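One way to recover from this error (assuming, as in the test above, the 2.2.0 row is the stray one and 1.1.0 is the real schema version) is to delete the extra row directly in MySQL:

```sql
-- Keep only the row for the real schema version; adjust the
-- predicate to whichever row is the stray one in your metastore.
DELETE FROM VERSION WHERE SCHEMA_VERSION <> '1.1.0';
SELECT * FROM VERSION;  -- should now return exactly one row
```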
2. Emptying the VERSION table
Hive starts normally, and a new record is automatically written back into the VERSION table; only the VER_ID differs on each restart:
[hadoop@hadoop002 ~]$ hive
ls: cannot access /home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/lib/spark-assembly-*.jar: No such file or directory
which: no hbase in (/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/bin:/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/bin:/home/hadoop/app/apache-maven-3.5.4/bin:/home/hadoop/app/scala-2.11.12/bin:/home/hadoop/app/hive/bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin:/home/hadoop/app/zookeeper/bin:/usr/java/jdk1.8.0_45/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin)
Logging initialized using configuration in jar:file:/home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive>
This is what the VERSION table contained after the first time it was emptied and Hive was started:
mysql> select * from version;
+--------+----------------+------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+------------------------------------+
| 6 | 1.1.0 | Set by MetaStore hadoop@10.0.0.132 |
+--------+----------------+------------------------------------+
1 row in set (0.00 sec)
This is what the VERSION table contained after the second time it was emptied and Hive was started:
mysql> select * from version;
+--------+----------------+------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+------------------------------------+
| 11 | 1.1.0 | Set by MetaStore hadoop@10.0.0.132 |
+--------+----------------+------------------------------------+
1 row in set (0.00 sec)
Apart from the ID, the rest of the VERSION table is identical. What about modifying VER_ID by hand? Testing shows Hive still starts normally.
2.2 The DBS table
Contents of the DBS table: DB_ID: the database ID; DESC: a description of the database
DB_LOCATION_URI: the corresponding path on HDFS (default: /user/hive/warehouse)
NAME: the database name
OWNER_NAME: the user who created the database
OWNER_TYPE: the owner type (USER or ROLE)
Example: creating a new database ruoze_d6 generates a corresponding directory, e.g. /user/hive/warehouse/ruoze_d6.db; if the database contains no tables yet, the HDFS directory is naturally empty.
mysql> select * from DBS;
+-------+-----------------------+--------------------------------------------------------+----------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+--------------------------------------------------------+----------+------------+------------+
| 1 | Default Hive database | hdfs://10.0.0.132:9000/user/hive/warehouse | default | public | ROLE |
| 3 | NULL | hdfs://10.0.0.132:9000/user/hive/warehouse/ruoze_d6.db | ruoze_d6 | hadoop | USER |
+-------+-----------------------+--------------------------------------------------------+----------+------------+------------+
2 rows in set (0.00 sec)
2.3 The DATABASE_PARAMS table
When database parameters are specified with "WITH DBPROPERTIES",
they are stored in the DATABASE_PARAMS table.
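A small sketch of the round trip (the database name and property below are illustrative): set a property in Hive, then look it up in the metastore:

```sql
-- In Hive: attach a custom property to a new database.
CREATE DATABASE demo_db WITH DBPROPERTIES ('creator' = 'hadoop');

-- In the MySQL metastore: the property lands in DATABASE_PARAMS,
-- linked to DBS by DB_ID.
SELECT d.NAME, p.PARAM_KEY, p.PARAM_VALUE
FROM DBS d
JOIN DATABASE_PARAMS p ON d.DB_ID = p.DB_ID
WHERE d.NAME = 'demo_db';
```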
2.4 The TBLS table
Metastore column | Example | Meaning |
---|---|---|
TBL_ID | 1 | table ID |
CREATE_TIME | 1560991148 | creation time (Unix timestamp) |
DB_ID | 3 | database ID |
OWNER | hadoop | owner |
RETENTION | 0 | retention (reserved) |
SD_ID | 1 | storage descriptor ID (serialization/storage info); equal to TBL_ID here |
TBL_NAME | emp | table name |
TBL_TYPE | MANAGED_TABLE | table type (managed table) |
VIEW_EXPANDED_TEXT | NULL | expanded HQL of the view |
VIEW_ORIGINAL_TEXT | NULL | original HQL statement of the view |
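Putting DBS and TBLS together, a handy metastore query lists every table with its database and type (run inside the MySQL database that backs the metastore):

```sql
-- Join TBLS to DBS to see which database each table belongs to.
SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;
```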