Configuring the Hive Metastore for CDH

The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information using the metastore service API. This page explains deployment options and provides instructions for setting up a database in a recommended configuration.


Metastore Deployment Modes

Embedded Mode (Default)

Cloudera recommends using this mode for experimental purposes only.




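Embedded mode is the default metastore deployment mode for CDH. In this mode, the metastore uses a Derby database, and both the database and the metastore service are embedded in the main HiveServer process. Both are started for you when you start the HiveServer process. This mode is the simplest to configure, but it supports only one active user at a time and is not suitable for production use.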


Local Mode




In local mode, the Hive metastore service runs in the same process as the main HiveServer process, but the metastore database runs in a separate process, and can be on a separate host. The embedded metastore service communicates with the metastore database over JDBC.


Remote Mode

Cloudera recommends that you use this mode.



In remote mode, the Hive metastore service runs in its own JVM process. HiveServer2, HCatalog, Cloudera Impala™, and other processes communicate with it using the Thrift network API (configured using the hive.metastore.uris property). The metastore service communicates with the metastore database over JDBC (configured using the javax.jdo.option.ConnectionURL property). The database, the HiveServer process, and the metastore service can all be on the same host, but running the HiveServer process on a separate host provides better availability and scalability.

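The main advantage of remote mode over local mode is that remote mode does not require the administrator to share JDBC login information for the metastore database with each Hive user. HCatalog requires this mode.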
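As a rough illustration of how a client locates a remote metastore, the sketch below splits a hive.metastore.uris value into host and port pairs and defines a simple TCP check for each Thrift endpoint. The host names and the helper names are placeholders chosen for this example and are not part of Hive; the comma-separated form assumes more than one metastore instance has been configured.

```python
import socket
from urllib.parse import urlparse


def parse_metastore_uris(value):
    """Split a hive.metastore.uris value into (host, port) pairs."""
    endpoints = []
    for uri in value.split(","):
        uri = uri.strip()
        if not uri:
            continue
        parsed = urlparse(uri)
        if parsed.scheme != "thrift" or not parsed.hostname or parsed.port is None:
            raise ValueError(f"not a thrift://host:port URI: {uri!r}")
        endpoints.append((parsed.hostname, parsed.port))
    return endpoints


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # metastorehost is a placeholder host name; 9083 is the port used
    # in this page's sample configuration.
    for host, port in parse_metastore_uris("thrift://metastorehost:9083"):
        print(host, port)
```

Calling is_reachable(host, port) for each pair gives a quick check that the metastore port is open. It only opens a TCP connection and does not speak the Thrift protocol, so a positive result does not prove that the metastore service itself is healthy.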


Supported Metastore Databases

For up-to-date information, see CDH and Cloudera Manager Supported Databases. Cloudera strongly encourages you to use MySQL because it is the most popular with the rest of the Hive user community, and, hence, receives more testing than the other options. For installation information, see:

Metastore Memory Requirements


Concurrent connections             HiveServer2 recommended heap size   Hive Metastore recommended heap size
Up to 40 concurrent connections    12 - 16 GB                          12 - 16 GB
Up to 20 concurrent connections    6 - 12 GB                           10 - 12 GB
Up to 10 concurrent connections    4 - 6 GB                            4 - 10 GB
One connection                     4 GB                                4 GB

Note on the "Up to 40 concurrent connections" row: Cloudera recommends splitting HiveServer2 into multiple instances and load-balancing once you start allocating over 16 GB to HiveServer2. This reduces the impact of Java garbage collection on active processing by the service.
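The table can be read as a simple lookup. The sketch below encodes the rows above and returns the recommended heap ranges for a given number of concurrent connections; the function and variable names are invented for this example, and the behavior above 40 connections (returning None) is an assumption, since the table does not cover that case.

```python
# Each row: (max concurrent connections,
#            HiveServer2 heap range in GB,
#            Hive Metastore heap range in GB).
# Values are copied from the table on this page.
HEAP_TABLE = [
    (1, (4, 4), (4, 4)),
    (10, (4, 6), (4, 10)),
    (20, (6, 12), (10, 12)),
    (40, (12, 16), (12, 16)),
]


def recommended_heap_gb(connections):
    """Return recommended (min, max) heap sizes in GB, or None if off the table."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    for limit, hiveserver2, metastore in HEAP_TABLE:
        if connections <= limit:
            return {"hiveserver2": hiveserver2, "metastore": metastore}
    # More than 40 connections is not covered by the table; at that point
    # the page suggests splitting HiveServer2 into several instances.
    return None


if __name__ == "__main__":
    print(recommended_heap_gb(15))
```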

Configuring the Metastore Database


This section describes how to configure Hive to use a remote database, with examples for MySQL, PostgreSQL, and Oracle.

The configuration properties for the Hive metastore are documented in the Hive Metastore Administration documentation on the Apache wiki.

Configuring a Remote MySQL Database for the Hive Metastore

Cloudera recommends that you configure a database for the metastore on one or more remote servers that reside on a host or hosts separate from the HiveServer1 or HiveServer2 process. MySQL is the most popular database to use. Use the following steps to configure a remote metastore. If you are planning to use a cloud service database, such as Amazon Relational Database Service (RDS), see How To Set Up a Shared Amazon RDS as Your Hive Metastore for CDH.

1. Install and start MySQL if you have not already done so
To install MySQL on a RHEL system:
$ sudo yum install mysql-server
To install MySQL on a SLES system:
$ sudo zypper install mysql
$ sudo zypper install libmysqlclient_r17
To install MySQL on a Debian/Ubuntu system:
$ sudo apt-get install mysql-server
After using the command to install MySQL, you may need to respond to prompts to confirm that you do want to complete the installation. After the installation completes, start the mysql daemon.
On RHEL systems
$ sudo service mysqld start
On SLES and Debian/Ubuntu systems
$ sudo service mysql start
2. Configure the MySQL service and connector

Before you can run the Hive metastore with a remote MySQL database, you must configure a connector to the remote MySQL database, set up the initial database schema, and configure the MySQL user account for the Hive user.

To install the MySQL connector on a RHEL 6 system:

On the Hive Metastore server host, install mysql-connector-java and symbolically link the file into the /usr/lib/hive/lib/ directory.

$ sudo yum install mysql-connector-java
$ sudo ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar

To install the MySQL connector on a RHEL 5 system:

Download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/5.1.html. You will need to sign up for an account if you do not already have one, and log in, before you can download it. Then copy it to the /usr/lib/hive/lib/ directory. For example:

$ sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /usr/lib/hive/lib/
...

Configure MySQL to use a strong password and to start at boot. Note that in the following procedure, your current root password is blank. Press the Enter key when you're prompted for the root password.

To set the MySQL root password:

$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!

To make sure the MySQL server starts at boot:

  • On RHEL systems:
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/chkconfig --list mysqld
mysqld          0:off   1:off   2:on    3:on    4:on    5:on    6:off
  • On SLES systems:
$ sudo chkconfig --add mysql
  • On Debian/Ubuntu systems:
$ sudo chkconfig mysql on
3. Create the database and user

The instructions in this section assume you are using remote mode, and that the MySQL database is installed on a separate host from the metastore service, which is running on a host named metastorehost in the example.

Create the initial database schema. Cloudera recommends using the Hive schema tool to do this.
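As a hedged sketch of what invoking the schema tool might look like from a script, the example below builds and runs a schematool command line. The install path /usr/lib/hive/bin/schematool is an assumption based on the /usr/lib/hive layout used elsewhere on this page, and the helper names are invented for illustration. The -dbType, -initSchema, -info, and -dryRun options belong to Hive's schematool, which reads its JDBC connection settings from hive-site.xml.

```python
import subprocess

# Assumed location for a package-based install; adjust for your system.
SCHEMATOOL = "/usr/lib/hive/bin/schematool"

# Actions this sketch allows: create the schema, or report its version.
ALLOWED_ACTIONS = ("-initSchema", "-info")


def schematool_args(action, db_type="mysql", dry_run=False):
    """Build the argument list for a schematool invocation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    args = [SCHEMATOOL, "-dbType", db_type, action]
    if dry_run:
        # -dryRun lists the SQL scripts without executing them.
        args.append("-dryRun")
    return args


def run_schematool(action, **kwargs):
    """Run schematool; raises CalledProcessError on a nonzero exit code."""
    return subprocess.run(schematool_args(action, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(schematool_args("-initSchema", dry_run=True)))
```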

If for some reason you decide not to use the schema tool, you can use the hive-schema-n.n.n.mysql.sql file instead; that file is located in the /usr/lib/hive/scripts/metastore/upgrade/mysql/ directory. (n.n.n is the current Hive version, for example 1.1.0.) Proceed as follows if you decide to use hive-schema-n.n.n.mysql.sql.

Example using hive-schema-n.n.n.mysql.sql

$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-n.n.n.mysql.sql;
You also need a MySQL user account for Hive to use to access the metastore. It is very important to prevent this user account from creating or altering tables in the metastore database schema.

Example

mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'metastorehost';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;

4. Configure the metastore service to communicate with the MySQL database

This step shows the configuration properties you need to set in hive-site.xml (/usr/lib/hive/conf/hive-site.xml) to configure the metastore service to communicate with the MySQL database, and provides sample settings. Though you can use the same hive-site.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.

Given a MySQL database running on host myhost and the user account hive with the password mypassword, set the configuration as follows (overwriting any existing values).

Note: The hive.metastore.local property is deprecated (as of Hive 0.10); setting hive.metastore.uris is sufficient to indicate that you are using a remote metastore.

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoStartMechanism</name> 
  <value>SchemaTable</value>
</property> 

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>
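To show how the split between "all hosts" and "metastore host only" properties could be checked, here is a minimal sketch that parses a hive-site.xml document and reports which of the properties from this page are missing for a given role. The function names and the SAMPLE text are made up for this example; the check covers only the connection properties and hive.metastore.uris, and it assumes the file's root element is configuration.

```python
import xml.etree.ElementTree as ET

# Property names taken from the sample configuration on this page.
REQUIRED_EVERYWHERE = {"hive.metastore.uris"}
REQUIRED_ON_METASTORE_HOST = {
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
}


def load_properties(xml_text):
    """Return a dict of property name -> value from hive-site.xml text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(props, metastore_host):
    """List required properties that are absent or empty for this role."""
    required = set(REQUIRED_EVERYWHERE)
    if metastore_host:
        required |= REQUIRED_ON_METASTORE_HOST
    return sorted(name for name in required if not props.get(name))


# Hypothetical client-side file that sets only the metastore URI.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastorehost:9083</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    props = load_properties(SAMPLE)
    print(missing_properties(props, metastore_host=False))
    print(missing_properties(props, metastore_host=True))
```

With this sample, the client check reports nothing missing, while the metastore-host check lists the four javax.jdo.option connection properties. Note that the `<n.n.n.n>` placeholder in the settings above must be replaced with a real address or host name before the file will parse as XML.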

Configuring a Remote PostgreSQL Database for the Hive Metastore

...

Configuring a Remote Oracle Database for the Hive Metastore


...



