Configuring Hive to store its metadata in MySQL

By default, Hive stores its metadata in the embedded Derby database. Facebook instead stores Hive metadata in the relational database MySQL. The configuration steps are as follows:
1 Install MySQL

Set the MySQL root password

[root@expedia-hdp1 Downloads]# mysqladmin -u root password expedia


Log in to MySQL as root
hdpusr@expedia-HDP1:~/hive-0.7.1-bin/bin$ mysql -uroot -p****

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 49
Server version: 5.1.61-0ubuntu0.11.10.1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> exit;
Bye

 

Create a new user

mysql> create user 'hdpusr'@'localhost' IDENTIFIED BY '******';

mysql> grant all privileges on *.* to 'hdpusr'@'localhost' with grant option;

 

For comparison, a dedicated hive account looks like this in SHOW GRANTS output:

GRANT USAGE ON *.* TO 'hive'@'%' IDENTIFIED BY PASSWORD '*4DF1D66463C18D44E3B001A8FB1BBFBEA13E27FC'
GRANT ALL PRIVILEGES ON `hive`.* TO 'hive'@'%'

 

Revoking privileges

mysql> revoke all on *.* from hdpusr@localhost;

REVOKE only withdraws a user's privileges; it cannot delete the user. Even after all privileges have been revoked, the user can still connect to the server. To remove the user completely, delete its record from the user table in the mysql database with a DELETE statement.

mysql> use mysql;

mysql> delete from user where user='hdpusr' and host='localhost';
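
Two follow-ups worth noting (a sketch, not part of the original session): a direct DELETE against the grant tables only takes effect after a flush, and DROP USER removes the account and its privileges in a single statement.

mysql> flush privileges;                 -- reload the grant tables after the direct DELETE
mysql> drop user 'hdpusr'@'localhost';   -- alternative: drops the account and its privileges together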

 

Switch to the new user
hdpusr@expedia-HDP1:~/hive-0.7.1-bin/bin$ mysql -uhdpusr -p******
mysql> exit;
Bye

 

2 Edit the configuration file hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/h1/warehouse</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hdpusr</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hdp</value>
  <description>password to use against metastore database</description>
</property>
</configuration>
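
If the warehouse directory configured in hive.metastore.warehouse.dir does not exist in HDFS yet, it can be created up front; a minimal sketch (the path is taken from the config above):

$ hadoop fs -mkdir /user/h1/warehouse
$ hadoop fs -chmod g+w /user/h1/warehouse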

3 Download mysql-connector-java-5.1.18 and copy it into the lib folder under the Hive installation directory
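
A minimal shell sketch of this step (the jar name and directory paths are assumed from this article's layout):

$ cp mysql-connector-java-5.1.18-bin.jar ~/hive-0.7.1-bin/lib/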

hdpusr@expedia-HDP1:~/hive-0.7.1-bin/bin$ ./hive

hive> create table temp;

FAILED: Error in semantic analysis: Either list of columns or a custom serializer should be specified

At this point, checking MySQL shows that Hive has already created the hive database and its metastore tables.

 

The failure was simply a bad CREATE TABLE statement; with columns specified it works:

hive> create table temp(id int, name string);
OK
Time taken: 2.702 seconds
hive> show tables;
OK
temp
Time taken: 0.167 seconds

 

Listing functions and their usage

hive> show functions;
hive> describe function functionName;
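
For example, looking up one of the date functions used below (output paraphrased):

hive> describe function datediff;
OK
datediff(date1, date2) - Returns the number of days between date1 and date2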

 

Hive (as of 0.7.x) stores dates only in primitive types such as INT and STRING and has no native date type, but built-in functions such as to_date, unix_timestamp, datediff, date_add, and date_sub support date/time manipulation comparable to MySQL's, as sketched below.
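
A short sketch of these functions, using the test table created later in this article as a dummy source (the literal dates are only illustrations):

hive> select to_date('2012-04-20 09:00:00'),          -- '2012-04-20'
             unix_timestamp('2012-04-20 09:00:00'),   -- seconds since the epoch
             datediff('2012-04-20', '2012-04-01'),    -- 19
             date_add('2012-04-20', 7),               -- '2012-04-27'
             date_sub('2012-04-20', 7)                -- '2012-04-13'
      from test limit 1;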

 

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
+--------------------+
3 rows in set (0.04 sec)

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------+
| Tables_in_hive  |
+-----------------+
| BUCKETING_COLS  |
| COLUMNS         |
| DATABASE_PARAMS |
| DBS             |
| PARTITION_KEYS  |
| SDS             |
| SD_PARAMS       |
| SEQUENCE_TABLE  |
| SERDES          |
| SERDE_PARAMS    |
| SORT_COLS       |
| TABLE_PARAMS    |
| TBLS            |
+-----------------+
13 rows in set (0.00 sec)

 

 

Partitions
Hive partitioning differs from MySQL partitioning. MySQL partitions a table on columns from the table definition (RANGE, LIST, HASH, and so on). Hive instead has you declare a partition column explicitly: it is kept separate from the table's data columns yet still queries like a column of the table, and you name the partition value by hand when loading data.

创建表:

hive> create table test(id int, name string, regtime string) partitioned by(ds string) row format delimited fields terminated by ' ' stored as textfile;
OK
Time taken: 16.945 seconds

 

 

Create a data file test.txt with the following contents:

1 Jack 2012-04-20 09:00:00

2 Lucy 2012-04-21 09:10:00

3 LiLei 2012-04-22 09:20:00

4 HMM 2012-04-23 09:30:00

 

 

hive> load data local inpath '/download/test.txt' overwrite into table test partition(ds='2012-04-22');
Copying data from file:/download/test.txt
Copying file: file:/download/test.txt
Loading data to table default.test partition (ds=2012-04-22)
OK
Time taken: 4.048 seconds
hive> load data local inpath '/download/test.txt' overwrite into table test partition(ds='2012-04-21');
Copying data from file:/download/test.txt
Copying file: file:/download/test.txt
Loading data to table default.test partition (ds=2012-04-21)
OK
Time taken: 0.917 seconds

 

The load may throw the following exception:

The ratio of reported blocks 0.9934 has not reached the threshold 0.9990. Safe mode will be turned off automatically.

 

Workarounds:

1. Lower dfs.safemode.threshold.pct to a smaller value (the default is 0.999); see the sketch after this list.
2. Force the NameNode out of safe mode with hadoop dfsadmin -safemode leave.
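
A minimal hdfs-site.xml sketch for the first workaround (the 0.95 value is only an illustration):

<property>
  <name>dfs.safemode.threshold.pct</name>
  <!-- fraction of blocks that must report in before the NameNode leaves safe mode -->
  <value>0.95</value>
</property>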

 

When a query is executed, Hive compiles it into MapReduce and runs it on Hadoop (a bare select * from test is the exception, since it needs no MapReduce job):

hive> select * from test where to_date(regtime)>to_date('2012-04-20');
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201202201821_0001, Tracking URL = http://nn0001:50030/jobdetails.jsp?jobid=job_201202201821_0001
Kill Command = /download/hadoop-0.20.203.0/bin/../bin/hadoop job  -Dmapred.job.tracker=nn0001:9001 -kill job_201202201821_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-02-20 18:31:51,213 Stage-1 map = 0%,  reduce = 0%
2012-02-20 18:32:03,432 Stage-1 map = 100%,  reduce = 0%
2012-02-20 18:32:12,594 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201202201821_0001
MapReduce Jobs Launched:
Job 0: Map: 1   HDFS Read: 519 HDFS Write: 174 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2       Lucy    2012-04-21      2012-04-21
3       LiLei   2012-04-22      2012-04-21
4       HMM     2012-04-23      2012-04-21
2       Lucy    2012-04-21      2012-04-22
3       LiLei   2012-04-22      2012-04-22
4       HMM     2012-04-23      2012-04-22
Time taken: 43.417 seconds
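
Note that regtime shows only the date portion: the space delimiter splits '2012-04-20 09:00:00' into four fields, so the trailing time-of-day is discarded as a surplus field. (Row 1 is absent because to_date('2012-04-20') is not greater than itself.) Also, because ds is a partition column, a filter on it lets Hive prune entire partition directories instead of scanning them; a quick sketch:

hive> select * from test where ds='2012-04-21';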

 

 

Specifying the field delimiter when creating a table

CREATE TABLE if not exists t_order(
id int,
sale_id int,
customer_id int,
product_id int,
amount int
) PARTITIONED BY (ds STRING) row format delimited fields terminated by ' ';

 

hive> load data local inpath '/home/h1/Downloads/data.txt' overwrite into table t_order partition (ds='2012-05-04');

 

Format of t_order.txt:

381 83 83 83 83
382 61 61 61 61
383 19 19 19 19
384 89 89 89 89
385 16 16 16 16
386 47 47 47 47
387 49 49 49 49
388 82 82 82 82
389 27 27 27 27
390 84 84 84 84
391 62 62 62 62
392 81 81 81 81
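
A quick check that the space delimiter parsed the file correctly (a sketch):

hive> select id, amount from t_order where ds='2012-05-04' limit 3;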

 

 

 

 
