Environment: Cloudera Hive 0.10 (CDH4)
The Hive installation on my machine has the following table:
hive (human_resources)> describe formatted employees;
OK
col_name data_type comment
# col_name data_type comment
name string None
salary float None
subordinates array<string> None
deductions map<string,float> None
address struct<country:string,city:string,zip:int> None
# Partition Information
# col_name data_type comment
country string None
state string None
# Detailed Table Information
Database: human_resources
Owner: root
CreateTime: Mon Jul 22 23:05:47 CST 2013
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://n8.example.com:8020/user/hive/warehouse/human_resources.db/employees
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numPartitions 1
numRows 0
rawDataSize 0
totalSize 784
transient_lastDdlTime 1375942564
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.132 seconds
The employees table contains the following data (Hive automatically turns a plain select * into a direct file-system read, so no MapReduce job runs here):
hive (human_resources)> select * from employees;
OK
name salary subordinates deductions address country state
John Doe 100000.0 ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1} {"country":"1 Michigan Ave.","city":"Chicago","zip":null} US CA
Mary Smith 80000.0 ["Bill King"] {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1} {"country":"100 Ontario St.","city":"Chicago","zip":null} US CA
Todd Jones 70000.0 [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"country":"200 Chicago Ave.","city":"Oak Park","zip":null} US CA
Bill King 60000.0 [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"country":"300 Obscure Dr.","city":"Obscuria","zip":null} US CA
Boss Man 200000.0 ["John Doe","Fred Finance"] {"Federal Taxes":0.3,"State Taxes":0.07,"Insurance":0.05} {"country":"1 Pretentious Drive.","city":"Chicago","zip":null} US CA
Fred Finance 150000.0 ["Stacy Accountant"] {"Federal Taxes":0.3,"State Taxes":0.07,"Insurance":0.05} {"country":"2 Pretentious Drive.","city":"Chicago","zip":null} US CA
Stacy Accountant 60000.0 [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"country":"300 Main St.","city":"Naperville","zip":null} US CA
Time taken: 0.164 seconds
Now I want to create an index on the employees table with the following statement, but it fails with this message:
hive (human_resources)> CREATE INDEX employees_index
> ON TABLE employees (country, name)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD
> IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
> IN TABLE employees_index_table
> PARTITIONED BY (country)
> COMMENT 'Employees indexed by country and name.';
FAILED: ParseException line 6:0 missing EOF at 'PARTITIONED' near 'employees_index_table'
If I remove the PARTITIONED BY clause, I get the following error instead:
hive (human_resources)> CREATE INDEX employees_index
> ON TABLE employees (country, name)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD
> IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
> IN TABLE employees_index_table
> COMMENT 'Employees indexed by country and name.';
FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, they should appear in the table being indexed.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
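One variant that may be worth trying, offered as a guess rather than a confirmed fix: in the DESCRIBE output above, country is listed under "# Partition Information" rather than among the regular columns, so the "Check the index columns" error might be caused by indexing a partition column. The sketch below is not from the book; it assumes that explanation is right and indexes only the regular column name. The ALTER INDEX ... REBUILD statement is included because an index created WITH DEFERRED REBUILD stays empty until it is rebuilt.

```sql
-- Hypothetical variant, not from the book: index only a non-partition column.
-- Assumption: `country` is rejected because it is a partition column of employees.
CREATE INDEX employees_index
ON TABLE employees (name)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
COMMENT 'Employees indexed by name.';

-- WITH DEFERRED REBUILD creates an empty index; populate it explicitly.
ALTER INDEX employees_index ON employees REBUILD;
```

If this variant succeeds, that would point at the partition column rather than the Hive version as the cause of the second error.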
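This is an example from Programming Hive. The errata page on the O'Reilly site is:
http://oreilly.com/catalog/errata.csp?isbn=0636920023555
However, nobody in the errata has reported that this example fails to run. I don't know the cause of the error. Could anyone who knows point me in the right direction: is it a Hive version issue, or something else?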