Cloudera Impala First-Use Problem Roundup (Part 1)

After building Impala from source following the documentation at https://github.com/cloudera/impala, running the scripts below surfaced a series of problems:

${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false
${IMPALA_HOME}/bin/impala-shell.sh

Problem 1:

Although the local cluster's Hive metastore was already configured and impala-shell.sh started successfully, show databases did not list the test.db database that had already been created in Hive. The script also generated a derby.log file and a metastore.db directory in the directory it was run from, which belong to the embedded Derby metastore that Impala falls back to by default. So the problem was clear: Impala was not picking up the Hive metastore configuration.

According to the official documentation, the HDFS, HBase, and Hive metastore configuration that Impala should use is picked up by placing the corresponding configuration files in the fe/src/test/resources directory; that directory is put on the classpath by ${IMPALA_HOME}/bin/set-classpath.sh, whose contents are as follows:

#!/bin/sh
# Copyright 2012 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This script explicitly sets the CLASSPATH for embedded JVMs (e.g. in
# Impalad or in runquery) Because embedded JVMs do not honour
# CLASSPATH wildcard expansion, we have to add every dependency jar
# explicitly to the CLASSPATH.

CLASSPATH=\
$IMPALA_HOME/fe/src/test/resources:\
$IMPALA_HOME/fe/target/classes:\
$IMPALA_HOME/fe/target/dependency:\
$IMPALA_HOME/fe/target/test-classes:\
${HIVE_HOME}/lib/datanucleus-core-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-enhancer-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-rdbms-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-connectionpool-2.0.3.jar:${CLASSPATH}

for jar in `ls ${IMPALA_HOME}/fe/target/dependency/*.jar`; do
   CLASSPATH=${CLASSPATH}:$jar
done

export CLASSPATH

The problem, however, was that after my successful build there was no resources directory under fe/src in the source tree. So I downloaded it from elsewhere, placed it in the corresponding location, and edited the three configuration files core-site.xml, hdfs-site.xml, and hive-site.xml to match the cluster configuration; then I ran source bin/set-classpath.sh, and the first problem was solved.
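A minimal sketch of that workaround, assuming the cluster's Hadoop and Hive configuration files live under /etc/hadoop/conf and /etc/hive/conf (adjust the paths to your installation):

# Copy the cluster configuration into the directory that set-classpath.sh
# puts on the embedded JVM's CLASSPATH (the /etc/... paths are assumptions).
cp /etc/hadoop/conf/core-site.xml ${IMPALA_HOME}/fe/src/test/resources/
cp /etc/hadoop/conf/hdfs-site.xml ${IMPALA_HOME}/fe/src/test/resources/
cp /etc/hive/conf/hive-site.xml   ${IMPALA_HOME}/fe/src/test/resources/

# Make the CLASSPATH change take effect in the current shell before
# restarting impalad.
source ${IMPALA_HOME}/bin/set-classpath.sh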

Problem 2:

After the first problem was solved, I ran the same two scripts as before and then executed the following commands:

Welcome to the Impala shell. Press TAB twice to see a list of available commands.
 
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
 
(Build version: build version not available)
[Not connected] > connect hadoop-01
[hadoop-01:21000] > show databases;
default
test_impala
[hadoop-01:21000] > use test_impala;
 
[hadoop-01:21000] > show tables;
tab1
tab2
tab3
[hadoop-01:21000] > select * from tab3;
 
[hadoop-01:21000] > select * from tab1;
ERROR: Failed to open HDFS file hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv
Error(255): Unknown error 255
ERROR: Invalid query handle
[hadoop-01:21000] > select * from tab1;
ERROR: Failed to open HDFS file hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv
Error(255): Unknown error 255
ERROR: Invalid query handle
[hadoop-01:21000] > quit

The impalad log on the backend showed the following:

13/01/18 11:50:46 INFO service.Frontend: createExecRequest for query select * from tab1
13/01/18 11:50:46 INFO service.JniFrontend: Plan Fragment 0
   UNPARTITIONED
   EXCHANGE (1)
     TUPLE IDS: 0
 
Plan Fragment 1
   RANDOM
   STREAM DATA SINK
     EXCHANGE ID: 1
     UNPARTITIONED
 
   SCAN HDFS table=test_impala.tab1 (0)
     TUPLE IDS: 0
 
13/01/18 11:50:46 INFO service.JniFrontend: returned TQueryExecRequest2: TExecRequest(stmt_type:QUERY, sql_stmt:select * from tab1, request_id:TUniqueId(hi:-6897121767931491435, lo:-4792011001236606993), query_options:TQueryOptions(abort_on_error:false, max_errors:0, disable_codegen:false, batch_size:0, return_as_ascii:true, num_nodes:0, max_scan_range_length:0, num_scanner_threads:0, max_io_buffers:0, allow_unsupported_formats:false, partition_agg:false), query_exec_request:TQueryExecRequest(desc_tbl:TDescriptorTable(slotDescriptors:[TSlotDescriptor(id:0, parent:0, slotType:INT, columnPos:0, byteOffset:4, nullIndicatorByte:0, nullIndicatorBit:1, slotIdx:1, isMaterialized:true), TSlotDescriptor(id:1, parent:0, slotType:BOOLEAN, columnPos:1, byteOffset:1, nullIndicatorByte:0, nullIndicatorBit:0, slotIdx:0, isMaterialized:true), TSlotDescriptor(id:2, parent:0, slotType:DOUBLE, columnPos:2, byteOffset:8, nullIndicatorByte:0, nullIndicatorBit:2, slotIdx:2, isMaterialized:true), TSlotDescriptor(id:3, parent:0, slotType:TIMESTAMP, columnPos:3, byteOffset:16, nullIndicatorByte:0, nullIndicatorBit:3, slotIdx:3, isMaterialized:true)], tupleDescriptors:[TTupleDescriptor(id:0, byteSize:32, numNullBytes:1, tableId:1)], tableDescriptors:[TTableDescriptor(id:1, tableType:HDFS_TABLE, numCols:4, numClusteringCols:0, hdfsTable:THdfsTable(hdfsBaseDir:hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1, partitionKeyNames:[], nullPartitionKeyValue:__HIVE_DEFAULT_PARTITION__, partitions:{-1=THdfsPartition(lineDelim:10, fieldDelim:44, collectionDelim:44, mapKeyDelim:44, escapeChar:0, fileFormat:TEXT, partitionKeyExprs:[], blockSize:0, compression:NONE), 1=THdfsPartition(lineDelim:10, fieldDelim:44, collectionDelim:44, mapKeyDelim:44, escapeChar:0, fileFormat:TEXT, partitionKeyExprs:[], blockSize:0, compression:NONE)}), tableName:tab1, dbName:test_impala)]), fragments:[TPlanFragment(plan:TPlan(nodes:[TPlanNode(node_id:1, node_type:EXCHANGE_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tuples:[false], compact_data:false)]), output_exprs:[TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:INT, num_children:0, slot_ref:TSlotRef(slot_id:0))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:BOOLEAN, num_children:0, slot_ref:TSlotRef(slot_id:1))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:DOUBLE, num_children:0, slot_ref:TSlotRef(slot_id:2))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:TIMESTAMP, num_children:0, slot_ref:TSlotRef(slot_id:3))])], partition:TDataPartition(type:UNPARTITIONED, partitioning_exprs:[])), TPlanFragment(plan:TPlan(nodes:[TPlanNode(node_id:0, node_type:HDFS_SCAN_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tuples:[false], compact_data:false, hdfs_scan_node:THdfsScanNode(tuple_id:0))]), output_sink:TDataSink(type:DATA_STREAM_SINK, stream_sink:TDataStreamSink(dest_node_id:1, output_partition:TDataPartition(type:UNPARTITIONED, partitioning_exprs:[]))), partition:TDataPartition(type:RANDOM, partitioning_exprs:[]))], dest_fragment_idx:[0], per_node_scan_ranges:{0=[TScanRangeLocations(scan_range:TScanRange(hdfs_file_split:THdfsFileSplit(path:hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv, offset:0, length:192, partition_id:1)), locations:[TScanRangeLocation(server:THostPort(hostname:192.168.1.2, ipaddress:192.168.1.2, port:50010), volume_id:0)])]}, query_globals:TQueryGlobals(now_string:2013-01-18 11:50:46.000000862)),
result_set_metadata:TResultSetMetadata(columnDescs:[TColumnDesc(columnName:id, columnType:INT), TColumnDesc(columnName:col_1, columnType:BOOLEAN), TColumnDesc(columnName:col_2, columnType:DOUBLE), TColumnDesc(columnName:col_3, columnType:TIMESTAMP)]))
hdfsOpenFile(hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
java.lang.IllegalArgumentException: Wrong FS: hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv, expected: hdfs://localhost:20500
     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:169)
     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSyst

The issue here is the Wrong FS error with expected: hdfs://localhost:20500, even though the core-site.xml in my resources directory clearly specified the namenode address and port 8030. Looking at the Impala source, I found that be/src/runtime/hdfs-fs-cache.cc defines default values for the nn and nn_port flags:

DEFINE_string(nn, "localhost", "hostname or ip address of HDFS namenode");
DEFINE_int32(nn_port, 20500, "namenode port");

So when starting the impalad service, you also need to set nn and nn_port to the address and port configured for the cluster, as shown below:

${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false -nn=hadoop-01.localdomain -nn_port=8030

With that, the second problem (expected: hdfs://localhost:20500) was also solved, and every query now runs without issues.
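A quick sanity check, assuming the same cluster paths that appear in the log above: confirm that the file behind the failing query is reachable through the namenode address passed via -nn and -nn_port before retrying the query.

# Hypothetical verification step: list and read the table file through the
# cluster namenode that impalad was pointed at.
hadoop fs -ls hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/
hadoop fs -cat hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv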

Reposted from: https://www.cnblogs.com/Loogn-qiang/archive/2013/01/18/2866208.html
