After building Impala from source following the documentation at https://github.com/cloudera/impala, running the scripts below produced a series of problems:
```shell
${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false
${IMPALA_HOME}/bin/impala-shell.sh
```
Problem 1:
Although the local cluster's Hive metastore was already configured, and impala-shell.sh started without errors, `show databases` did not list the test.db database that had already been created in Hive. Moreover, the script created a derby.log file and a metastore.db directory in its working directory; these belong to the embedded Derby metastore that ships with Hive. So the cause is clear: Impala never saw the Hive metastore configuration and fell back to its own embedded one.
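The fallback happens because Hive's metastore client reads hive-site.xml from the CLASSPATH; when it finds none, it silently defaults to an embedded Derby database, which is exactly what produces derby.log and metastore.db in the working directory. A minimal sanity-check sketch (the scratch directory created here merely stands in for fe/src/test/resources):

```shell
# Look for hive-site.xml on each CLASSPATH entry; if it is absent, the
# metastore client will fall back to embedded Derby.
demo_dir=$(mktemp -d)                 # stands in for fe/src/test/resources
touch "$demo_dir/hive-site.xml"
CLASSPATH="$demo_dir:/nonexistent"

found=no
old_ifs=$IFS; IFS=:
for dir in $CLASSPATH; do
  [ -f "$dir/hive-site.xml" ] && found=yes
done
IFS=$old_ifs
echo "hive-site.xml on CLASSPATH: $found"   # prints: yes
```

Running this against the real CLASSPATH (after sourcing set-classpath.sh) quickly tells you whether the metastore config is actually visible.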
According to the project documentation, Impala picks up the configuration for HDFS, HBase, and the Hive metastore from configuration files placed under fe/src/test/resources; this is wired up in ${IMPALA_HOME}/bin/set-classpath.sh, which reads as follows:
```sh
#!/bin/sh
# Copyright 2012 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This script explicitly sets the CLASSPATH for embedded JVMs (e.g. in
# Impalad or in runquery). Because embedded JVMs do not honour
# CLASSPATH wildcard expansion, we have to add every dependency jar
# explicitly to the CLASSPATH.
CLASSPATH=\
$IMPALA_HOME/fe/src/test/resources:\
$IMPALA_HOME/fe/target/classes:\
$IMPALA_HOME/fe/target/dependency:\
$IMPALA_HOME/fe/target/test-classes:\
${HIVE_HOME}/lib/datanucleus-core-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-enhancer-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-rdbms-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-connectionpool-2.0.3.jar:${CLASSPATH}

for jar in `ls ${IMPALA_HOME}/fe/target/dependency/*.jar`; do
  CLASSPATH=${CLASSPATH}:$jar
done

export CLASSPATH
```
Here the trouble started: in my successfully built source tree there was no resources directory under fe/src at all. So I downloaded one from elsewhere, placed it in the expected location, and edited the three configuration files core-site.xml, hdfs-site.xml, and hive-site.xml to match the cluster's configuration. After running `source bin/set-classpath.sh`, the first problem was solved!
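One detail worth noting: the JVM resolves resources in CLASSPATH order, which is why set-classpath.sh puts fe/src/test/resources first; a configuration file there shadows any copy bundled later on the path. A small sketch of that first-match-wins behaviour (the scratch directories stand in for the real layout):

```shell
a=$(mktemp -d); b=$(mktemp -d)
echo "from-a" > "$a/site.conf"
echo "from-b" > "$b/site.conf"
CLASSPATH="$a:$b"

# Walk the entries in order and take the first one holding the resource,
# mimicking how ClassLoader.getResource picks a winner.
winner=""
old_ifs=$IFS; IFS=:
for dir in $CLASSPATH; do
  if [ -z "$winner" ] && [ -f "$dir/site.conf" ]; then
    winner=$(cat "$dir/site.conf")
  fi
done
IFS=$old_ifs
echo "$winner"   # prints: from-a
```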
Problem 2:
With the first problem solved, I ran the same two scripts again and executed the following commands:
```
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Build version: build version not available)
[Not connected] > connect hadoop-01
[hadoop-01:21000] > show databases;
default
test_impala
[hadoop-01:21000] > use test_impala;
[hadoop-01:21000] > show tables;
tab1
tab2
tab3
[hadoop-01:21000] > select * from tab3;
[hadoop-01:21000] > select * from tab1;
ERROR: Failed to open HDFS file hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv
Error(255): Unknown error 255
ERROR: Invalid query handle
[hadoop-01:21000] > select * from tab1;
ERROR: Failed to open HDFS file hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv
Error(255): Unknown error 255
ERROR: Invalid query handle
[hadoop-01:21000] > quit
```
The impalad log on the backend showed:
```
13/01/18 11:50:46 INFO service.Frontend: createExecRequest for query select * from tab1
13/01/18 11:50:46 INFO service.JniFrontend: Plan Fragment 0
  UNPARTITIONED
  EXCHANGE (1)
    TUPLE IDS: 0
Plan Fragment 1
  RANDOM
  STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
  SCAN HDFS table=test_impala.tab1 (0)
    TUPLE IDS: 0
13/01/18 11:50:46 INFO service.JniFrontend: returned TQueryExecRequest2: TExecRequest(stmt_type:QUERY, sql_stmt:select * from tab1, request_id:TUniqueId(hi:-6897121767931491435, lo:-4792011001236606993), query_options:TQueryOptions(abort_on_error:false, max_errors:0, disable_codegen:false, batch_size:0, return_as_ascii:true, num_nodes:0, max_scan_range_length:0, num_scanner_threads:0, max_io_buffers:0, allow_unsupported_formats:false, partition_agg:false), query_exec_request:TQueryExecRequest(desc_tbl:TDescriptorTable(slotDescriptors:[TSlotDescriptor(id:0, parent:0, slotType:INT, columnPos:0, byteOffset:4, nullIndicatorByte:0, nullIndicatorBit:1, slotIdx:1, isMaterialized:true), TSlotDescriptor(id:1, parent:0, slotType:BOOLEAN, columnPos:1, byteOffset:1, nullIndicatorByte:0, nullIndicatorBit:0, slotIdx:0, isMaterialized:true), TSlotDescriptor(id:2, parent:0, slotType:DOUBLE, columnPos:2, byteOffset:8, nullIndicatorByte:0, nullIndicatorBit:2, slotIdx:2, isMaterialized:true), TSlotDescriptor(id:3, parent:0, slotType:TIMESTAMP, columnPos:3, byteOffset:16, nullIndicatorByte:0, nullIndicatorBit:3, slotIdx:3, isMaterialized:true)], tupleDescriptors:[TTupleDescriptor(id:0, byteSize:32, numNullBytes:1, tableId:1)], tableDescriptors:[TTableDescriptor(id:1, tableType:HDFS_TABLE, numCols:4, numClusteringCols:0, hdfsTable:THdfsTable(hdfsBaseDir:hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1, partitionKeyNames:[], nullPartitionKeyValue:__HIVE_DEFAULT_PARTITION__, partitions:{-1=THdfsPartition(lineDelim:10, fieldDelim:44, collectionDelim:44, mapKeyDelim:44, escapeChar:0, fileFormat:TEXT, partitionKeyExprs:[], blockSize:0, compression:NONE), 1=THdfsPartition(lineDelim:10, fieldDelim:44, collectionDelim:44, mapKeyDelim:44, escapeChar:0, fileFormat:TEXT, partitionKeyExprs:[], blockSize:0, compression:NONE)}), tableName:tab1, dbName:test_impala)]), fragments:[TPlanFragment(plan:TPlan(nodes:[TPlanNode(node_id:1, node_type:EXCHANGE_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tuples:[false], compact_data:false)]), output_exprs:[TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:INT, num_children:0, slot_ref:TSlotRef(slot_id:0))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:BOOLEAN, num_children:0, slot_ref:TSlotRef(slot_id:1))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:DOUBLE, num_children:0, slot_ref:TSlotRef(slot_id:2))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:TIMESTAMP, num_children:0, slot_ref:TSlotRef(slot_id:3))])], partition:TDataPartition(type:UNPARTITIONED, partitioning_exprs:[])), TPlanFragment(plan:TPlan(nodes:[TPlanNode(node_id:0, node_type:HDFS_SCAN_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tuples:[false], compact_data:false, hdfs_scan_node:THdfsScanNode(tuple_id:0))]), output_sink:TDataSink(type:DATA_STREAM_SINK, stream_sink:TDataStreamSink(dest_node_id:1, output_partition:TDataPartition(type:UNPARTITIONED, partitioning_exprs:[]))), partition:TDataPartition(type:RANDOM, partitioning_exprs:[]))], dest_fragment_idx:[0], per_node_scan_ranges:{0=[TScanRangeLocations(scan_range:TScanRange(hdfs_file_split:THdfsFileSplit(path:hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv, offset:0, length:192, partition_id:1)), locations:[TScanRangeLocation(server:THostPort(hostname:192.168.1.2, ipaddress:192.168.1.2, port:50010), volume_id:0)])]}, query_globals:TQueryGlobals(now_string:2013-01-18 11:50:46.000000862)), result_set_metadata:TResultSetMetadata(columnDescs:[TColumnDesc(columnName:id, columnType:INT), TColumnDesc(columnName:col_1, columnType:BOOLEAN), TColumnDesc(columnName:col_2, columnType:DOUBLE), TColumnDesc(columnName:col_3, columnType:TIMESTAMP)]))
hdfsOpenFile(hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
java.lang.IllegalArgumentException: Wrong FS: hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv, expected: hdfs://localhost:20500
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:169)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSyst
```
The key is the Wrong FS error: expected: hdfs://localhost:20500. Yet the core-site.xml in the resources directory clearly sets the namenode address with port 8030, so where does localhost:20500 come from? Digging into the Impala source, I found that be/src/runtime/hdfs-fs-cache.cc defines default values for nn and nn_port:
```cpp
DEFINE_string(nn, "localhost", "hostname or ip address of HDFS namenode");
DEFINE_int32(nn_port, 20500, "namenode port");
```
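For context, the "Wrong FS" message comes from Hadoop's FileSystem.checkPath, which rejects any path whose scheme://authority prefix differs from the URI the FileSystem instance was created for. A simplified shell rendition of that comparison (`check_path` is my own name for the sketch, not Hadoop's API):

```shell
check_path() {
  # Compare the scheme://host:port prefix of a path against the
  # filesystem's own URI, as FileSystem.checkPath does (simplified).
  path=$1; fs_uri=$2
  path_fs=$(printf '%s\n' "$path" | sed -E 's#^([a-z]+://[^/]+).*#\1#')
  if [ "$path_fs" != "$fs_uri" ]; then
    printf 'Wrong FS: %s, expected: %s\n' "$path" "$fs_uri"
    return 1
  fi
  echo "ok"
}

msg=$(check_path \
  "hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv" \
  "hdfs://localhost:20500") || true
echo "$msg"   # prints the Wrong FS line, mirroring the log above
```

Since impalad opened its FileSystem against the compiled-in default hdfs://localhost:20500, every path from the metastore (which points at hadoop-01.localdomain:8030) failed this check.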
So when starting the impalad service, nn and nn_port must also be set to the cluster's actual namenode address and port:
```shell
${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false -nn=hadoop-01.localdomain -nn_port=8030
```
With that, the second problem (expected: hdfs://localhost:20500) was solved as well, and every query now runs without issue!
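As a final sanity check, the -nn/-nn_port pair passed to start-impalad.sh should agree with fs.default.name in the core-site.xml placed under fe/src/test/resources. A rough sketch of extracting that value for comparison (the inline config here stands in for the real file, and the sed pattern assumes the value sits on one line):

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-01.localdomain:8030</value>
  </property>
</configuration>
EOF

# Pull out the default filesystem URI; -nn/-nn_port should match it.
default_fs=$(sed -n 's#.*<value>\(hdfs://[^<]*\)</value>.*#\1#p' "$conf")
echo "$default_fs"   # prints: hdfs://hadoop-01.localdomain:8030
```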