Hive Basics: Architecture
Hive plays the role of a data warehouse in Hadoop. Hive superimposes structure on data stored in HDFS and lets you query that data with a SQL-like syntax.
Hive is best suited to data-warehouse workloads: largely static data structures and data that is analyzed frequently. Its similarity to SQL makes Hive a natural integration point between Hadoop and BI tools.
Overall architecture diagram (not reproduced in this repost).
The components break down as follows:
1: Hive interfaces
A: HWI (Hive Web Interface)
./hive --service hwi
[root@pg2 bin]# ./hive --service hwi
13/01/06 23:56:38 INFO hwi.HWIServer: HWI is starting up
13/01/06 23:56:38 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /hive/hive-0.9.0/conf/hive-default.xml
13/01/06 23:56:38 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
13/01/06 23:56:39 FATAL hwi.HWIServer: HWI WAR file not found at /hive/hive-0.9.0/hive/hive-0.9.0/lib/hive-hwi-0.9.0.war
[root@pg2 bin]#
Clearly there is a configuration problem: the WAR path has been doubled. It should be /hive/hive-0.9.0/lib/hive-hwi-0.9.0.war. Copy hive-default.xml to hive-site.xml and correct the hive.hwi.war.file property there.
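The doubled path above happens because Hive resolves hive.hwi.war.file relative to HIVE_HOME, so the property should hold a relative path, not an absolute one. A minimal sketch of the relevant hive-site.xml properties (values match the install path and port seen in this post's logs):

```xml
<property>
  <name>hive.hwi.war.file</name>
  <!-- relative to $HIVE_HOME; an absolute path gets the prefix prepended again -->
  <value>lib/hive-hwi-0.9.0.war</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
```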
[root@pg2 bin]# ./hive --service hwi
13/01/07 23:18:43 INFO hwi.HWIServer: HWI is starting up
13/01/07 23:18:43 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /hive/hive-0.9.0/conf/hive-default.xml
13/01/07 23:18:43 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/01/07 23:18:44 INFO mortbay.log: jetty-6.1.26
13/01/07 23:18:44 INFO mortbay.log: Extract /hive/hive-0.9.0/lib/hive-hwi-0.9.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.9.0.war__hwi__2l99ri/webapp
13/01/07 23:18:45 INFO mortbay.log: Started SocketConnector@0.0.0.0:9999
B: Client (remote Thrift service)
./hive --service hiveserver 10000 >/dev/null 2>/dev/null &
C: CLI (command line)
Start it with ./hive, or equivalently:
./hive --service cli
D: JDBC connection
Hive JDBC URL: jdbc:hive://192.168.6.116:10000/default (Hive's default port is 10000; the default database is named default)
a: First start Hive's remote service interface:
./hive --service hiveserver &
[root@pg2 bin]# ./hive --service hiveserver
Starting Hive Thrift Server
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
b: Import the required JARs into your Java project:
antlr-runtime-3.0.1.jar
hive-exec-0.9.0.jar
hive-jdbc-0.9.0.jar
hive-metastore-0.9.0.jar
hive-service-0.9.0.jar
jdo2-api-2.3-ec.jar
libfb303.jar
commons-logging-1.0.4.jar
hadoop-core-1.0.4.jar
slf4j-api-1.6.1.jar
Example:
package org.dw.hive.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Connects to a running Hive Thrift server over JDBC, rebuilds a test
 * table, loads local data into it, and prints the query results.
 */
public class TestHive {

    public static void main(String[] args) {
        try {
            testHiveByJDBC();
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }

    public static void testHiveByJDBC() throws ClassNotFoundException, SQLException {
        // Register the Hive JDBC driver (the HiveServer1 driver in Hive 0.9).
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

        String dropSQL = "drop table mybloger";
        String createSQL = "create table mybloger (key int, value string)";
        String loadSQL = "LOAD DATA LOCAL INPATH '/tmp/hive/test_data_myblog.txt' OVERWRITE INTO TABLE mybloger";
        String querySQL = "SELECT a.* FROM mybloger a";

        Connection con = DriverManager.getConnection(
                "jdbc:hive://192.168.0.101:10000/default", "", "");
        Statement stmt = con.createStatement();

        // DDL and LOAD statements return no result set, so use execute(),
        // not executeQuery().
        stmt.execute(dropSQL);   // drop the previous copy of the table
        stmt.execute(createSQL); // create the table
        stmt.execute(loadSQL);   // load the local data file

        ResultSet res = stmt.executeQuery(querySQL);
        while (res.next()) {
            System.out.println("Result: key:" + res.getString(1)
                    + " -> value:" + res.getString(2));
        }

        res.close();
        stmt.close();
        con.close();
    }
}
The Hive server log shows every statement succeeding:
OK
OK
Copying data from file:/tmp/hive/test_data_myblog.txt
Copying file: file:/tmp/hive/test_data_myblog.txt
Loading data to table default.mybloger
Deleted hdfs://localhost:9000/user/hive/warehouse/mybloger
OK
OK
But querying the same table from the CLI fails. The default metastore is an embedded Derby database, which only one process can open at a time; while the Thrift server holds metastore_db, the CLI cannot start it:
[root@pg2 bin]# ./hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in file:/hive/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201301072357_1121120586.txt
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start database 'metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database 'metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> select * from mybloger;
FAILED: Error in semantic analysis: Unable to fetch table mybloger
hive>
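A common fix (not shown in the original post) is to move the metastore off embedded Derby to a shared database such as MySQL, so the CLI and the Thrift server can use it concurrently. A sketch of the relevant hive-site.xml properties, assuming a hypothetical local MySQL instance with a hive user you have created:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- hypothetical credentials; substitute your own -->
  <value>hivepass</value>
</property>
```

The MySQL JDBC driver JAR must also be placed on Hive's classpath (e.g. in $HIVE_HOME/lib).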
2: Hive data model
A: Table
Corresponds to a directory of data files for that table in the warehouse
B: Partition
A dense index on the table's partition columns; each partition value maps to a subdirectory of the table
C: External Table
Points at existing data files in HDFS; partitions can still be defined
D: Buckets
The values of a designated column are hashed, and the data is split by hash value, to enable parallelism
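The bucket assignment described above can be sketched in Java. This is a simplified illustration, not Hive's actual implementation (which hashes column values through its own serialization layer): a row goes to bucket hash(column) mod numBuckets, with the sign bit masked off so the result is always a valid bucket index.

```java
public class BucketSketch {

    /**
     * Simplified sketch of Hive-style bucketing: mask the sign bit,
     * then take the hash modulo the bucket count. For an int column
     * the hash is the value itself.
     */
    public static int bucketFor(int columnHash, int numBuckets) {
        return (columnHash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Rows with keys 0..7 spread across 4 buckets, so each bucket's
        // file can be scanned in parallel.
        for (int key = 0; key < 8; key++) {
            System.out.println("key " + key + " -> bucket " + bucketFor(key, 4));
        }
    }
}
```

Because every row with the same column value lands in the same bucket, bucketing also speeds up joins and sampling on that column.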
Reposted from: http://hi.baidu.com/huareal/item/26bdf380ea944dc6b17154d6