Hive Basics: Architecture
Hive plays the role of a data warehouse in Hadoop. Hive superimposes structure on data stored in HDFS and lets you query that data with a SQL-like syntax.
Hive is best suited to data-warehouse workloads: largely static data structures and data that is analyzed frequently. Its similarity to SQL makes Hive a natural integration point between Hadoop and BI tools.
Overall architecture diagram (not reproduced in this repost).
The components break down as follows:
1: Hive interfaces
A: HWI (Hive Web Interface)
./hive --service hwi
[root@pg2 bin]# ./hive --service hwi
13/01/06 23:56:38 INFO hwi.HWIServer: HWI is starting up
13/01/06 23:56:38 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /hive/hive-0.9.0/conf/hive-default.xml
13/01/06 23:56:38 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
13/01/06 23:56:39 FATAL hwi.HWIServer: HWI WAR file not found at /hive/hive-0.9.0/hive/hive-0.9.0/lib/hive-hwi-0.9.0.war
[root@pg2 bin]#
Clearly there is a configuration problem: the WAR path has been doubled. It should be /hive/hive-0.9.0/lib/hive-hwi-0.9.0.war. Copy hive-default.xml to hive-site.xml and correct the hive.hwi.war.file property there.
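The doubled path above happens because Hive resolves hive.hwi.war.file relative to HIVE_HOME, so the property should hold a relative path, not an absolute one. A minimal sketch of the relevant hive-site.xml properties (values match the install path and port seen in this post's logs):

```xml
<property>
  <name>hive.hwi.war.file</name>
  <!-- relative to $HIVE_HOME; an absolute path gets the prefix prepended again -->
  <value>lib/hive-hwi-0.9.0.war</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
```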
[root@pg2 bin]# ./hive --service hwi
13/01/07 23:18:43 INFO hwi.HWIServer: HWI is starting up
13/01/07 23:18:43 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /hive/hive-0.9.0/conf/hive-default.xml
13/01/07 23:18:43 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/01/07 23:18:44 INFO mortbay.log: jetty-6.1.26
13/01/07 23:18:44 INFO mortbay.log: Extract /hive/hive-0.9.0/lib/hive-hwi-0.9.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.9.0.war__hwi__2l99ri/webapp
13/01/07 23:18:45 INFO mortbay.log: Started SocketConnector@0.0.0.0:9999
B: Client (remote Thrift service)
./hive --service hiveserver 10000 >/dev/null 2>/dev/null &
C: CLI (command line)
Start it with ./hive, or equivalently:
./hive --service cli
D: JDBC connection
Hive JDBC URL: jdbc:hive://192.168.6.116:10000/default (Hive's default port is 10000; the default database is named default)
a: First start Hive's remote service interface:
./hive --service hiveserver &
[root@pg2 bin]# ./hive --service hiveserver
Starting Hive Thrift Server
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
b: Import the required JARs into your Java project:
antlr-runtime-3.0.1.jar
hive-exec-0.9.0.jar
hive-jdbc-0.9.0.jar
hive-metastore-0.9.0.jar
hive-service-0.9.0.jar
jdo2-api-2.3-ec.jar
libfb303.jar
commons-logging-1.0.4.jar
hadoop-core-1.0.4.jar
slf4j-api-1.6.1.jar
Example:
package org.dw.hive.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Connects to a running Hive Thrift server over JDBC, rebuilds a test
 * table, loads local data into it, and prints the query results.
 */
public class TestHive {

    public static void main(String[] args) {
        try {
            testHiveByJDBC();
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }

    public static void testHiveByJDBC() throws ClassNotFoundException, SQLException {
        // Register the Hive JDBC driver (the HiveServer1 driver in Hive 0.9).
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

        String dropSQL = "drop table mybloger";
        String createSQL = "create table mybloger (key int, value string)";
        String loadSQL = "LOAD DATA LOCAL INPATH '/tmp/hive/test_data_myblog.txt' OVERWRITE INTO TABLE mybloger";
        String querySQL = "SELECT a.* FROM mybloger a";

        Connection con = DriverManager.getConnection(
                "jdbc:hive://192.168.0.101:10000/default", "", "");
        Statement stmt = con.createStatement();

        // DDL and LOAD statements return no result set, so use execute(),
        // not executeQuery().
        stmt.execute(dropSQL);   // drop the previous copy of the table
        stmt.execute(createSQL); // create the table
        stmt.execute(loadSQL);   // load the local data file

        ResultSet res = stmt.executeQuery(querySQL);
        while (res.next()) {
            System.out.println("Result: key:" + res.getString(1)
                    + " -> value:" + res.getString(2));
        }

        res.close();
        stmt.close();
        con.close();
    }
}
The Hive server log shows every statement succeeding:
OK
OK
Copying data from file:/tmp/hive/test_data_myblog.txt
Copying file: file:/tmp/hive/test_data_myblog.txt
Loading data to table default.mybloger
Deleted hdfs://localhost:9000/user/hive/warehouse/mybloger
OK
OK
But querying the same table from the CLI fails. The default metastore is an embedded Derby database, which only one process can open at a time; while the Thrift server holds metastore_db, the CLI cannot start it:
[root@pg2 bin]# ./hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in file:/hive/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201301072357_1121120586.txt
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start database 'metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database 'metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> select * from mybloger;
FAILED: Error in semantic analysis: Unable to fetch table mybloger
hive>
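A common fix (not shown in the original post) is to move the metastore off embedded Derby to a shared database such as MySQL, so the CLI and the Thrift server can use it concurrently. A sketch of the relevant hive-site.xml properties, assuming a hypothetical local MySQL instance with a hive user you have created:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <!-- hypothetical credentials; substitute your own -->
  <value>hivepass</value>
</property>
```

The MySQL JDBC driver JAR must also be placed on Hive's classpath (e.g. in $HIVE_HOME/lib).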
2: Hive data model
A: Table
Corresponds to a directory of data files for that table in the warehouse
B: Partition
A dense index on the table's partition columns; each partition value maps to a subdirectory of the table
C: External Table
Points at existing data files in HDFS; partitions can still be defined
D: Buckets
The values of a designated column are hashed, and the data is split by hash value, to enable parallelism
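The bucket assignment described above can be sketched in Java. This is a simplified illustration, not Hive's actual implementation (which hashes column values through its own serialization layer): a row goes to bucket hash(column) mod numBuckets, with the sign bit masked off so the result is always a valid bucket index.

```java
public class BucketSketch {

    /**
     * Simplified sketch of Hive-style bucketing: mask the sign bit,
     * then take the hash modulo the bucket count. For an int column
     * the hash is the value itself.
     */
    public static int bucketFor(int columnHash, int numBuckets) {
        return (columnHash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Rows with keys 0..7 spread across 4 buckets, so each bucket's
        // file can be scanned in parallel.
        for (int key = 0; key < 8; key++) {
            System.out.println("key " + key + " -> bucket " + bucketFor(key, 4));
        }
    }
}
```

Because every row with the same column value lands in the same bucket, bucketing also speeds up joins and sampling on that column.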
Reposted from: http://hi.baidu.com/huareal/item/26bdf380ea944dc6b17154d6