Apache Hive Translation ①: Introduction

Latest recommended article published on 2024-06-05 18:19:35

srzyhead

Category: hive. Tags: Apache, Hive, SQL, MapReduce

Apache Hive

Original source: https://cwiki.apache.org/confluence/display/Hive/Home

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop™, it provides:

Tools to enable easy data extract/transform/load (ETL)
A mechanism to impose structure on a variety of data formats
Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
Query execution via MapReduce
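
To make "query execution via MapReduce" more concrete, here is a minimal in-memory sketch of how a grouped count (something like SELECT page, COUNT(*) ... GROUP BY page) could be split into map, shuffle, and reduce phases. It only illustrates the idea and is not Hive's actual query planner; the function names and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows of (user, page); not taken from the Hive documentation.
ROWS = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart")]


def map_phase(rows):
    """Emit one (key, 1) pair per input row, keyed by the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single aggregate (here, a count)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Run the three phases in sequence over the given rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


print(count_by_page(ROWS))
```

On a real cluster the map and reduce phases run in parallel on many machines; the sketch keeps everything in one process so the data flow is easy to follow.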

Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. QL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).

Translator's note: a scalar function takes zero or more arguments and returns a single scalar value as its result.
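
As a rough illustration of a "custom mapper", the sketch below follows the streaming convention used by Hive's TRANSFORM clause, where rows reach an external script as tab-separated lines and the script writes tab-separated lines back. The column layout (user, url) and the function names are assumptions made for this example only.

```python
import io
from urllib.parse import urlparse


def map_line(line):
    """Turn one tab-separated input row (user, url) into (user, host)."""
    user, url = line.rstrip("\n").split("\t")
    host = urlparse(url).netloc or "unknown"
    return f"{user}\t{host}"


def run(stream_in, stream_out):
    """Apply map_line to every non-empty row read from stream_in."""
    for line in stream_in:
        if line.strip():
            stream_out.write(map_line(line) + "\n")


# Demo with in-memory streams; a real mapper script would instead call
# run(sys.stdin, sys.stdout) so the framework can pipe rows through it.
demo_in = io.StringIO("alice\thttp://example.com/a\nbob\tnot-a-url\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the per-row logic in map_line, separate from the stream handling, makes the mapper easy to test without a cluster.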

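About table functions:

Table functions were introduced in SQL:2003. A table function is an SQL-invoked function that returns a table. According to the standard, its return type is a multiset of rows (a collection that allows duplicate elements); although it is not a true table, it can be queried like one.

Table function example: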

CREATE FUNCTION weather()
  RETURNS TABLE (
    CITY      VARCHAR(25),
    TEMP_IN_F INTEGER,
    HUMIDITY  INTEGER,
    WIND      VARCHAR(5),
    FORECAST  CHAR(25) )
  NOT DETERMINISTIC
  NO SQL
  LANGUAGE C
  EXTERNAL
  PARAMETER STYLE SQL;

Oracle documentation on table functions:

http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dcitblfns.htm
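
One way to keep scalar, aggregate, and table functions apart is by how many rows go in and how many come out. The Python sketch below mirrors that distinction with plain functions; it is an analogy only and does not use Hive's real UDF, UDAF, or UDTF interfaces. All names are hypothetical.

```python
def scalar_upper(value):
    """Scalar function (UDF-like): one input value in, one value out."""
    return value.upper()


def aggregate_max(values):
    """Aggregate function (UDAF-like): many input rows in, one value out."""
    return max(values)


def table_split(value, sep=","):
    """Table function (UDTF-like): one input value in, zero or more rows out."""
    for part in value.split(sep):
        yield (part,)


rows = ["a,b", "c"]

# One output per input row.
upper_rows = [scalar_upper(r) for r in rows]

# A single output for the whole set of rows.
largest = aggregate_max(rows)

# Possibly more output rows than input rows; the result can be scanned like a table.
exploded = [row for r in rows for row in table_split(r)]

print(upper_rows, largest, exploded)
```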

Hive does not require that data be read or written in a "Hive format"; there is no such thing. Hive works equally well on Thrift, control-delimited, or your own specialized data formats. Please see File Format and SerDe in the Developer Guide for details.
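
"Control delimited" refers to text files whose fields are separated by control characters, with Ctrl-A (\x01) as the usual field separator. The sketch below shows the general idea of imposing a schema on such a line at read time. It is not Hive's SerDe API; the schema, the column names, and the choice to map bad fields to None are assumptions made for illustration.

```python
FIELD_SEP = "\x01"  # Ctrl-A, the control character assumed as the field delimiter

# Hypothetical schema: (column name, converter applied at read time).
SCHEMA = [("ip", str), ("status", int), ("bytes", int)]


def deserialize(line, schema=SCHEMA, sep=FIELD_SEP):
    """Split a raw line and impose the schema, returning a typed record.

    Missing or unparsable fields become None instead of raising an error.
    """
    parts = line.rstrip("\n").split(sep)
    record = {}
    for index, (name, convert) in enumerate(schema):
        try:
            record[name] = convert(parts[index])
        except (IndexError, ValueError):
            record[name] = None
    return record


raw = "10.0.0.1\x01200\x01512\n"
print(deserialize(raw))
```

Because the structure lives in the schema rather than in the file, the same raw data could be read with a different schema later without being rewritten.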

Translator's note: Thrift is a software framework for cross-language development.

http://thrift.apache.org/

Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data (such as web logs). What Hive values most are scalability (scaling out as machines are added dynamically to the Hadoop cluster), extensibility (through the MapReduce framework and UDF/UDAF/UDTF), fault tolerance, and loose coupling with its input formats.

Translator's note: OLTP stands for On-Line Transaction Processing.

Getting Started
Presentations and Papers about Hive
A List of Sites and Applications Powered by Hive
FAQ
hive-users Mailing List
Hive IRC Channel: #hive on irc.freenode.net
About This Wiki

User Documentation

Administrator Documentation

Resources for Contributors

For more information, please see the official Hive website.

Apache Hive, Apache Hadoop, Apache HBase, Apache HDFS, Apache, the Apache feather logo, and the Apache Hive project logo are trademarks of The Apache Software Foundation.