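Introduction to Hive
Hive is a data warehouse and analysis tool for big data. It lets you work with data files stored on a cluster using SQL-like statements.
Quoting the description from the official site: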
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax.
Built on top of Apache Hadoop™, Hive provides the following features:
Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
A mechanism to impose structure on a variety of data formats
Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
Query execution via Apache Tez™, Apache Spark™, or MapReduce
Procedural language with HPL-SQL
Sub-second query retrieval via Hive LLAP, Apache YARN and Apache Slider
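As a rough illustration of the "SQL access" and "query execution" points above, here is a minimal sketch. It assumes a table named aa with columns col1, col2 and a partition column statdate (the table created later in these notes), and it assumes Tez is installed on the cluster; if it is not, mr would be the value to use instead.

```sql
-- Pick the execution engine for the current session.
-- Accepted values are mr, tez and spark; which ones work depends on the cluster setup (assumption: tez is available).
SET hive.execution.engine=tez;

-- An ordinary SQL-style aggregation over files stored in HDFS.
SELECT col1, SUM(col2) AS total
FROM aa
WHERE statdate = 20170403
GROUP BY col1;
```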
Hive official wiki
https://cwiki.apache.org/confluence/display/Hive/Home
Hive architecture and how it works
http://blog.csdn.net/u010330043/article/details/51225021
How Hive SQL statements are parsed and executed
http://blog.csdn.net/jojo52013145/article/details/19206559
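One way to look at the result of parsing and planning on your own cluster is the EXPLAIN statement. The sketch below assumes the table aa (col1, col2, partitioned by statdate) from the statements later in these notes; the exact plan text varies by Hive version and execution engine, so no output is shown.

```sql
-- Print the execution plan instead of running the query.
-- The output lists the stage dependencies and the operators in each stage.
EXPLAIN
SELECT col1, COUNT(*) AS cnt
FROM aa
WHERE statdate = 20170403
GROUP BY col1;
```

EXPLAIN EXTENDED prints additional detail if the basic plan is not enough.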
Common Hive SQL statements
1. Creating tables
Managed (internal) table
create table aa(col1 string,col2 int) partitioned by(statdate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
External table
create external table bb(col1 string, col2 int) partitioned by(statdate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/user/gaofei.lu/';
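The practical difference between the two kinds of table shows up in the table metadata and when a table is dropped. A small sketch, using the aa and bb tables created above (the DROP statements are left commented out so that nothing is deleted by accident):

```sql
-- The "Table Type" field in the output reads MANAGED_TABLE for aa
-- and EXTERNAL_TABLE for bb; "Location" shows where the files live.
DESCRIBE FORMATTED aa;
DESCRIBE FORMATTED bb;

-- Dropping a managed table removes the metadata and the data files.
-- DROP TABLE aa;

-- Dropping an external table removes only the metadata;
-- the files under /user/gaofei.lu/ are left in place.
-- DROP TABLE bb;
```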
2. Viewing a table definition
show create table aa;
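A few related statements are often used alongside show create table. This is a short sketch against the aa table from above, not an exhaustive list:

```sql
-- List the tables in the current database.
SHOW TABLES;

-- Show the columns and partition columns of aa.
DESCRIBE aa;

-- List the partitions that currently exist for aa (for example statdate=20170403 after a load).
SHOW PARTITIONS aa;
```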
3. Loading data into a table
Local data: load data local inpath '/home/gaofei.lu/aa.txt' into table aa partition(statdate=20170403);
Data on HDFS: load data inpath '/user/gaofei.lu/aa.txt' into table bb partition(statdate=20170403);
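After a load it is worth checking that the partition exists and that the rows parse as expected. The sketch below reuses the aa table and the local file path from above; the OVERWRITE variant is an optional addition, not part of the original statements.

```sql
-- OVERWRITE replaces whatever is already in the partition instead of adding to it.
LOAD DATA LOCAL INPATH '/home/gaofei.lu/aa.txt'
OVERWRITE INTO TABLE aa PARTITION (statdate=20170403);

-- Confirm the partition was registered.
SHOW PARTITIONS aa;

-- Spot-check a few rows; NULL columns usually mean the delimiter or types do not match the file.
SELECT * FROM aa WHERE statdate = 20170403 LIMIT 10;
```

Note that with LOCAL the file is copied from the local filesystem, while without LOCAL the HDFS file is moved into the table's directory, so it no longer appears at its original path.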