hive
macyang
Chance is waiting for prepared people and my Status is read the fucking source code.
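Hive Data Loading Commands
Partitions must be declared when the table is defined. a. Single-partition table: create table day_table (id int, content string) partitioned by (dt string); This table is partitioned by day and has three columns: id, content, and dt. Data is split into directories by dt. b. Two-partition table: create table day_hour_table (id int, c... Reposted 2012-02-14 21:28:31 · 4506 views · 0 comments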
Using Hadoop’s DistributedCache
While working with MapReduce applications, there are times when we need to share files globally with all nodes on the cluster. This can be a shared library to be accessed by each task, a global looku... Reposted 2012-04-25 09:59:52 · 1363 views · 0 comments
Tenzing: A SQL Implementation on the MapReduce Framework (Translation)
Authors: Biswapesh Chattopadhyay, Weiran Liu, et al., Google Inc., 2011-8. Original: http://www.vldb.org/pvldb/vol4/p1318-chattopadhyay.pdf Translator: phylips@bmy, 2011-10-6. Translation: http://duanple.blog.163.com/blog/static/709717672... Reposted 2012-02-05 18:42:36 · 2669 views · 1 comment
The Stinger Initiative: Making Apache Hive 100 Times Faster
Introduced by Facebook in 2007, Apache Hive and its HiveQL interface have become the de facto SQL interface for Hadoop. Today, companies of all types and sizes use Hive to access Hadoop data in a fa... Reposted 2013-02-23 22:49:20 · 977 views · 0 comments
Optimizing Joins running on HDInsight Hive on Azure at GFS
Introduction: To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure. Reposted 2013-06-14 17:04:24 · 1092 views · 0 comments
Map-side aggregations in Apache Hive
When running large-scale Hive reports, one error we occasionally run into is the following: Possible error: Out of memory due to hash maps used in map-side aggregation. Solution: Currently... Reposted 2013-07-06 22:24:47 · 1527 views · 0 comments
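A minimal sketch of what map-side aggregation means in general, not Hive's actual implementation: each mapper keeps partial sums in an in-memory hash map and emits them early once the map grows too large, which is the structure the out-of-memory error above refers to. The names `map_side_aggregate`, `reduce_side`, and the `max_entries` threshold are hypothetical and chosen only for illustration.

```python
def map_side_aggregate(rows, max_entries):
    """Pre-aggregate (key, value) rows in a bounded in-memory hash map.

    max_entries is a hypothetical cap standing in for a memory limit:
    when the map exceeds it, the partial sums are flushed downstream.
    """
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
        if len(partial) > max_entries:
            # Flush partial results instead of growing without bound.
            yield from list(partial.items())
            partial.clear()
    # Emit whatever is left at the end of the input.
    yield from list(partial.items())


def reduce_side(partials):
    """Combine the partial sums emitted by the mappers into final totals."""
    totals = {}
    for key, value in partials:
        totals[key] = totals.get(key, 0) + value
    return totals
```

Because a key can be flushed more than once, the reduce side still has to sum the partial results; the map side only reduces how many rows are sent.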
hash join VS merge join
A "sort merge" join is performed by sorting the two data sets to be joined according to the join keys and then merging them together. The merge is very cheap, but the sort can be prohibitively expensi... Reposted 2014-03-26 13:19:32 · 782 views · 0 comments
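The sort-then-merge procedure described in this excerpt can be sketched as follows. This is an illustrative in-memory version over lists of (key, value) pairs, not how any particular engine implements it; the function name `sort_merge_join` is hypothetical.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs with a sort-merge join."""
    # Sort phase: this is the potentially expensive step.
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])

    # Merge phase: a single forward pass over both sorted inputs.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out
```

The merge loop never moves backwards in either input, which is why it is cheap once both sides are sorted.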
ORCFile in HDP 2: Better Compression, Better Performance
The upcoming Hive 0.12 is set to bring some great new advancements in the storage layer in the form of higher compression and better query performance. Higher Compression: ORCFile was introduced... Reposted 2014-04-08 11:07:04 · 873 views · 0 comments
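Hive & Performance Study Notes
Note: This post is based on a talk given by Adam Muise of Hortonworks at the Toronto Hadoop User Group meeting on July 23, 2013. It has only been lightly edited and organized for future reference. For the original, see: http://www.slideshare.net/adammuise/2013-jul-23thughivetuningdeepdive 1. Hi... Reposted 2014-04-08 13:41:03 · 5426 views · 0 comments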
DATA TYPES IN HIVE
Hive data types fall into two categories: primitive and complex. The primitive data types include integers, Booleans, floating-point numbers, and strings. The below table... Reposted 2014-04-11 15:15:42 · 757 views · 0 comments
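Inside Facebook's Data Warehouse: The RCFile Efficient Storage Structure
This article introduces the RCFile storage structure used in Facebook's data analysis system. RCFile combines the advantages of row storage and column storage and plays an important role in large-scale data analysis in MapReduce environments. Facebook presented its data warehouse, Hive, at ICDE 2010 (IEEE International Conference on Data Engineering). Hive stores massive amounts of data in Hadoop and provides a set of database-like... Reposted 2012-08-05 21:00:32 · 726 views · 0 comments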
Column Statistics in Hive
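Optimization never ends. Column statistics can be used to choose the best execution plan; here is how the Hive team at Cloudera does it. The article covers two aspects: the motivation, and the algorithms and data structures used to compute the statistics. Over the last couple of months the Hive team at Cloudera has been working hard to bring a bunch of exciting new featur... Reposted 2012-08-05 15:01:36 · 1229 views · 0 comments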
Join Optimization in Apache Hive
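This article describes how Facebook optimizes Hive joins. MapJoin is especially useful when joining a large table with a small table; it improves performance considerably and is recommended. With more than 500 million users sharing a billion pieces of content daily, Facebook stores a vast amount of data, and needs a s... Reposted 2012-02-16 22:42:51 · 1171 views · 0 comments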
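A rough sketch of the idea behind a map join, assuming the small table fits in memory: the small side is loaded into a hash table once, and the large side is streamed past it, so no sort or shuffle of the large table is needed. The function name `map_join` and the row shape (key, value) are assumptions made for illustration, not Hive's API.

```python
from collections import defaultdict


def map_join(big_rows, small_rows):
    """Inner-join a streamed big table against an in-memory small table."""
    # Build phase: index the small table by join key.
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    # Probe phase: stream the big table and look up each key.
    for key, value in big_rows:
        for small_value in lookup.get(key, ()):
            yield key, value, small_value
```

For example, `list(map_join([(1, "a"), (2, "b")], [(1, "X")]))` yields only the row for key 1, since key 2 has no match in the small table.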
Hive-Related Articles
Below are some articles about Hive. Recommendations of more good articles on Hive theory and practice are welcome! Apache Hive https://cwiki.apache.org/confluence/display/Hive/Home -> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL -> https://cwiki.apa... Original 2011-11-23 22:56:37 · 1133 views · 0 comments
custom map/reduce scripts in hive
First, I have to say that after using Hive for the past couple of weeks and actually writing some real reporting tasks with it, it would be really hard to go back. If you are writing straight Hadoop... Reposted 2012-03-16 16:02:44 · 1225 views · 0 comments
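How to Write a Hive UDF
I recently got a taste of how powerful Hive UDFs are: you can use the many existing UDFs, and you can also define your own UDFs to fit your business scenario. Below is a brief introduction to writing UDF/UDAF/UDTF functions. First, you need to create a new class that extends UDF, with one or more methods named evaluate. pac... Original 2012-03-16 23:37:46 · 7738 views · 22 comments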
Hadoop hive General
What is Hive? Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools to enable easy data ETL, a mechanism to put structures on the data, and the capability to querying an... Original 2012-04-21 22:18:28 · 709 views · 0 comments
Hive SQL Partitioned Tables
A nice summary of common operations! hive> create table lpx_partition_test(global_id int, company_name string) partitioned by (stat_date string, province string) row format delimited fields terminated by ','; OK Time... Reposted 2012-04-14 15:43:20 · 2730 views · 0 comments
Skewed Join Optimization
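When joining two large tables, if the larger one has a small number of heavily skewed keys, those keys can be extracted first (distinct (key)) and joined with the other table, and the result used as the small table in a subsequent map join. The idea is similar to the one below: divide and conquer. Optimizing Skewed Joins. The Problem: A join of 2 large data tables is done by a... Original 2012-06-17 19:47:16 · 1227 views · 0 comments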
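The divide-and-conquer idea in this note can be sketched roughly as below. It is an in-memory illustration under stated assumptions, not the optimization as Hive implements it: `threshold`, `find_skewed_keys`, `hash_join`, and `skew_aware_join` are hypothetical names, and both halves use the same helper here, whereas in Hive the skewed half would run as a map join and the rest as an ordinary join.

```python
from collections import Counter


def find_skewed_keys(rows, threshold):
    """Return the keys that occur at least `threshold` times (hypothetical cutoff)."""
    counts = Counter(key for key, _ in rows)
    return {key for key, n in counts.items() if n >= threshold}


def hash_join(left, right):
    """Plain inner join of (key, value) lists, indexing the right side."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


def skew_aware_join(big, other, threshold):
    """Join the skewed keys and the remaining keys separately, then combine."""
    hot = find_skewed_keys(big, threshold)
    big_hot = [r for r in big if r[0] in hot]
    big_rest = [r for r in big if r[0] not in hot]
    # Only rows for the few hot keys: small enough to act as the map-join side.
    other_hot = [r for r in other if r[0] in hot]
    other_rest = [r for r in other if r[0] not in hot]
    return hash_join(big_hot, other_hot) + hash_join(big_rest, other_rest)
```

The two partial results cover disjoint key sets, so concatenating them gives the same rows as a single join over the full inputs.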
HIVE 0.14 Cost Based Optimizer (CBO) Technical Overview
Analysts and data scientists, not to mention business executives, want Big Data not for the sake of the data itself, but for the ability to work with and learn from that data. As other users become more... Reposted 2016-02-16 13:27:16 · 1683 views · 2 comments