Hive Tutorial
Note: Hive does not store data itself; it can treat MySQL, HDFS, HBase, and so on as data sources and then process that data. On the surface Hive uses SQL, but underneath it runs MapReduce: Hive is essentially a tool that translates SQL into MapReduce jobs. As an optimization, some queries with no filter conditions are answered by a simple data mapping and do not execute MapReduce at all.
1. Creating tables
By default a table is internal (managed):
create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
Creating a partitioned table:
create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';
Creating an external table:
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';
2. Regular vs. partitioned tables: when large amounts of data keep arriving, use a partitioned table
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
3. Internal vs. external tables
a. An internal table usually starts with no data; you load the data yourself, and when the table is dropped its data is deleted along with it.
b. An external table usually points at existing business data that other systems are already using; dropping the table does not delete the data.
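The difference in drop behavior can be checked directly in the Hive CLI; a minimal sketch, with hypothetical table names and an example HDFS path:

```sql
-- Internal (managed) table: Hive owns the data directory.
create table t_managed(id bigint) row format delimited fields terminated by '\t';
-- Dropping it removes both the metadata and the files under the warehouse directory.
drop table t_managed;

-- External table: Hive only records the location; the files belong to another system.
create external table t_demo_ext(id bigint)
row format delimited fields terminated by '\t'
location '/data/t_demo_ext';
-- Dropping it removes only the metadata; the files under /data/t_demo_ext stay on HDFS.
drop table t_demo_ext;
```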
4. Loading data
a. Loading data into a regular table
load data local inpath '/home/hadoop/ip.txt' into table tab_ext;
b. Loading data into a partitioned table
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
load data local inpath '/root/data.am' into table beauty partition (nation="USA");
5. Querying data
a. Query all rows of a partitioned table
select * from tab_ip_part;
b. Query only the rows in partition part2
select * from tab_ip_part where part_flag='part2';
6. Altering tables
alter table tab_ip change id id_alter string;
alter table tab_cts add partition (partCol = 'dt') location '/external/hive/dt';
show partitions tab_ip_part;
7. Array type (array)
create table tab_array(a array<int>,b array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
Sample data:
tobenbrone,laihama,woshishui 13866987898,13287654321
abc,iloveyou,itcast 13866987898,13287654321
select a[0] from tab_array;
select * from tab_array where array_contains(b,'word');
insert into table tab_array select array(0),array(name,ip) from tab_ext t;
8. Map type (map)
create table tab_map(name string,info map<string,string>)
row format delimited
fields terminated by '\t'
collection items terminated by ';'
map keys terminated by ':';
Sample data:
fengjie age:18;size:36A;addr:usa
furong age:28;size:39C;addr:beijing;weight:180KG
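Once that data is loaded, individual map values are read by key; a sketch of typical queries (map_keys, map_values, and size are standard Hive built-ins):

```sql
-- Look up a single value by key; rows missing the key return NULL.
select name, info['age'] from tab_map;
-- Get all keys or all values of the map column as arrays.
select map_keys(info), map_values(info) from tab_map;
-- Number of key-value pairs per row.
select name, size(info) from tab_map;
```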
9. insert ... select: used to append intermediate results to a staging table
create table tab_ip_like like tab_ip;
insert overwrite table tab_ip_like select * from tab_ip;
10. Writing query results to a local directory or to HDFS
insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where part_flag='part1';
insert overwrite directory '/hiveout.txt' select * from tab_ip_part where part_flag='part1';
11. CLUSTER (buckets: spreads rows evenly across files). This is a somewhat advanced feature you can pick up when you have time; it is used for things like sampling analysis. Note that load data only moves files and does not redistribute rows into buckets; only insert ... select with hive.enforce.bucketing=true produces properly bucketed files.
create table tab_ip_cluster(id int,name string,ip string,country string) clustered by(id) into 3 buckets;
load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;
set hive.enforce.bucketing=true;
insert into table tab_ip_cluster select * from tab_ip;
select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);
12. User-defined functions (UDF)
a. Write a Java class that extends org.apache.hadoop.hive.ql.exec.UDF and implements an evaluate method
b. Add the jar (build it, upload it to the Hive machine, then run this in the Hive CLI):
hive> add jar /root/NUDF.jar;
c. Create a temporary function
hive> create temporary function getNation as 'cn.itcast.hive.udf.NationUDF';
d. Call it
hive> select id, name, getNation(nation) from beauty;
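Step a above can be sketched as a minimal Java class. The package and class names match the create temporary function statement; the nation mapping itself is hypothetical, and compiling it requires the hive-exec jar on the classpath:

```java
package cn.itcast.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Maps a nation code to a display name (mapping values here are made up).
public class NationUDF extends UDF {
    // Hive locates evaluate() by reflection and calls it once per row.
    public String evaluate(String nation) {
        if ("USA".equals(nation)) {
            return "America";
        }
        return nation; // pass unknown codes through unchanged
    }
}
```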
Note:
1. If MapReduce jobs on the cluster are launched as the hdfs user, switch to the hdfs user before starting Hive as well; otherwise you will run into permission errors.
