建外部表,之前hadoop职位统计就是这么做的
hive> drop table job;
OK
Time taken: 5.446 seconds
hive> show tables;
OK
Time taken: 0.018 seconds
hive> create external table job(area string, experience string, degree string,
> num string, salary string)
> row format delimited fields terminated by ','
    > location '/job'; -- 先在 HDFS 建好 /job 目录,并上传好数据(注意:Hive 的注释符是 --,# 不是合法注释)
OK
Time taken: 0.103 seconds
hive> select * from job;
OK
北京3-5年本科3人 15001-20000元/月
北京1-3年本科3人 10001-15000元/月
杭州3-5年本科1人 15001-20000元/月
。。。
hive> select area, count(area) from job group by area;
建表
hive> create external table tctest(uid string,goodsid string,behtype string,space string,category string,time string)
hive> row format delimited fields terminated by ','
hive> location '/tianchitest';
将日期转化为时间戳
hive> select unix_timestamp(time,'yyyyMMddHH') from tctest;
利用上面的函数+CTAS语句创建一个含时间戳的表(注意:这里的格式串 'yyyyMMddHH' 与下文使用的 'yyyy-MM-dd HH' 不一致,应以数据中 time 列的实际格式为准)
hive> create table tcc as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') from tctest;
最好直接改列名
hive> create table tcc as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') as time from tctest;
dual表
因为Hive里没有自带dual表,所以我们要自己建一个
hive> create table dual(dummy string); -- 之后建一个 dual.txt,里面写一个值 X
hive> load data local inpath '/home/guo/dual.txt' into table dual;
hive> select unix_timestamp("2014-12-18 23:59:59") from dual;
OK
1418918399
Time taken: 0.082 seconds, Fetched: 1 row(s)
hive> select unix_timestamp("2014-12-17 23:59:59") from dual;
OK
1418831999
Time taken: 0.112 seconds, Fetched: 1 row(s)
将时间戳转化为日期
hive> select from_unixtime(1418831999) from dual;
OK
2014-12-17 23:59:59
Time taken: 0.111 seconds, Fetched: 1 row(s)
查看表结构
hive> describe tcc;
OK
uid string
goodsid string
behtype string
space string
category string
_c5 bigint
Time taken: 0.025 seconds, Fetched: 6 row(s)
修改列名,注意那是反引号
hive> alter table tcc change `_c5` time bigint;
OK
Time taken: 0.172 seconds
将hive表中内容导到本地
guo@guo:~$ hive -e "select * from tcpredict" >> predict.csv
开始的想法奉上(当时没有理解题意,想的也比较简单)
hive> create external table tclx(uid string,goodsid string,behtype string,space string,category string,time string)
hive> row format delimited fields terminated by ','
hive> location '/tianchilx';
hive> create table tianchi as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') as time from tclx;
-- 改列名这一步不需要了:上面的 CTAS 已经用 as time 指定了列名
hive> alter table tianchi change `_c5` time bigint;
0.0
hive> create table tct1 as select distinct uid from tianchi where behtype = 3 and time > 1418831999;
hive> create table tct2 as select t.* from tianchi t,tct1 c where t.uid=c.uid;
hive> select * from tct2 limit 5;#查看表的前五行数据
hive> create table tct3 as select distinct uid from tct2 where behtype=4;
hive> create table tct4 as select t.* from tct2 t,tct3 c where t.uid=c.uid;
改进一下
hive> create table tct6 as select c.* from (select distinct uid from tct2 where behtype=4)t,tct2 c where t.uid=c.uid;
或者,更直接一点
hive> create table tct7 as select c.* from
(select distinct uid from (select a.* from tianchi a,(select distinct uid from tianchi
where behtype=3 and time>1418831999) b
where a.uid=b.uid)t
where t.behtype=4)t,tct2 c
where t.uid=c.uid;
或者,改进一点,用in
hive> create table tct8 as select c.* from
(select distinct uid from (select a.* from tianchi a where a.uid in (select distinct uid from tianchi
where behtype=3 and time>1418831999))b
where b.behtype=4)t,tct2 c
where t.uid=c.uid;
再改进一点
hive> create table tct9 as select c.* from tct2 c
where c.uid in (select distinct uid from (select a.* from tianchi a where a.uid in (select distinct uid from tianchi
where behtype=3 and time>1418831999))b
where b.behtype=4);
下面来自:http://jingyan.baidu.com/article/9989c74604a644f648ecfef3.html
简单地说,row_number()从1开始,为每一条分组记录返回一个数字。这里的 ROW_NUMBER() OVER (ORDER BY xlh DESC) 是先把 xlh 列降序,再为降序以后的每条 xlh 记录返回一个序号。
SELECT *, Row_Number() OVER (partition by deptid ORDER BY salary desc) rank FROM employee
根据部门分组,显示每个部门的工资等级