Create an external table. This is the same approach used earlier for the Hadoop job-posting statistics.
hive> drop table job;
OK
Time taken: 5.446 seconds
hive> show tables;
OK
Time taken: 0.018 seconds
hive> create external table job(area string, experience string, degree string,
> num string, salary string)
> row format delimited fields terminated by ','
> location '/job'; # create this directory on HDFS first and upload the data into it (see the sketch below)
OK
Time taken: 0.103 seconds
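The location clause only points Hive at an existing HDFS path, so the directory and its data must be prepared beforehand. A minimal sketch of that preparation, assuming the data sits in a local file named job.csv (a hypothetical name):

guo@guo:~$ hadoop fs -mkdir -p /job        # create the directory on HDFS
guo@guo:~$ hadoop fs -put job.csv /job/    # upload the comma-delimited data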
hive> select * from job;
OK
北京	3-5年	本科	3人	15001-20000元/月
北京	1-3年	本科	3人	10001-15000元/月
杭州	3-5年	本科	1人	15001-20000元/月
...
hive> select area, count(area) from job group by area;
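If you also want the areas ranked by how many postings they have, a variant of the query above (my sketch, not part of the original session) aliases the count and sorts on it:

hive> select area, count(area) as cnt from job group by area order by cnt desc;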
Create the table
hive> create external table tctest(uid string, goodsid string, behtype string, space string, category string, time string)
    > row format delimited fields terminated by ','
    > location '/tianchitest';
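A quick spot check that the load worked (my addition, not in the original session); judging from the format strings used below, the time column holds strings such as '2014-12-06 02':

hive> select * from tctest limit 5;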
Convert the date to a timestamp
hive> select unix_timestamp(time, 'yyyy-MM-dd HH') from tctest;
Use the function above together with a CTAS statement to create a table that carries the timestamp:
hive> create table tcc as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') from tctest;
Better to set the column name directly with an alias (the describe output below still shows _c5 because the table was built with the first, alias-less statement):
hive> create table tcc as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') as time from tctest;
The dual table
Hive does not ship with a dual table, so we have to create one ourselves:
hive> create table dual(dummy string); # then create a file dual.txt containing a single value, X
hive> load data local inpath '/home/guo/dual.txt' into table dual;
hive> select unix_timestamp("2014-12-18 23:59:59") from dual;
OK
1418918399
Time taken: 0.082 seconds, Fetched: 1 row(s)
hive> select unix_timestamp("2014-12-17 23:59:59") from dual;
OK
1418831999
Time taken: 0.112 seconds, Fetched: 1 row(s)
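As an aside, newer Hive releases (0.13 and later, if I recall correctly) accept SELECT without a FROM clause, which makes the dual workaround unnecessary:

hive> select unix_timestamp('2014-12-18 23:59:59');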
Convert a timestamp to a date
hive> select from_unixtime(1418831999) from dual;
OK
2014-12-17 23:59:59
Time taken: 0.111 seconds, Fetched: 1 row(s)
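from_unixtime also accepts an optional format string, which is handy when you only need part of the date; a small example (my addition):

hive> select from_unixtime(1418831999, 'yyyy-MM-dd') from dual;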
View the table schema
hive> describe tcc;
OK
uid string
goodsid string
behtype string
space string
category string
_c5 bigint
Time taken: 0.025 seconds, Fetched: 6 row(s)
Rename the column (note that those are backticks):
hive> alter table tcc change `_c5` time bigint;
OK
Time taken: 0.172 seconds
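To confirm the rename took effect, describe the table again or query the renamed column; round-tripping through from_unixtime (my sketch) doubles as a sanity check on the earlier conversion:

hive> describe tcc;
hive> select time, from_unixtime(time) from tcc limit 3;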
Export the contents of a Hive table to a local file
guo@guo:~$ hive -e "select * from tcpredict" >> predict.csv
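Note that hive -e emits tab-separated columns, so despite the .csv name the file above is really TSV, and >> appends rather than overwrites. Two alternatives for a genuine comma-separated export (a sketch; /tmp/predict is a placeholder path of my choosing):

guo@guo:~$ hive -e "select * from tcpredict" | sed 's/\t/,/g' > predict.csv

hive> insert overwrite local directory '/tmp/predict'
    > row format delimited fields terminated by ','
    > select * from tcpredict;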