Hive study notes and some Linux commands

 
Remove double quotes from a file:
sed -i 's/"//g' textName
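A minimal sketch of the quote-stripping step on a throwaway file. The sample rows and the temporary file are hypothetical. It uses `-i.bak` rather than a bare `-i` so the same line works with both GNU and BSD sed, at the cost of leaving a backup file behind.

```shell
#!/bin/sh
# Hypothetical sample file with quoted CSV fields.
f=$(mktemp)
printf '%s\n' '"J001","S100"' '"J002","S200"' > "$f"

# Single quotes around the sed script let the double quote pass through
# to sed unescaped. -i.bak edits in place and keeps a backup copy.
sed -i.bak 's/"//g' "$f"

result=$(cat "$f")
first_line=$(head -n 1 "$f")
echo "$result"

# Clean up the sample file and its backup.
rm -f "$f" "$f.bak"
```

Wrapping the script in double quotes, as in `"s/"//g"`, ends the shell string at the inner quote, which is why the single-quoted form is used here.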
Working command (replaces every space with a comma, in place):
perl -p -i -e "s/ /,/g" ./wuhan_feiy_end_result.csv
When GROUP BY appears to have no effect:
select patient_sn,year,sex,cast(year as int)-cast(birth_date as int) from tableName  limit 10;
select trim(patient_sn) as patient_sn,trim(visit_type) as visit_type,trim(year) as year,trim(sex) as sex,trim(age) as age,sum(fee_sum) from tableName  group by
trim(visit_type) ,trim(patient_sn), trim(year),trim(sex),trim(age) order by patient_sn,year,sex,age,visit_type ;
-- Conclusion: whitespace in the grouping columns is the likely cause of the broken grouping; wrapping the columns in trim() fixes it.
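The effect described in the conclusion can be reproduced outside Hive. The sketch below is only an illustration with awk on hypothetical sample data, not the original query: a key with a trailing space forms its own group until it is trimmed.

```shell
#!/bin/sh
# Hypothetical sample data: key,fee. The second key has a trailing space.
data=$(mktemp)
printf '%s\n' 'P001,10' 'P001 ,5' 'P002,7' > "$data"

# Group by the raw key: 'P001' and 'P001 ' count as different groups.
raw_groups=$(awk -F, '{ sum[$1] += $2 }
  END { n = 0; for (k in sum) n++; print n }' "$data")

# Group by the trimmed key: strip leading and trailing whitespace first,
# which is what trim() does to the grouping columns in the Hive query.
trimmed_groups=$(awk -F, '{ k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k); sum[k] += $2 }
  END { n = 0; for (k in sum) n++; print n }' "$data")

echo "groups without trim: $raw_groups"
echo "groups with trim:    $trimmed_groups"
rm -f "$data"
```

The same reasoning explains why trim() has to appear in both the SELECT list and the GROUP BY clause: the grouping key itself must be the trimmed value.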

CREATE TABLE statement for loading a CSV file:
create table tableName (JZJLH string,SDYWH string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
 WITH SERDEPROPERTIES (
 "separatorChar" = ",", 
 "quoteChar" = "\"", 
 "escapeChar" = "\\" 
 )
STORED AS TEXTFILE;
-- CREATE TABLE template (plain delimited text):
create table tableName(JZJLH string,SDYWH string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;
Load data into the table:
load data INPATH 'path'
OVERWRITE INTO TABLE tableName;
Load a new local file into Hive (the file should be UTF-8 encoded):
load data local INPATH 'path'
OVERWRITE INTO TABLE tableName;
Export query results (selected columns) to a single local file:
hive -e "SELECT concat(jzjlh,',',fldmmc,',',zje) FROM zhangheng.tableName;">/opt/textName
Write query results to a local directory:
INSERT OVERWRITE LOCAL DIRECTORY '/opt/zhangheng'
row format delimited fields terminated by ","
SELECT jzjlh, fyrq, fldm,sum(cast( zje as double)),fldmmc FROM 
tableName group by jzjlh, fyrq, fldm ,fldmmc ORDER BY  jzjlh desc,  fldm desc,  fyrq desc ;
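INSERT OVERWRITE LOCAL DIRECTORY writes its output as files inside the target directory, not as one named file. Depending on the query plan there may be more than one of them. The sketch below shows one way to merge them into a single CSV. The part-file names (`000000_0`, `000001_0`), the sample rows, and the temporary directory standing in for the export directory are all assumptions for illustration.

```shell
#!/bin/sh
# Stand-in for the export directory; in practice this would be the
# LOCAL DIRECTORY target of the INSERT OVERWRITE statement.
out_dir=$(mktemp -d)
printf '%s\n' 'J001,2020-01-01,F1,12.5,drug' > "$out_dir/000000_0"
printf '%s\n' 'J002,2020-01-02,F2,3.0,exam'  > "$out_dir/000001_0"

# Concatenate the part files in name order into a single CSV.
merged=$(mktemp)
cat "$out_dir"/0* > "$merged"

line_count=$(wc -l < "$merged" | tr -d ' ')
first_key=$(head -n 1 "$merged" | cut -d, -f1)
echo "merged $line_count lines into $merged"

# Clean up the sample directory and the merged file.
rm -rf "$out_dir" "$merged"
```

Merging by file name keeps the rows within each part file in order, but it does not guarantee a global order across part files unless the query itself produced one.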
Insert query results into a Hive table:
insert overwrite table tableName
SELECT jzjlh, fyrq, fldm,sum(zje),fldmmc,count(zje) FROM 
tableName group by jzjlh, fyrq, fldm ,fldmmc ORDER BY  jzjlh desc,  fldm desc,  fyrq desc ;