[Hive] Writing MapReduce Output into a Hive Partitioned Table

Business requirement:

Write the data generated on the current day into a Hive partitioned table, using the date as the partition key.

Business analysis:

Using MapReduce to write data into a Hive table is really just writing the data into the table's directory on HDFS. The wrinkle is that the output has to land in the current day's partition, so the problem becomes: how do we create the Hive table's partition for the current day ahead of time?
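Concretely, a partitioned Hive table stores each partition in a subdirectory named `<column>=<value>` under the table's directory. A minimal sketch of the path layout (the warehouse root `/user/hive/pms` is taken from the paths used later in this post; the actual root depends on the metastore configuration):

```shell
# Hypothetical path construction; actual root comes from hive.metastore.warehouse.dir.
warehouse_root="/user/hive/pms"
table="test_rcmd_valid_path"
ds="2015-04-02"                      # example partition value for column ds

# Writing into the ds=$ds partition means writing files under this directory:
partition_dir="$warehouse_root/$table/ds=$ds"
echo "$partition_dir"
```

So "write into today's partition" reduces to "write the job output under `.../test_rcmd_valid_path/ds=<today>`", provided that partition already exists.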

Solution:

1. Create the Hive table

# First create the partitioned table rcmd_valid_path
hive -e "set mapred.job.queue.name=pms;
drop table if exists pms.test_rcmd_valid_path;
create table if not exists pms.test_rcmd_valid_path
(
    track_id string,
    track_time string,
    session_id string,
    gu_id string,
    end_user_id string,
    page_category_id bigint,
    algorithm_id int,
    is_add_cart int,
    rcmd_product_id bigint,
    product_id bigint,
    path_id string,
    path_type string,
    path_length int,
    path_list string,
    order_code string,
    groupon_id bigint
)
partitioned by (ds string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';"

2. Create the table's partition for the given date (the partition is created if it does not exist, and overwritten if it does)
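The command below (and the job invocation in step 3) reference a shell variable `$date` that the post never sets; presumably the surrounding scheduler script supplies it. Assuming the partition value is simply today's date in `yyyy-MM-dd` form (an assumption, not stated in the original), it could be derived like this:

```shell
# Hypothetical driver snippet: derive the partition value for "today".
# The yyyy-MM-dd format is an assumption; use whatever format your ds column expects.
date=$(date +%Y-%m-%d)
echo "ds=$date"
```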

# Create the date partition directory of the production table rcmd_valid_path
hive -e "set mapred.job.queue.name=pms;
insert overwrite table pms.test_rcmd_valid_path partition(ds='$date')
select track_id,
       track_time,
       session_id,
       gu_id,
       end_user_id,
       page_category_id,
       algorithm_id,
       is_add_cart,
       rcmd_product_id,
       product_id,
       path_id,
       path_type,
       path_length,
       path_list,
       order_code,
       groupon_id
from pms.test_rcmd_valid_path where ds = '$date';"

3. Let the job write directly into the partition (note the job2OutputPath argument)

hadoop jar lib/bigdata-datamining-1.1-user-trace-jar-with-dependencies.jar \
    com.yhd.datamining.data.usertrack.offline.job.mapred.TrackPathJob \
    --similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday \
    --similarCategoryPath /user/pms/recsys/algorithm/schedule/warehouse/relation/category/$yesterday \
    --mcSiteCategoryPath /user/hive/warehouse/mc_site_category \
    --extractPreprocess /user/hive/warehouse/test_extract_preprocess \
    --engineMatchRule /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday \
    --artificialMatchRule /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday \
    --category /user/hive/warehouse/category \
    --keywordCategoryTopN 3 \
    --termCategory /user/hive/pms/temp_term_category \
    --extractGrouponInfo /user/hive/pms/extract_groupon_info \
    --extractProductSerial /user/hive/pms/product_serial_id \
    --job1OutputPath /user/pms/workspace/ouyangyewei/testUsertrack/job1Output \
    --job2OutputPath /user/hive/pms/test_rcmd_valid_path/ds=$date
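One detail worth spelling out: step 2's `insert overwrite` registers the `ds='$date'` partition in the metastore as well as creating its directory, which is why the files this job drops under `job2OutputPath` are immediately queryable. If a job instead wrote into a brand-new partition directory without that step, Hive would not know the partition exists and it would have to be registered explicitly, for example (an alternative, not part of the original workflow):

```sql
-- Hypothetical alternative: register a partition whose directory was created outside Hive.
ALTER TABLE pms.test_rcmd_valid_path ADD IF NOT EXISTS PARTITION (ds='2015-04-02');
```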

Original article: http://blog.csdn.net/yeweiouyang/article/details/44834073
