Design and Implementation of a Real-Time Big Data Ingestion Pipeline

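1. Project Background

Business data produced by other departments is ingested into the big data platform in real time. The platform runs ETL computations on the data, and the results are shown on web pages or large-screen dashboards.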

2. Technology Stack

Compute frameworks: Spark, Hive

Storage: HDFS, PostgreSQL

Message queue: Kafka

3. Architecture Diagram

4. Implementation and Optimization

4.1 Spark Streaming consumes Kafka data and writes it to HDFS:

<1> To prevent data loss, Kafka offsets are stored in a PostgreSQL database and managed manually.

When a new topic is added to the topics already subscribed, rows must first be inserted into the offsets table (preprocessing). The number of rows inserted for a topic equals the number of partitions of that topic. For example, a new topic configured with 2 partitions needs two rows in the offsets table. Currently every topic has 2 partitions:

insert into pdl.offset_management(project_name,group_id,topic,"partition","offset",update_date ) values('omni-channel','cdp','msd_polaris.cdppolaris-phone-call',0,0,to_char(now()::timestamp(0) without time zone, 'YYYY-MM-DD HH24:MI:SS') );

insert into pdl.offset_management(project_name,group_id,topic,"partition","offset",update_date ) values('omni-channel','cdp','msd_polaris.cdppolaris-phone-call',1,0,to_char(now()::timestamp(0) without time zone, 'YYYY-MM-DD HH24:MI:SS') );
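The preprocessing rule above (one row per partition, each starting at offset 0) can be sketched as a small helper. This is only an illustration: the function name `seed_offset_rows` and the parameterized `INSERT_SQL` string are assumptions, not part of the original job; the table and column names are taken from the statements above.

```python
from datetime import datetime

# Parameterized form of the insert statements shown above.
# The %s placeholder style is an assumption (it matches drivers such as psycopg2).
INSERT_SQL = (
    'INSERT INTO pdl.offset_management '
    '(project_name, group_id, topic, "partition", "offset", update_date) '
    'VALUES (%s, %s, %s, %s, %s, %s)'
)


def seed_offset_rows(project_name, group_id, topic, num_partitions, now=None):
    """Build one parameter tuple per partition, each starting at offset 0."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return [
        (project_name, group_id, topic, partition, 0, stamp)
        for partition in range(num_partitions)
    ]


rows = seed_offset_rows(
    "omni-channel", "cdp", "msd_polaris.cdppolaris-phone-call", 2
)
for row in rows:
    print(row)
```

With a PostgreSQL driver, the tuples could then be passed together with `INSERT_SQL` to something like `cursor.executemany`, so the number of inserted rows always matches the partition count.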

<2> Consuming Kafka data

Data consumed from Kafka is written to /product/collect/ods/poc/realtime.

Each record is written to the file in the following format:

kafka message header,kafka consumerrecord timestamp#001#{JSON data}

Notes: the Kafka message header is treated as the source table name; the Kafka ConsumerRecord timestamp is the time the data was sent; #001# is a custom delimiter.

Example:

polaris_lvidg_partnerfunctions,1668584651754#001#{"importsequencenumber":null,"lvidg_countryidname":"Japan","lvidg_map_bw_partner_funcation":null,"lvidg_alternateshiptocontact":null,"lvidg_rpsindicator":null,"lvidg_housenumber":null,"owningbusinessunitname":"CEC Japan","owninguser":null,"statecodename":"Active","statuscode":1,"createdby":"82dd41d6-7a27-ed11-9db1-00224817638e","owningteam":"a58bfa62-3206-ec11-94ef-000d3aa2eb6e","lvidg_customerididtype":"account","lvidg_customerid":"26912597-772a-ed11-9db1-002248170b0b","lvidg_street3":null,"lvidg_street2":null,"lvidg_street1":"*","createdonbehalfbyname":null,"timezoneruleversionnumber":null,"lvidg_phone":"+81759252016","lvidg_timezonename":null,"createdonbehalfbyyominame":null,"ownerid":"a58bfa62-3206-ec11-94ef-000d3aa2eb6e","lvidg_workorderid":"98cc1045-6c5e-ed11-9562-00224816816c","lvidg_phoneextension":null,"lvidg_map_bw_post_code2":null,"lvidg_rpsindicatorname":null,"lvidg_map_bw_post_code3":null,"modifiedon":"2022-11-07T07:31:55Z","lvidg_alternatepickupcontact":null,"lvidg_alternateshiptoname":null,"lvidg_partnerfunctionsid":"7471504f-6c5e-ed11-9561-000d3a8073bd","lvidg_regionid":"b1cc5af7-d6f6-e811-a97e-000d3aa041bf","lvidg_emailaddress":"nomail20220902132702@lenovo.com","modifiedby":"71e9f83f-7eb9-43f7-966f-2083b42eb3db","modifiedonbehalfby":null,"lvidg_stateregionname":"Hokkaido","owningbusinessunit":"08d59069-e3a7-e911-a991-000d3aa00fc2","owneridtype":"team","createdonbehalfby":null,"createdbyyominame":"Shinobu Kawakami","lvidg_companyname":null,"necpc_alternativeshiptoname":null,"necpc_language":null,"owneridyominame":"JP_Depot","lvidg_countryid":"de3f30be-d6f6-e811-a981-000d3aa000bd","modifiedbyname":"SYSTEM","utcconversiontimezonecode":null,"statecode":0,"necpc_useridentifyname":null,"lvidg_map_bw_recordmode":null,"necpc_useridentify":null,"lvidg_customername":". 井上 洋子 ","lvidg_typename":"Bill-To Party","lvidg_city":"*","overriddencreatedon":null,"lvidg_zippostalcode":"0000000","lvidg_customeridyominame":". 井上 洋子 ","lvidg_customersapid":null,"modifiedbyyominame":"SYSTEM","statuscodename":"Active","lvidg_alternatepickupname":null,"lvidg_customeridname":". 井上 洋子 ","lvidg_stateregionisocountrycode":"JP-01","owneridname":"JP_Depot","versionnumber":7932347094,"createdon":"2022-11-07T07:18:00Z","lvidg_partnerid":"6021443077","lvidg_countryname":"Japan","lvidg_map_bw_post_box":null,"modifiedonbehalfbyyominame":null,"lvidg_type":100000003,"lvidg_workorderidname":"4005163264","modifiedonbehalfbyname":null,"lvidg_regionidname":"Hokkaido","necpc_necpcpartnerid":null,"createdbyname":"Shinobu Kawakami","necpc_languagename":null}
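To make the record layout concrete, here is a minimal parsing sketch for one line in the format `header,timestamp#001#{json}`. The function name `parse_record` and the shortened sample line are illustrative assumptions; only the layout and the `#001#` delimiter come from the description above.

```python
import json

DELIMITER = "#001#"  # custom delimiter described above


def parse_record(line):
    """Split one line into (source table name, timestamp in ms, JSON payload)."""
    # Split only on the first delimiter so the payload is left untouched.
    prefix, payload = line.split(DELIMITER, 1)
    # The prefix is "<kafka message header>,<consumer record timestamp>".
    table_name, timestamp_ms = prefix.rsplit(",", 1)
    return table_name, int(timestamp_ms), json.loads(payload)


# Shortened sample; the real payload carries many more attributes.
sample = (
    'polaris_lvidg_partnerfunctions,1668584651754#001#'
    '{"lvidg_countryidname":"Japan","statuscode":1}'
)
table, ts, data = parse_record(sample)
print(table, ts, data["lvidg_countryidname"])
```

Splitting the prefix from the right keeps the parser working even if a header value were to contain a comma.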

<3> How to stop the application

1. Check the HDFS path /product/collect/ods/poc

If /product/collect/ods/poc/realtime_app_stop_tmp exists, stop the application by running: hdfs dfs -mv /product/collect/ods/poc/realtime_app_stop_tmp /product/collect/ods/poc/realtime_app_stop

If /product/collect/ods/poc/realtime_app_stop_tmp does not exist, stop the application by running: hdfs dfs -mkdir /product/collect/ods/poc/realtime_app_stop
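The two cases above can be folded into one helper that picks the right command. This is a sketch, not the original tooling: the names `stop_command` and `tmp_marker_exists` are hypothetical, and it presumes that the streaming job shuts down once it sees the realtime_app_stop directory, which the text implies but does not state.

```python
import subprocess

BASE = "/product/collect/ods/poc"
STOP_TMP = BASE + "/realtime_app_stop_tmp"
STOP = BASE + "/realtime_app_stop"


def stop_command(tmp_exists):
    """Return the hdfs command that creates the stop marker directory."""
    if tmp_exists:
        # The parked marker exists: rename it to the active marker.
        return ["hdfs", "dfs", "-mv", STOP_TMP, STOP]
    # No parked marker: create the active marker directly.
    return ["hdfs", "dfs", "-mkdir", STOP]


def tmp_marker_exists():
    """Check for the parked marker; `hdfs dfs -test -d` exits 0 if it is a directory."""
    return subprocess.run(["hdfs", "dfs", "-test", "-d", STOP_TMP]).returncode == 0


# Print both variants without touching a cluster.
print(" ".join(stop_command(True)))
print(" ".join(stop_command(False)))
```

On a machine with the Hadoop client installed, `subprocess.run(stop_command(tmp_marker_exists()), check=True)` would carry out the stop step.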

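4.2 Periodically merge the small HDFS files produced in 4.1:

At five minutes past every hour, the small files under /product/collect/ods/poc/realtime are merged into /product/collect/ods/poc/merge. Note: merged files are located at /product/collect/ods/poc/merge/year/month/day/hour/snappy files. The merged file information can be checked under this path.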
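As an illustration of the merge directory layout, the sketch below builds the year/month/day/hour target path for a given time. The function name `merge_target_dir`, the zero-padded path components, and the cron expression in the comment are assumptions; the source only gives the general layout and the schedule.

```python
from datetime import datetime

MERGE_ROOT = "/product/collect/ods/poc/merge"

# A cron schedule such as "5 * * * *" (minute 5 of every hour) would match
# the timing described above; the scheduler actually used is not stated.


def merge_target_dir(ts):
    """Return the merge directory for a timestamp: root/year/month/day/hour."""
    # Zero-padded month, day and hour are an assumption about the layout.
    return f"{MERGE_ROOT}/{ts:%Y}/{ts:%m}/{ts:%d}/{ts:%H}"


print(merge_target_dir(datetime(2022, 11, 16, 15, 5)))
# prints /product/collect/ods/poc/merge/2022/11/16/15
```

Which hour's files a given run merges (the current hour or the previous one) is not specified in the text, so the caller decides which timestamp to pass in.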

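4.3 Periodically parse the HDFS data from 4.2 and write it to Hive

Five minutes after the merge job in 4.2 completes successfully, the files in the merged directory are parsed and written to the corresponding Hive tables. For the topic msd_polaris.cdp-polaris-case, the value of the "lvidg_contact_incident" attribute in the JSON data is itself JSON, while all other attribute values are strings.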
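One way to handle a payload in which a single attribute holds nested JSON is to re-serialize that attribute to a JSON string, so that every column can be stored as a string. The sketch below assumes this approach; the function name `to_hive_row` and the sample attributes `ticketnumber` and `fullname` are hypothetical, and the original job may treat the nested value differently.

```python
import json

# Attributes whose value is nested JSON rather than a plain string.
NESTED_JSON_FIELDS = {"lvidg_contact_incident"}


def to_hive_row(payload):
    """Parse a JSON payload and keep nested JSON attributes as JSON strings."""
    record = json.loads(payload)
    row = {}
    for key, value in record.items():
        if key in NESTED_JSON_FIELDS and value is not None:
            # Serialize the nested object so it fits a string column.
            row[key] = json.dumps(value, ensure_ascii=False)
        else:
            row[key] = value
    return row


# Hypothetical sample payload for the msd_polaris.cdp-polaris-case topic.
sample_payload = (
    '{"ticketnumber": "CAS-1", '
    '"lvidg_contact_incident": {"fullname": "A"}}'
)
row = to_hive_row(sample_payload)
print(row)
```

Keeping the nested value as a JSON string leaves it queryable later with Hive functions such as `get_json_object`.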
