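Requirements Overview
In one of our services, we exchange data with an upstream system through a file interface: the upstream system pushes interface data files to a designated directory via FTP, and we load that data into the database with sqlldr (Oracle SQL*Loader) and then process it.
To keep the interface data consistent, every day we merge the previous day's interface data files of each type, load the result into the database, and then run a correction step.
This article implements the first part of that workflow: merging the interface files of each type into one large file.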
Implementation
To keep the solution generic, it is implemented as a configuration file plus a shell script. The code is as follows:
#!/usr/bin/env bash
# author:charlie<charlie2cindy@163.com>
# create date: 2021-06-03 at Urumqi, Xinjiang, China
# This script collects the interface files of each type and combines them into one big file
# that can be loaded into Oracle with SQL*Loader
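# Go to the target directory (param ${1}); stop if it does not exist
cd "${1}" || { echo "cannot enter directory: ${1}" >&2; exit 1; }
# Combine the files of each type into one file
want_service=${2} # service name: the text inside [] in the config file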
conf_file=../collect.conf
# Line number right after the "[service]" header; -F matches the brackets literally
conf_start=$(grep -n -F "[${want_service}]" "${conf_file}" | head -n 1 | awk -F':' '{print $1+1}')
[ -n "${conf_start}" ] || { echo "[$(date '+%Y-%m-%d %H:%M:%S')] section [${want_service}] not found" >> ../recollectlog.log; exit 1; }
# Print the value of key ${1} within the selected section (conf_start up to the next "[...]" header)
get_conf() {
    sed -n "${conf_start},/^\[/p" "${conf_file}" | sed -n "s/^${1}=//p" | head -n 1 | tr -d '\r'
}
file_types=$(get_conf file_types | tr ':' '\n')
subfix=$(date -d '1 day ago' '+%Y%m%d') # yesterday's date, used as the file name suffix (GNU date)
kt_title=$(get_conf KT_TITLE)
kc_title=$(get_conf KC_TITLE)
bg_title=$(get_conf BG_TITLE)
tb_title=$(get_conf TB_TITLE)
ks_title=$(get_conf KS_TITLE)
tmp_store_dir=$(get_conf tmp_store_dir)
# Merge each file type into one big file
for sigfile in ${file_types}; do
    merged="${sigfile}_${subfix}.csv"
    final="${sigfile}_iconv_${subfix}.csv"
    # Remove leftovers of an earlier run so the glob below cannot pick them up
    rm -f "${merged}" "${final}"
    cat "${sigfile}"*.csv > "${merged}"
    # Remove the header rows that came from the individual files
    sed -i '/ORDER_ID/d' "${merged}"
    # The commented commands below serve a special requirement and can be ignored; uncomment them as needed.
    # Without that requirement, the mv right after them is all that is needed.
    # iconv -f gb18030 -t UTF-8 "${merged}" > "${final}"
    # sed -i '/处理中/d' "${final}"   # drop rows whose status is 处理中 ("in progress")
    # rm -f "${merged}"
    # iconv -f UTF-8 -t gb18030 "${final}" > "${merged}"
    mv "${merged}" "${final}"
    # Determine the file type and pick the matching header
    order_type=${sigfile##*_}
    case "${order_type}" in
        KT) title=${kt_title} ;;
        KC) title=${kc_title} ;;
        BG) title=${bg_title} ;;
        TB) title=${tb_title} ;;
        KS) title=${ks_title} ;;
        *)  title="" ;;
    esac
    if [ -n "${title}" ]; then
        # Insert the header as the first line (GNU sed)
        sed -i "1i ${title}" "${final}"
    else
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] no order type matches ${sigfile}" >> ../recollectlog.log
    fi
    # Move the combined file to the tmp directory
    mv "${final}" "${tmp_store_dir}" && echo "[$(date '+%Y-%m-%d %H:%M:%S')] file ${final} has been moved to tmp dir" >> ../recollectlog.log
done
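The core of the loop (concatenate the files, drop their individual header rows, put one header on top) can also be tried in isolation. The following is only a minimal sketch: the function name merge_with_header and its argument order are made up for illustration and are not part of the script above. It writes the header first and then filters the data, so unlike `sed -i "1i ..."` it also produces a header when there are no data rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the original script.
# Usage: merge_with_header HEADER MARKER OUTPUT INPUT...
#   Writes HEADER as the first line of OUTPUT, followed by every line of
#   the INPUT files that does not contain MARKER (the per-file header rows).
#   OUTPUT must not be one of the INPUT files.
merge_with_header() {
    local header=$1 marker=$2 output=$3
    shift 3
    {
        printf '%s\n' "$header"
        # -v inverts the match, -F treats MARKER as a literal string.
        # grep exits with 1 when it selects no lines, which is not an error here.
        cat -- "$@" | grep -v -F -- "$marker" || true
    } > "$output"
}
```

With the sample configuration it could be called as `merge_with_header "$kt_title" ORDER_ID /some/other/dir/KT.csv GXCJ_ORDER_KT*.csv`, where the output path is a placeholder chosen so that the glob does not match it.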
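Note: the script was developed on Windows, so copying the code above directly into a Linux environment can cause assorted errors at run time because of the different line endings. After copying, convert the file's line endings with PilotEdit, or simply strip the carriage returns.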
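Stripping the carriage returns can be done on the Linux side as well. A minimal sketch, assuming GNU sed is available; the function names strip_cr and has_cr are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration (assumes GNU sed for -i and \r).

# strip_cr FILE...: remove the carriage return at the end of every line, in place.
strip_cr() {
    sed -i 's/\r$//' -- "$@"
}

# has_cr FILE: succeed if FILE still contains a carriage return.
has_cr() {
    grep -q "$(printf '\r')" -- "$1"
}
```

For example, `strip_cr day_collect_repeat.sh collect.conf` converts both files, and `has_cr day_collect_repeat.sh` can be used afterwards to check the result.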
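Configuration File
The configuration file is named collect.conf. Its keys are described below, followed by a sample:
tmp_store_dir: the temporary directory where the merged files are placed after processing
file_types: the common part of the file names of each type to be merged separately, separated by ASCII colons (:)
xx_TITLE: the header row for each file type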
[JK_ORDER_15Min]
tmp_store_dir=/home/irms_ftp/home/irms_ftp/ftp_dir/irms/ZG/tmp_deal/
file_types=GXCJ_ORDER_KT:GXCJ_ORDER_KC:GXCJ_ORDER_BG:GXCJ_ORDER_TB:GXCJ_ORDER_KS
KT_TITLE=ORDER_CODE^CUSTOM_NO^CUSTOM_NAME^CUSTOM_CITY_NAME^CUSTOM_COUNTRY_NAME^A_CITY_NAME^A_COUNTRY_NAME^Z_CITY_NAME^Z_COUNTRY_NAME^ORDER_STATE^CUR_TACHE^START_TIME^END_TIME^DEADLINE_TIME^CUR_TACHE_MAN^TELPHONE^ORDER_NAME^PROD_ID^CUSTOMER_LEVEL^BUSI_TYPE^CRM_ID^CUSTOM_ADDRESS^ASSURANCE_LEVEL^BUZI_AREA_SCPOE_NAME^BANDWIDTH^CIRCUITNAME^CUR_TACHE_ID^CUR_TACHE_DEADLINE_TIME^CUR_TACHE_STATE^CUR_TACHE_CREATE_TIME^CUR_TACHE_RETURN_TIME^ORDER_ID^BELONG_ORG_STAFF^WORK_RESULT^LINE_AREA_TYPE_XJ^MAIN_ORDER_ID
KC_TITLE=ORDER_CODE^CUSTOM_NO^CUSTOM_NAME^CUSTOM_CITY_NAME^CUSTOM_COUNTRY_NAME^A_CITY_NAME^A_COUNTRY_NAME^Z_CITY_NAME^Z_COUNTRY_NAME^ORDER_STATE^CUR_TACHE^START_TIME^END_TIME^DEADLINE_TIME^CUR_TACHE_MAN^TELPHONE^COL_RESULT^ORDER_NAME^PROD_ID^CUSTOMER_LEVEL^BUSI_TYPE^CRM_ID^CUSTOM_ADDRESS^ASSURANCE_LEVEL^BUZI_AREA_SCPOE_NAME^BANDWIDTH^CIRCUITNAME^CUR_TACHE_ID^CUR_TACHE_DEADLINE_TIME^CUR_TACHE_STATE^CUR_TACHE_CREATE_TIME^CUR_TACHE_RETURN_TIME^ORDER_ID^BELONG_ORG_STAFF^WORK_RESULT^LINE_AREA_TYPE_XJ^CONS_DAY^MAIN_ORDER_ID
BG_TITLE=ORDER_CODE^CUSTOM_NO^CUSTOM_NAME^CUSTOM_CITY_NAME^CUSTOM_COUNTRY_NAME^A_CITY_NAME^A_COUNTRY_NAME^Z_CITY_NAME^Z_COUNTRY_NAME^ORDER_STATE^CUR_TACHE^START_TIME^END_TIME^DEADLINE_TIME^CUR_TACHE_MAN^TELPHONE^ORDER_NAME^PROD_ID^CUSTOMER_LEVEL^BUSI_TYPE^CRM_ID^CUSTOM_ADDRESS^ASSURANCE_LEVEL^BUZI_AREA_SCPOE_NAME^BANDWIDTH^CIRCUITNAME^CUR_TACHE_ID^CUR_TACHE_DEADLINE_TIME^CUR_TACHE_STATE^CUR_TACHE_CREATE_TIME^CUR_TACHE_RETURN_TIME^ORDER_ID^BELONG_ORG_STAFF^WORK_RESULT^MAIN_ORDER_ID
TB_TITLE=ORDER_CODE^CUSTOM_NO^CUSTOM_NAME^CUSTOM_CITY_NAME^CUSTOM_COUNTRY_NAME^A_CITY_NAME^A_COUNTRY_NAME^Z_CITY_NAME^Z_COUNTRY_NAME^ORDER_STATE^CUR_TACHE^START_TIME^END_TIME^DEADLINE_TIME^CUR_TACHE_MAN^TELPHONE^ORDER_NAME^PROD_ID^CUSTOMER_LEVEL^BUSI_TYPE^CRM_ID^CUSTOM_ADDRESS^ASSURANCE_LEVEL^BUZI_AREA_SCPOE_NAME^BANDWIDTH^CIRCUITNAME^CUR_TACHE_ID^CUR_TACHE_DEADLINE_TIME^CUR_TACHE_STATE^CUR_TACHE_CREATE_TIME^CUR_TACHE_RETURN_TIME^ORDER_ID^BELONG_ORG_STAFF^WORK_RESULT^MAIN_ORDER_ID
KS_TITLE=FLOW_ID^FLOW_NO^FLOW_TITLE^PRODUCT_NO^CUSTOMER_CODE^CUSTOMER_NAME^CUSTOMER_LEVEL^PRODUCT_TYPE_NAME^A_ACCESS_ADD^START_TIME^DEAL_LIMIT^CITY_ID^CITY_NAME^CURRENT_STATE^END_TIME^FLOW_MODEL^FLOW_NAME^PRODUCT_TYPE^BUSINESS_LEVEL^A_CITY_NAME^Z_CITY_NAME^A_COUNTY_NAME^Z_COUNTY_NAME^BAND_WIDTH^IS_REJECT^REJECT_REASON^Z_ACCESS_ADD^REJECT_TIME^CIRCUIT_NAME^GROUP_FORM_NO^OPERATE_TYPE^SUBMIT_GROUP_TIME^A_PROVINCE_NAME^Z_PROVINCE_NAME^A_CITY_ID^Z_CITY_ID^Z_COUNTY_ID^A_COUNTY_ID^IS_BACK^IS_DELAY^IS_APPROVE_DELAY^DELAY_LENGTH^IS_CANCEL^CANCEL_REASON^CANCEL_PERSON^CANCEL_TIME^REJECT_PERSON^IS_UNARCHIVE^CUR_TACHE^CUR_TACHE_ID^CUR_TACHE_DEADLINE_TIME^CUR_TACHE_STATE^CUR_TACHE_CREATE_TIME^CUR_TACHE_RETURN_TIME^ORDER_ID^BELONG_ORG_STAFF^WORK_RESULT^SATISFACTION^MAIN_ORDER_ID
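To check what a given section of collect.conf yields, a single key can be looked up on its own. This is a sketch of one possible lookup written with awk, not the exact pipeline the script uses; the function name conf_get is made up for illustration, and it assumes one `key=value` pair per line under a `[section]` header, as in the sample above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the original script.
# Usage: conf_get FILE SECTION KEY
#   Prints the value of KEY inside [SECTION], or nothing if it is absent.
conf_get() {
    awk -v section="[$2]" -v key="$3" '
        { sub(/\r$/, "") }                               # tolerate Windows line endings
        /^\[/ { in_section = ($0 == section); next }     # entering a new section
        in_section && index($0, key "=") == 1 {          # line starts with KEY=
            print substr($0, length(key) + 2)            # everything after the "="
            exit
        }
    ' "$1"
}
```

For example, `conf_get collect.conf JK_ORDER_15Min file_types | tr ':' '\n'` lists the file name prefixes of that section one per line.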
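Deployment and Invocation
Deploy the script, as your use case requires, to the parent directory of the directory whose files are to be merged (the script and the configuration file go in the same directory).
Invoke it as follows, where DATA_DIR_NAME is the name of the directory containing the files and SERVICE_NAME is the service name from the configuration file:
cd /path/to/your/deployment
bash day_collect_repeat.sh DATA_DIR_NAME SERVICE_NAME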