Incremental Data Collection Script


章鱼哥TuNan&Z


Category: Shell

Article link: https://blog.csdn.net/qq_43528451/article/details/116607094


01.collect.sh

#!/bin/bash

# Step 1: determine the date to collect. With no argument, default to yesterday;
# with exactly one argument, use that date (YYYY-MM-DD).
if [ $# -ne 0 ]
then
	# At least one argument was given
	if [ $# -ne 1 ]
	then
		echo "At most one argument (the date to process) is allowed; please run again!"
		exit 100
	else
		# Exactly one argument: use it as the processing date
		yesterday=$1
	fi
else
	# No argument: default to yesterday
	yesterday=$(date -d '-1 day' +%Y-%m-%d)
fi
echo "step1: processing date is ${yesterday}"

echo "step2: starting the collection job"
# Step 2: incremental import of the rows created on ${yesterday}
SQOOP_HOME=/export/server/sqoop-1.4.6-cdh5.14.0
$SQOOP_HOME/bin/sqoop import \
--connect jdbc:mysql://node3:3306/db_order \
--username root \
--password-file hdfs://node1:8020/user/oozie/shell/sqoop.passwd \
--query "select * from tb_order where substring(create_time,1,10) = '${yesterday}' and \$CONDITIONS" \
--delete-target-dir \
--target-dir /nginx/logs/tb_order/daystr=${yesterday} \
--fields-terminated-by '\t' \
-m 1
# Abort if the import failed, so step 3 never copies missing or stale data
if [ $? -ne 0 ]
then
	echo "step2: sqoop import failed, aborting"
	exit 1
fi
echo "step2: collection job finished"


echo "step3: starting ETL"
# Simulated ETL: copy the newly collected data into the table's directory
HADOOP_HOME=/export/server/hadoop-2.6.0-cdh5.14.0
# If the day's directory already exists (e.g. on a rerun), delete it before copying
if $HADOOP_HOME/bin/hdfs dfs -test -e /user/hive/warehouse/tb_order/daystr=${yesterday}
then
	$HADOOP_HOME/bin/hdfs dfs -rm -r /user/hive/warehouse/tb_order/daystr=${yesterday}
fi
$HADOOP_HOME/bin/hdfs dfs -cp /nginx/logs/tb_order/daystr=${yesterday} /user/hive/warehouse/tb_order/
echo "step3: ETL finished"
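The step 1 rule (no argument means yesterday, one argument is the date to process) does not check the argument's format, so a typo such as `2021-5-10` goes straight into the SQL and the HDFS paths: the import returns no rows and an empty `daystr=2021-5-10` directory is created. Below is a minimal sketch of the same rule as a function with a format check added, plus a helper for backfilling a range of days. It assumes GNU `date` (for `-d`); `resolve_day` and `day_range` are hypothetical names and are not part of the original post.

```shell
#!/bin/bash
# Sketch: the step 1 date rule of 01.collect.sh, with an added format check.
# Assumes GNU date. Function names are hypothetical.

resolve_day() {
	# No argument: default to yesterday, as in the original script
	if [ $# -eq 0 ]; then
		date -d '-1 day' +%Y-%m-%d
		return 0
	fi
	# More than one argument: reject with the same code the script uses
	if [ $# -ne 1 ]; then
		echo "at most one argument (the date to process) is allowed" >&2
		return 100
	fi
	# Require YYYY-MM-DD and a real calendar date (2021-02-30 is rejected)
	if [[ ! $1 =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || ! date -d "$1" +%F >/dev/null 2>&1; then
		echo "invalid date: $1 (expected YYYY-MM-DD)" >&2
		return 101
	fi
	echo "$1"
}

# Hypothetical backfill helper: print every day from $1 to $2 inclusive.
# YYYY-MM-DD strings sort lexicographically in date order, so [[ > ]] works.
day_range() {
	local d=$1
	while [[ ! $d > $2 ]]; do
		echo "$d"
		d=$(date -d "$d + 1 day" +%F)
	done
}
```

Because the script takes the date as its only argument, a missed week could be reloaded with something like `for d in $(day_range 2021-05-01 2021-05-07); do ./01.collect.sh "$d"; done`; each run deletes and rewrites only its own day's directories.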

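Step 3 only copies files into `/user/hive/warehouse/tb_order/`. If `tb_order` is a partitioned Hive table (the `daystr=YYYY-MM-DD` directory layout suggests a `daystr` partition column, but the post does not show the table DDL), Hive will not read the new day until the partition is registered in the metastore, for example with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` or `MSCK REPAIR TABLE tb_order`. A minimal sketch of such a final step under that assumption; `partition_ddl`, `register_partition` and the `DRY_RUN` switch are hypothetical, and the `hive` CLI is assumed to be on `PATH`.

```shell
#!/bin/bash
# Sketch, ASSUMING tb_order is a Hive table partitioned by daystr.
# Files copied into the warehouse directory are not visible to queries
# until the partition exists in the metastore.

# Hypothetical helper: build the HiveQL that registers one day's partition.
# IF NOT EXISTS keeps reruns of the same day harmless.
partition_ddl() {
	local day=$1
	echo "ALTER TABLE tb_order ADD IF NOT EXISTS PARTITION (daystr='${day}');"
}

# Hypothetical wrapper: print the statement when DRY_RUN=1, otherwise run it.
register_partition() {
	local sql
	sql=$(partition_ddl "$1")
	if [ "${DRY_RUN:-0}" = "1" ]; then
		echo "$sql"
	else
		hive -e "$sql"
	fi
}
```

In `01.collect.sh` this would run right after the `hdfs dfs -cp` in step 3, as `register_partition "${yesterday}"`. Registering one partition per run is cheaper than `MSCK REPAIR TABLE`, which scans every directory under the table location.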
