问题描述:
现在公司需要将集群中一台机器上的HIVE表数据同步至另一个集群一台机器的MySQL表中,且这两个机器的网段是不连通的。如何实现?问题分析:
由于两台机器不连通,所以无法通过HIVE表所在集群中的Sqoop直接将数据同步至MySQL表。但是HIVE表所在集群有ftp服务,且MySQL表所在集群有一台机器可以访问外网。解决思路:
- Step one:将HIVE表数据导出到HIVE表机器本地
- Step two:将HIVE表所在机器本地导出文件压缩打包,通过ftp存入ftp站点
- Step three:在MySQL表所在集群的可以访问外网的机器从ftp站点下载压缩文件,并解压到本地
- Step four:通过JAVA jdbc将本地文件中的数据存入MySQL库
具体涉及到的程序:
- Step one:导出HIVE表数据到本地,并压缩,HiveToMySQL.sh
#!/bin/bash
source ~/.bash_profile
directory='/data/X. T. Xiao/HiveToMySQL/'
if [ -d $directory ]
then
echo "目录存在!"
else
echo “目录不存在!”
mkdir $directory
fi
cd $directory
rm -rf ./*
hive -e "select distinct cnt_date, click_cnt, play_cnt, user_cnt, recommended_cnt, click_percent, play_percent, per_capita_use, type from tag_model.t_live_on_demand" >> /data/X. T. Xiao/HiveToMySQL/JHTJ_tongbu.txt
hive -e "select distinct cnt_date, recommended_cnt, play_cnt, boot_cnt, play_percent, recommended_percent, type from tag_model.t_boot_live" >> /data/X. T. Xiao/HiveToMySQL/KJTJ_tongbu.txt
hive -e "select distinct cnt_date, recommended_cnt, click_cnt, user_cnt, click_percent, per_capita_use, type from tag_model.t_vip_film" >> /data/X. T. Xiao/HiveToMySQL/VIPTS_tongbu.txt
hive -e "select distinct cnt_date, recommended_cnt, click_cnt, user_cnt, click_percent, per_capita_use, type from tag_model.t_binge_watching" >> /data/X. T. Xiao/HiveToMySQL/ZJTS_tongbu.txt
hive -e "select distinct cnt_date, click_cnt, play_cnt, user_cnt, recommended_cnt, click_percent, play_percent, per_capita_use, type from tag_model.t_search_individuation" >> /data/X. T. Xiao/HiveToMySQL/QJTJ_tongbu.txt
echo 'Loading hive data to Local is OK !'
cd /data/X. T. Xiao/
rm -rf HiveToMySQL.tar.gz
cd /data/X. T. Xiao/
tar zcvf HiveToMySQL.tar.gz HiveToMySQL
echo 'Data compressed OK !'
* Step two:本地压缩文件上传至ftp站点, ftp_tran.py
#-*-coding:utf-8-*-
import sys
__author__ = 'X. T. Xiao'
#encoding=utf8
from ftplib import FTP #加载ftp模块
import sys
IP = '5.210.10.12'
user = 'X. T. Xiao'
password = 'NjExTllZGFmOTQ5'
def put_file(filename,path):
ftp=FTP() #设置变量
ftp.set_debuglevel(2) #打开调试级别2,显示详细信息
ftp.connect(IP) #连接的ftp sever和端口
ftp.login(user,password)#连接的用户名,密码
#print ftp.getwelcome() #打印出欢迎信息
ftp.storbinary('STOR %s'%filename, open(path, 'rb',8192))
print('success')
ftp.close()
if __name__ == '__main__':
put_file('HiveToMySQL.tar.gz', '/data/X. T. Xiao/HiveToMySQL.tar.gz')
* Step three:从ftp站点下载并解压, tarToLocal.sh
#!/bin/bash
directory='/home/ftpdata/ftp_test/'
if [ -d $directory ]
then
echo "目录存在!"
else
echo “目录不存在!”
mkdir $directory
fi
cd $directory
rm -rf ./HiveToMySQL
wget ftp://5.210.10.12:21/HiveToMySQL.tar.gz --ftp-user=X. T. Xiao --ftp-password=NjExTllZGFmOTQ5 -r
cd $directory
tar -xzf ./5.210.10.12/HiveToMySQL.tar.gz
rm -rf /home/ftpdata/ftp_test/5.210.10.12
* Step four:将本地文件通过jdbc插入到MySQL库, txtToMySQL.java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Connection;