【DataX】Syncing MySQL data to Hive

Background

Ad-hoc requirement: do a full sync of roughly 100 million rows from MySQL to Hive.

Problems encountered along the way:
1. The available MySQL account was restricted: any query running longer than 5 seconds had its connection killed.
2. The DBA provided an unrestricted account, but testing showed the database still caps each query's result set at 1 GB.
3. After trying various DataX parameters without getting around these limits, I fell back to syncing in a loop over id windows.
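Before hard-coding the iteration count, the number of windows can be derived from the table's maximum id. A minimal shell sketch; the max-id value here is an illustrative placeholder, and the commented-out `mysql` invocation is only an assumption about how it would be fetched in practice:

```shell
# Sketch: compute how many 1M-row id windows are needed for the loop below.
# In practice max_id would come from the database, e.g.:
#   max_id=$(mysql -h ip -u user -p'password' -N -e "SELECT MAX(id) FROM db.table")
batch_size=1000000
max_id=103000000          # illustrative placeholder
iters=$(( (max_id + batch_size - 1) / batch_size ))   # ceiling division
echo "$iters"             # → 103
```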

#!/bin/bash
# Sync the table in 1,000,000-row id windows; 103 windows cover ~103M ids.
mkdir -p /tmp   # -p: no error if the directory already exists
m=1
n=1000000

for ((i=1; i<=103; i++))
do
    echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>iteration $i"
    cat > /tmp/mysql_2_hive_test_$i.json <<EOF
{
    "job":{
        "content":[
            {
                "reader":{
                    "name":"mysqlreader",
                    "parameter":{
                        "username":"user",
                        "password":"password",
                        "fetchSize":1000,
                        "connection":[
                            {
                                "jdbcUrl":[
                                    "jdbc:mysql://ip:3306/db?useSSL=false"
                                ],
                                "querySql":[
                                    "SELECT * FROM table WHERE id BETWEEN $m AND $n"
                                ]
                            }
                        ]
                    }
                },
                "writer":{
                    "name":"hdfswriter",
                    "parameter":{
                        "column":[
                            {
                                "name":"id",
                                "type":"bigint"
                            },
                            {
                                "name":"userinfoid",
                                "type":"bigint"
                            },
                            {
                                "name":"createtime",
                                "type":"bigint"
                            },
                            {
                                "name":"modifytime",
                                "type":"string"
                            }
                        ],
                        "defaultFS":"hdfs://prod-emr-110",
                        "path":"/usr/hive/warehouse/mid.db/pay_transfer",
                        "fieldDelimiter":"\t",
                        "fileType":"text",
                        "fileName":"pay_transfer",
                        "writeMode":"append"
                    }
                }
            }
        ],
        "setting":{
            "speed":{
                "channel":"1"
            }
        }
    }
}
EOF
    echo "job file created <<<<<<"
    ((m += 1000000))
    ((n += 1000000))
    python /data/DataX/bin/datax.py /tmp/mysql_2_hive_test_$i.json
    echo "batch $i synced <<<<<<"
    sleep 10  # pause between batches so MySQL and the DataX host are not kept under sustained load
done
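Since `BETWEEN` is inclusive on both ends, m and n must advance together by exactly the window size or rows would be duplicated or skipped at the boundaries. A quick sketch of the first few windows the loop generates:

```shell
# Print the first few [m, n] windows; consecutive windows tile the id
# space with no gap and no overlap because BETWEEN is inclusive.
m=1
n=1000000
for ((i=1; i<=3; i++)); do
    echo "window $i: $m..$n"
    ((m += 1000000))
    ((n += 1000000))
done
# → window 1: 1..1000000
# → window 2: 1000001..2000000
# → window 3: 2000001..3000000
```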

Here is another variant that uses table/column mode instead of a custom SELECT:


{
    "job":{
        "content":[
            {
                "reader":{
                    "name":"mysqlreader",
                    "parameter":{
                        "column":[
                            "id",
                            "userinfoId",
                            "createTime",
                            "modifyTime"
                        ],
                        "username":"user",
                        "password":"password",
                        "fetchSize":1000,
                        "connection":[
                            {
                                "jdbcUrl":[
                                    "jdbc:mysql://ip:3306/db?useSSL=false"
                                ],
                                "table":[
                                    "transfer"
                                ]
                            }
                        ]
                    }
                },
                "writer":{
                    "name":"hdfswriter",
                    "parameter":{
                        "column":[
                            {
                                "name":"id",
                                "type":"bigint"
                            },
                            {
                                "name":"userinfoid",
                                "type":"bigint"
                            },
                            {
                                "name":"createtime",
                                "type":"bigint"
                            },
                            {
                                "name":"modifytime",
                                "type":"string"
                            }
                        ],
                        "defaultFS":"hdfs://prod-emr-110",
                        "path":"/usr/hive/warehouse/mid.db/pay_transfer",
                        "fieldDelimiter":"\t",
                        "fileType":"text",
                        "fileName":"pay_transfer",
                        "writeMode":"append"
                    }
                }
            }
        ],
        "setting":{
            "speed":{
                "channel":"1"
            }
        }
    }
}
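Note that in table/column mode the id-window trick is still available: mysqlreader also accepts a `where` parameter, so the loop's heredoc could keep this config and just interpolate the window bounds into the filter. A sketch of the relevant reader fragment (values illustrative, not a tested config):

```json
"reader":{
    "name":"mysqlreader",
    "parameter":{
        "column":["id","userinfoId","createTime","modifyTime"],
        "where":"id between $m and $n",
        "username":"user",
        "password":"password",
        "fetchSize":1000,
        "connection":[
            {
                "jdbcUrl":["jdbc:mysql://ip:3306/db?useSSL=false"],
                "table":["transfer"]
            }
        ]
    }
}
```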

If you spot any problems, corrections are welcome.
