摘要: DataX-On-Hadoop即使用hadoop的任务调度器,将DataX task(Reader->Channel->Writer)调度到hadoop执行集群上执行。这样用户的hadoop数据可以通过MR任务批量上传到ODPS、RDS等,不需要用户提前安装和部署DataX软件包,也不需要另外为DataX准备执行集群。
DataX-On-Hadoop即使用hadoop的任务调度器,将DataX task(Reader->Channel->Writer)调度到hadoop执行集群上执行。这样用户的hadoop数据可以通过MR任务批量上传到MaxCompute、RDS等,不需要用户提前安装和部署DataX软件包,也不需要另外为DataX准备执行集群。但是可以享受到DataX已有的插件逻辑、流控限速、鲁棒重试等等。
1. DataX-On-Hadoop 运行方式
1.1 什么是DataX-On-Hadoop
DataX https://github.com/alibaba/DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、HDFS、Hive、OceanBase、HBase、OTS、MaxCompute 等各种异构数据源之间高效的数据同步功能。 DataX同步引擎内部实现了任务的切分、调度执行能力,DataX的执行不依赖Hadoop环境。
DataX-On-Hadoop是DataX针对Hadoop调度环境实现的版本,使用hadoop的任务调度器,将DataX task(Reader->Channel->Writer)调度到hadoop执行集群上执行。这样用户的hadoop数据可以通过MR任务批量上传到MaxCompute等,不需要用户提前安装和部署DataX软件包,也不需要另外为DataX准备执行集群。但是可以享受到DataX已有的插件逻辑、流控限速、鲁棒重试等等。
目前DataX-On-Hadoop支持将Hdfs中的数据上传到公共云MaxCompute当中。
1.2 如何运行DataX-On-Hadoop
运行DataX-On-Hadoop步骤如下:
- 提阿里云工单申请DataX-On-Hadoop软件包,此软件包本质上也是一个Hadoop MR Jar;
- 通过hadoop客户端提交一个MR任务,您只需要关系作业的配置文件内容(这里是./bvt_case/speed.json,配置文件和普通的DataX配置文件完全一致),提交命令是:
<span style="color:#333333"><span style="color:#f8f8f2"><code class="language-java"><span style="color:#f8f8f2">.</span>/bin<span style="color:#f8f8f2">/</span>hadoop jar datax<span style="color:#f8f8f2">-</span>jar<span style="color:#f8f8f2">-</span>with<span style="color:#f8f8f2">-</span>dependencies<span style="color:#f8f8f2">.</span>jar com<span style="color:#f8f8f2">.</span>alibaba<span style="color:#f8f8f2">.</span>datax<span style="color:#f8f8f2">.</span>hdfs<span style="color:#f8f8f2">.</span>odps<span style="color:#f8f8f2">.</span>mr<span style="color:#f8f8f2">.</span>HdfsToOdpsMRJob <span style="color:#f8f8f2">.</span>/bvt_case<span style="color:#f8f8f2">/</span>speed<span style="color:#f8f8f2">.</span>json</code></span></span>
- 任务执行完成后,可以看到类似如下日志:
本例子的Hdfs Reader 和Odps Writer配置信息如下:
<span style="color:#333333"><span style="color:#f8f8f2"><code class="language-java"><span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"core"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"transport"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"channel"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"speed"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"byte"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"-1"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"record"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"-1"</span></span>
<span style="color:#f8f8f2">}</span>
<span style="color:#f8f8f2">}</span>
<span style="color:#f8f8f2">}</span>
<span style="color:#f8f8f2">}</span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"job"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"setting"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"speed"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"byte"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff"><span style="color:#ae81ff">1048576</span></span>
<span style="color:#f8f8f2">}</span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"errorLimit"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"record"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff"><span style="color:#ae81ff">0</span></span>
<span style="color:#f8f8f2">}</span>
<span style="color:#f8f8f2">}</span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"content"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">[</span>
<span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"reader"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"name"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"hdfsreader"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"parameter"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"path"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"/tmp/test_datax/big_data*"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"defaultFS"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"hdfs://localhost:9000"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"column"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">[</span>
<span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"index"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff"><span style="color:#ae81ff">0</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"type"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"string"</span></span>
<span style="color:#f8f8f2">}</span><span style="color:#f8f8f2">,</span>
<span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"index"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff"><span style="color:#ae81ff">1</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"type"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"string"</span></span>
<span style="color:#f8f8f2">}</span>
<span style="color:#f8f8f2">]</span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"fileType"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"text"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"encoding"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"UTF-8"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"fieldDelimiter"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">","</span></span>
<span style="color:#f8f8f2">}</span>
<span style="color:#f8f8f2">}</span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"writer"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"name"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"odpswriter"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"parameter"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">{</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"project"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">""</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"table"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">""</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"partition"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">"pt=1,dt=2"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"column"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">[</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"id"</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"name"</span></span>
<span style="color:#f8f8f2">]</span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"accessId"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">""</span></span><span style="color:#f8f8f2">,</span>
<span style="color:#a6e22e"><span style="color:#e6db74">"accessKey"</span></span><span style="color:#f8f8f2">:</span> <span style="color:#a6e22e"><span style="color:#e6db74">""</span></span><span style="colo