Setting up Carte for clustered sorting in Kettle (on Windows)

Latest recommended article published 2023-08-04 14:02:11

weixin_30739595

Original link: http://www.cnblogs.com/inuyasha1027/p/kettle_carte.html

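This article describes an experiment that uses Spoon, Kettle's graphical interface, to sort the data in a database table on a cluster. It also covers how to configure the Carte services the experiment needs, and the related Carte commands for the Windows cmd shell.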

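The article has six parts:

1. Introduction to Carte

2. Setting up Carte's configuration files

3. Commands for starting the Carte services

4. Configuring the cluster in Kettle's graphical interface

5. Sorting data in Kettle's cluster mode

6. Java source code for invoking the cluster's slave servers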

1. Introduction to Carte

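Carte is a web server program provided by Kettle. A Carte instance is also called a slave server. When Kettle uses a cluster to distribute and process tasks, you can start several Carte processes: one distributes the ETL tasks (the master), and the others receive, run, and report back on them (the slaves).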

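This is how the book Pentaho Kettle Solutions defines Carte:

"Carte, a lightweight server process, allows for remote monitoring and enables the transformation clustering capabilities."

In other words, Carte is a lightweight server process that makes remote monitoring possible and provides Kettle's transformation clustering capability.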

2. Setting up Carte's configuration files

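Much like configuring Hadoop nodes, this experiment runs everything on a single machine. Four Carte services are started: one acts as the master and the other three act as slaves. Spoon then reads a table from the database and sorts its data in clustered mode.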
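To make the single-host layout concrete, here is a small Java sketch that builds the four configuration documents: a master on port 8080 and three slaves. It follows the element layout of the master and slave files shown later in this section. The slave ports 8081 to 8083 and the names after slave1-8081 are my own hypothetical choices, and the class name CarteClusterConfigs is hypothetical too. The sketch only builds strings; it does not call any Kettle API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds carte-config XML for one master (8080) and
// three slaves (8081-8083) on one host, mirroring the layout in this article.
class CarteClusterConfigs {

    static final String HOST = "localhost";
    static final String USER = "cluster";      // credentials from the article's slave example
    static final String PASSWORD = "cluster";
    static final int MASTER_PORT = 8080;

    // One <slaveserver> element; master == true writes <master>Y</master>.
    static String slaveServer(String name, int port, boolean withCredentials, boolean master) {
        StringBuilder sb = new StringBuilder();
        sb.append("  <slaveserver>\n");
        sb.append("    <name>").append(name).append("</name>\n");
        sb.append("    <hostname>").append(HOST).append("</hostname>\n");
        sb.append("    <port>").append(port).append("</port>\n");
        if (withCredentials) {
            sb.append("    <username>").append(USER).append("</username>\n");
            sb.append("    <password>").append(PASSWORD).append("</password>\n");
        }
        sb.append("    <master>").append(master ? "Y" : "N").append("</master>\n");
        sb.append("  </slaveserver>\n");
        return sb.toString();
    }

    // Master file: a single slaveserver flagged as master.
    static String masterConfig() {
        return "<slave_config>\n" + slaveServer("master1", MASTER_PORT, false, true) + "</slave_config>\n";
    }

    // Slave file: the master it reports to, the report flag, then its own entry.
    static String slaveConfig(int port) {
        return "<slave_config>\n"
                + "  <masters>\n"
                + slaveServer("master1", MASTER_PORT, true, true)
                + "  </masters>\n"
                + "  <report_to_masters>Y</report_to_masters>\n"
                + slaveServer("slave" + (port - MASTER_PORT) + "-" + port, port, true, false)
                + "</slave_config>\n";
    }

    static List<String> allConfigs() {
        List<String> configs = new ArrayList<>();
        configs.add(masterConfig());
        for (int port = 8081; port <= 8083; port++) {
            configs.add(slaveConfig(port));
        }
        return configs;
    }

    public static void main(String[] args) {
        for (String xml : allConfigs()) {
            System.out.println(xml);
        }
    }
}
```

Each Carte process on the same host needs its own port, which is why only the port (and the name derived from it) changes between slave files.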

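Every Carte service you start opens the same kind of command window, so how do you tell which one is the master and which ones are the slaves?

To explain how a cluster's master and slave servers are defined, I will again quote Pentaho Kettle Solutions, since it is an authoritative source: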

"A cluster schema consists of one master server that is being used as a controller for the cluster, and a number of non-master slave servers. In short, we refer to the controlling Carte server as the master and the other Carte servers as slaves."

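Setting aside a strict grammatical analysis, this is how I understand the passage:

"A cluster consists of one master node that controls the whole cluster, plus several non-master slave servers (that is, every node except the master; in the configuration file these are the nodes whose <master> element is set to N). In short, the controlling Carte server is called the master and the other Carte servers are called slaves."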

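Whether a Carte server is a master or a slave is determined by the <master> element in its configuration file, carte-config.xml: "Y" makes it the master and "N" makes it a slave. This closely resembles how Hadoop uses XML configuration files to designate master and slave nodes.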
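One detail is easy to miss. As the slave file later in this section shows, a slave's configuration also contains a <slaveserver> with <master>Y</master>, nested inside <masters>, and that entry describes the master it reports to. The Java sketch below uses only the JDK's DOM parser, not any Kettle classes. It decides a file's role from the <slaveserver> that is a direct child of <slave_config> and ignores the entries under <masters>. The class name CarteRole is hypothetical, and treating a missing <master> element as N is my assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports whether a carte-config document describes a master or a slave.
class CarteRole {

    static String roleOf(String configXml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement(); // <slave_config>
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Only a direct child <slaveserver> describes this node; entries under
            // <masters> describe the master it reports to and are skipped.
            if (child.getNodeType() == Node.ELEMENT_NODE && "slaveserver".equals(child.getNodeName())) {
                NodeList flags = ((Element) child).getElementsByTagName("master");
                // Assumption: a missing <master> element counts as "N".
                String flag = flags.getLength() > 0 ? flags.item(0).getTextContent().trim() : "N";
                return "Y".equalsIgnoreCase(flag) ? "master" : "slave";
            }
        }
        throw new IllegalArgumentException("no top-level <slaveserver> element");
    }

    public static void main(String[] args) throws Exception {
        String master = "<slave_config><slaveserver><name>master1</name>"
                + "<hostname>localhost</hostname><port>8080</port><master>Y</master>"
                + "</slaveserver></slave_config>";
        String slave = "<slave_config><masters><slaveserver><name>master1</name>"
                + "<master>Y</master></slaveserver></masters>"
                + "<report_to_masters>Y</report_to_masters>"
                + "<slaveserver><name>slave1-8081</name><port>8081</port>"
                + "<master>N</master></slaveserver></slave_config>";
        System.out.println(roleOf(master)); // master
        System.out.println(roleOf(slave));  // slave
    }
}
```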

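The configuration itself varies a great deal from one computer to another, depending on the machine and its environment variables, so I will mainly describe how I set up mine.

For Carte to run successfully, its configuration file has to be set up first. The configuration files are in the folder shown below:

(Screenshot: location of carte-config.xml)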

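While I was configuring Carte, the cmd window reported an error saying that no pwd folder could be found under kokia/Acer/user/acer/ (kokia is my computer's name).

Following that hint, I copied the pwd folder from the directory where Kettle was extracted to the reported path, and Carte then ran normally. I can't explain the mechanism behind this; perhaps Carte looks for its configuration files under that path by default when it starts.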

By default, the pwd folder holds Carte's configuration files together with the login username and password. In the Kettle package it is located at ./data-integration/pwd.
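One plausible explanation for the missing-pwd error is offered here only as an assumption. If Carte is given a relative path such as pwd/carte-config-master-8080.xml, Java resolves that path against the current directory of the cmd window, and a new cmd window starts in the user's home folder. The Java sketch below shows that resolution with java.nio and lists the carte-config*.xml files in a pwd folder, so you can check what a given working directory would see. The class name PwdLocator is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows where a relative Carte config path would point.
class PwdLocator {

    // A relative path is resolved against a working directory
    // (for a running process, that is System.getProperty("user.dir")).
    static Path resolve(Path workingDir, String relativeConfig) {
        return workingDir.resolve(relativeConfig).normalize();
    }

    // Lists carte-config*.xml files in a pwd folder, sorted by name.
    static List<String> carteConfigs(Path pwdDir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(pwdDir)) {
            return names; // the "pwd folder not found" situation
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(pwdDir, "carte-config*.xml")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        names.sort(null); // natural (alphabetical) order
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path cwd = Paths.get(System.getProperty("user.dir"));
        Path config = resolve(cwd, "pwd/carte-config-master-8080.xml");
        System.out.println(config + (Files.exists(config) ? " exists" : " NOT FOUND"));
        System.out.println(carteConfigs(cwd.resolve("pwd")));
    }
}
```

Under the same assumption, starting Carte from inside the data-integration folder, rather than copying pwd into the home folder, would make the relative path point at ./data-integration/pwd.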

Below is the master server's configuration file (carte-config-master-8080.xml), with annotations:

<slave_config>

<slaveserver>
<name>master1</name>
<hostname>localhost</hostname>
<port>8080</port>
<master>Y</master>
</slaveserver>


</slave_config>

<!--
Even though it is called the master node,
it is still an instance of slaveserver.

<name>: defines the name of this slave server.
<hostname>: here it is localhost, which is equivalent
to the IP address 127.0.0.1.

On Linux, host names are mapped to IP addresses in a
configuration file (/etc/hosts); Windows has an equivalent
hosts file under C:\Windows\System32\drivers\etc.
If the cluster nodes do not all run on one machine,
set <hostname> to the IP address of the machine
that runs the node.

<port>: 8080. Port 8080 is conventionally used for the
master node, as the file name carte-config-master-8080.xml
suggests.

<master>: Y, as discussed above. The value Y means that
this slave server acts as the master node of the cluster.
-->
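Together, <hostname> and <port> give the address of the master's web server. The Java sketch below builds that address. It also uses java.net.InetAddress to confirm that localhost and 127.0.0.1 are both loopback addresses, as the comment above says. The class name MasterEndpoint is hypothetical, and the status path /kettle/status is my assumption about Carte's web interface; check it against your Kettle version.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: turns <hostname>/<port> values into a Carte base URL.
class MasterEndpoint {

    // Assumed path of Carte's status page; verify for your Kettle version.
    static final String STATUS_PATH = "/kettle/status";

    static String baseUrl(String hostname, String port) {
        int p = Integer.parseInt(port.trim());
        if (p < 1 || p > 65535) {
            throw new IllegalArgumentException("port out of range: " + p);
        }
        return "http://" + hostname.trim() + ":" + p;
    }

    // True for localhost, 127.0.0.1 and other loopback names.
    static boolean isLoopback(String hostname) throws UnknownHostException {
        return InetAddress.getByName(hostname.trim()).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(baseUrl("localhost", "8080") + STATUS_PATH); // http://localhost:8080/kettle/status
        System.out.println(isLoopback("127.0.0.1")); // true
    }
}
```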

Below is a slave server's configuration file, with annotations:

<slave_config>

<masters>

<slaveserver>

<name>master1</name>
<hostname>localhost</hostname>
<port>8080</port>
<username>cluster</username>
<password>cluster</password>
<master>Y</master>
</slaveserver>

</masters>


<report_to_masters>Y</report_to_masters>

<slaveserver>
<name>slave1-8081</name>
<hostname>localhost</hostname>
<port>8081</port>
<username>cluster</username>
<password>cluster</password>
<master>N</master>
</slaveserver>

</slave_config>
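Each slave file repeats the master's host, port and credentials inside <masters>, so a typo there points the slave at the wrong server. The Java sketch below uses only the JDK's DOM parser. It compares a slave file's <masters> entry with the master's own file and returns a list of mismatches. An empty list means the two files agree on host and port, the entry is flagged Y, and report_to_masters is Y. The class name SlaveConfigCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker: does a slave's <masters> entry match the master's file?
class SlaveConfigCheck {

    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    // Text of the first descendant element named tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent().trim() : "";
    }

    // Returns mismatches between the two files; empty means they agree.
    static List<String> problems(String masterXml, String slaveXml) throws Exception {
        List<String> problems = new ArrayList<>();
        Element master = (Element) parse(masterXml).getElementsByTagName("slaveserver").item(0);
        Element slaveRoot = parse(slaveXml);
        NodeList blocks = slaveRoot.getElementsByTagName("masters");
        Element declared = blocks.getLength() == 0 ? null
                : (Element) ((Element) blocks.item(0)).getElementsByTagName("slaveserver").item(0);
        if (master == null || declared == null) {
            problems.add("missing <slaveserver> in master file or <masters> entry in slave file");
            return problems;
        }
        for (String tag : new String[] {"hostname", "port"}) {
            if (!text(master, tag).equals(text(declared, tag))) {
                problems.add(tag + " differs: master has '" + text(master, tag)
                        + "', slave expects '" + text(declared, tag) + "'");
            }
        }
        if (!"Y".equalsIgnoreCase(text(declared, "master"))) {
            problems.add("entry under <masters> is not flagged <master>Y</master>");
        }
        if (!"Y".equalsIgnoreCase(text(slaveRoot, "report_to_masters"))) {
            problems.add("<report_to_masters> is not Y");
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String master = "<slave_config><slaveserver><name>master1</name><hostname>localhost</hostname>"
                + "<port>8080</port><master>Y</master></slaveserver></slave_config>";
        String slave = "<slave_config><masters><slaveserver><name>master1</name>"
                + "<hostname>localhost</hostname><port>8080</port><username>cluster</username>"
                + "<password>cluster</password><master>Y</master></slaveserver></masters>"
                + "<report_to_masters>Y</report_to_masters><slaveserver><name>slave1-8081</name>"
                + "<hostname>localhost</hostname><port>8081</port><master>N</master></slaveserver>"
                + "</slave_config>";
        System.out.println(problems(master, slave)); // []
    }
}
```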