III. Configuring the Child Node (3)

Log in to xx2.com and edit the sudoers file:
# visudo

# tail /etc/sudoers
# Samples
# %users ALL=/sbin/mount /cdrom,/sbin/umount /cdrom
# %users localhost=/sbin/shutdown -h now

# Globus GRAM Entities
globus ALL=(zhangxiao) NOPASSWD: /usr/local/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus/libexec/globus-job-manager-script.pl *


globus ALL=(zhangxiao) NOPASSWD: /usr/local/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus/libexec/globus-gram-local-proxy-tool *
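Both entries follow one pattern. The globus account may run globus-gridmap-and-execute as the mapped local user (zhangxiao here) without a password, for exactly two GRAM tools. The following is a minimal sketch, assuming the /usr/local/globus prefix used throughout this post. It defines a hypothetical shell function `gram_sudoers_entries` that prints the pair for any runas user list:

```shell
# Hypothetical helper (not part of Globus): print the two GRAM sudoers
# entries for a runas user list. The install prefix defaults to
# /usr/local/globus, as in this post.
gram_sudoers_entries() {
    runas="$1"                                  # e.g. "zhangxiao" or "alice,bob"
    gl="${2:-/usr/local/globus}"                # Globus install prefix
    mapfile_path="/etc/grid-security/grid-mapfile"
    for tool in globus-job-manager-script.pl globus-gram-local-proxy-tool; do
        # One line per tool: globus may run gridmap-and-execute as $runas.
        printf 'globus ALL=(%s) NOPASSWD: %s/libexec/globus-gridmap-and-execute -g %s %s/libexec/%s *\n' \
            "$runas" "$gl" "$mapfile_path" "$gl" "$tool"
    done
}
```

For example, run `gram_sudoers_entries zhangxiao > /tmp/gram.sudoers` and then `visudo -c -f /tmp/gram.sudoers` to check the syntax before pasting the lines in through `visudo`. Generating the lines helps avoid hand-editing mistakes such as the broken sudoers file mentioned below.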


In the previous post I mentioned that I had run into an error. In fact, if you follow the steps above exactly, you will not hit it.

When I ran a command like
[zx@xx1 tmp]$ globus-url-copy gsiftp://xx1.com/tmp/mytest gsiftp://xx2.com/tmp/form.xx1
the copy failed.

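The cause was that I had accidentally changed something. I broke gsiftp on node xx2.com, so gsiftp would not start and nothing could connect to xx2.com.
On top of that, the sudoers file was misconfigured.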

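So I bit the bullet. Since xx2.com is only a virtual machine, I decided to start over. I reverted to the original snapshot and, tap, tap, tap, had everything reinstalled in no time, because by now I knew the routine by heart.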

When I ran the command to start the container, every address was the loopback address 127.0.0.1. That won't do!

# su - globus -c /usr/local/globus/bin/globus-start-container
[1]: https://127.0.0.1:8443/wsrf/services/TriggerFactoryService
[2]: https://127.0.0.1:8443/wsrf/services/DelegationTestService
[3]: https://127.0.0.1:8443/wsrf/services/SecureCounterService
[4]: https://127.0.0.1:8443/wsrf/services/IndexServiceEntry
[5]: https://127.0.0.1:8443/wsrf/services/DelegationService
[6]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroupFactory
[7]: https://127.0.0.1:8443/wsrf/services/mds/test/execsource/IndexService
[8]: https://127.0.0.1:8443/wsrf/services/mds/test/subsource/IndexService
[9]: https://127.0.0.1:8443/wsrf/services/SubscriptionManagerService
[10]: https://127.0.0.1:8443/wsrf/services/TestServiceWrongWSDL
[11]: https://127.0.0.1:8443/wsrf/services/SampleAuthzService
[12]: https://127.0.0.1:8443/wsrf/services/WidgetNotificationService
[13]: https://127.0.0.1:8443/wsrf/services/AdminService
[14]: https://127.0.0.1:8443/wsrf/services/DefaultIndexServiceEntry
[15]: https://127.0.0.1:8443/wsrf/services/CounterService
[16]: https://127.0.0.1:8443/wsrf/services/TestService
[17]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroup
[18]: https://127.0.0.1:8443/wsrf/services/SecurityTestService
[19]: https://127.0.0.1:8443/wsrf/services/ContainerRegistryEntryService
[20]: https://127.0.0.1:8443/wsrf/services/NotificationConsumerFactoryService
[21]: https://127.0.0.1:8443/wsrf/services/TestServiceRequest
[22]: https://127.0.0.1:8443/wsrf/services/IndexFactoryService
[23]: https://127.0.0.1:8443/wsrf/services/ReliableFileTransferService
[24]: https://127.0.0.1:8443/wsrf/services/mds/test/subsource/IndexServiceEntry
[25]: https://127.0.0.1:8443/wsrf/services/Version
[26]: https://127.0.0.1:8443/wsrf/services/NotificationConsumerService
[27]: https://127.0.0.1:8443/wsrf/services/IndexService
[28]: https://127.0.0.1:8443/wsrf/services/NotificationTestService
[29]: https://127.0.0.1:8443/wsrf/services/ReliableFileTransferFactoryService
[30]: https://127.0.0.1:8443/wsrf/services/DefaultTriggerServiceEntry
[31]: https://127.0.0.1:8443/wsrf/services/TriggerServiceEntry
[32]: https://127.0.0.1:8443/wsrf/services/PersistenceTestSubscriptionManager
[33]: https://127.0.0.1:8443/wsrf/services/mds/test/execsource/IndexServiceEntry
[34]: https://127.0.0.1:8443/wsrf/services/DefaultTriggerService
[35]: https://127.0.0.1:8443/wsrf/services/TriggerService
[36]: https://127.0.0.1:8443/wsrf/services/gsi/AuthenticationService
[37]: https://127.0.0.1:8443/wsrf/services/TestRPCService
[38]: https://127.0.0.1:8443/wsrf/services/ManagedMultiJobService
[39]: https://127.0.0.1:8443/wsrf/services/RendezvousFactoryService
[40]: https://127.0.0.1:8443/wsrf/services/WidgetService
[41]: https://127.0.0.1:8443/wsrf/services/ManagementService
[42]: https://127.0.0.1:8443/wsrf/services/ManagedExecutableJobService
[43]: https://127.0.0.1:8443/wsrf/services/InMemoryServiceGroupEntry
[44]: https://127.0.0.1:8443/wsrf/services/AuthzCalloutTestService
[45]: https://127.0.0.1:8443/wsrf/services/DelegationFactoryService
[46]: https://127.0.0.1:8443/wsrf/services/DefaultIndexService
[47]: https://127.0.0.1:8443/wsrf/services/ShutdownService
[48]: https://127.0.0.1:8443/wsrf/services/ContainerRegistryService
[49]: https://127.0.0.1:8443/wsrf/services/TestAuthzService
[50]: https://127.0.0.1:8443/wsrf/services/CASService
[51]: https://127.0.0.1:8443/wsrf/services/ManagedJobFactoryService

So I quickly edited server-config.wsdd and client-server-config.wsdd under $GLOBUS_LOCATION/etc/globus_wsrf_core.
In both files I added the element <parameter name="logicalHost" value="211.67.141.189" /> (xx2.com's address) between <globalConfiguration> and </globalConfiguration>. That fixed it.
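Making the same edit by hand in two files is easy to get wrong. The following is a minimal sketch using the hypothetical function name `add_logical_host`. It assumes the opening `<globalConfiguration>` tag sits on its own line in each file:

```shell
# Hypothetical helper: insert <parameter name="logicalHost" .../> right after
# the opening <globalConfiguration> tag of a WSDD file, unless one is present.
add_logical_host() {
    wsdd="$1"        # e.g. $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
    ip="$2"          # the node's public IP, e.g. 211.67.141.189
    if grep -q 'name="logicalHost"' "$wsdd"; then
        echo "logicalHost already set in $wsdd" >&2
        return 0
    fi
    cp "$wsdd" "$wsdd.bak"                      # keep a backup of the original
    # Write back into the existing file so its owner and mode are preserved.
    awk -v ip="$ip" '
        { print }
        /<globalConfiguration>/ {
            print "    <parameter name=\"logicalHost\" value=\"" ip "\"/>"
        }
    ' "$wsdd.bak" > "$wsdd"
}
```

Run it once per file, for example `for f in server-config.wsdd client-server-config.wsdd; do add_logical_host "$GLOBUS_LOCATION/etc/globus_wsrf_core/$f" 211.67.141.189; done`, and then restart the container. Redirecting into the existing file keeps it owned by globus, even when the helper is run as root.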

After stopping the container and starting it again, the addresses were correct, but it still reported an RFT error:
# su - globus -c /usr/local/globus/bin/globus-start-container
2007-07-13 21:52:18,916 ERROR service.ReliableFileTransferImpl [main,<init>:69] Unable to setup database driver with pooling.Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
2007-07-13 21:52:20,143 WARN service.ReliableFileTransferHome [main,initialize:97] All RFT requests will fail and all GRAM jobs that require file staging will fail.Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
2007-07-13 21:53:03,390 INFO impl.DefaultIndexService [ServiceThread-9,processConfigFile:107] Reading default registration configuration from file: /usr/local/globus/etc/globus_wsrf_mds_index/hierarchy.xml
Starting SOAP server at: https://211.67.141.189:8443/wsrf/services/
With the following services:

[1]: https://211.67.141.189:8443/wsrf/services/AdminService
[2]: https://211.67.141.189:8443/wsrf/services/AuthzCalloutTestService
[3]: https://211.67.141.189:8443/wsrf/services/CASService
[4]: https://211.67.141.189:8443/wsrf/services/ContainerRegistryEntryService
[5]: https://211.67.141.189:8443/wsrf/services/ContainerRegistryService
[6]: https://211.67.141.189:8443/wsrf/services/CounterService


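Be clear about what is happening here. Distributed computing in our grid needs RFT (Reliable File Transfer), but xx2.com does not have the PostgreSQL database that RFT requires. We therefore need to change GRAM's configuration, setting the staging host to xx1.com and the staging protocol to https.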

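Note: the following step is wrong, and I am posting it deliberately as a warning. I ran this command as root. The setup itself succeeded, but the container then failed at startup.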
# $GLOBUS_LOCATION/setup/globus/setup-gram-service-common --staging-host=xx1.com --staging-protocol=https
Running /usr/local/globus/setup/globus/setup-gram-service-common
Determining system information...
Buildfile: /usr/local/globus/setup/globus/build-gram-service-gar.xml

Here is the startup error. As you can see, a file could not be read:
# su - globus -c /usr/local/globus/bin/globus-start-container
Failed to start container: Container failed to initialize [Caused by: Failed to read '/usr/local/globus/etc/gram-service/jndi-config.xml' JNDI configuration file]


Fortunately I was working in a graphical terminal, so I could scroll back and see what the setup command had actually done:

Running /usr/local/globus/setup/globus/setup-gram-service-common
Determining system information...
Buildfile: /usr/local/globus/setup/globus/build-gram-service-gar.xml

generateConfig:
[delete] Deleting directory /usr/local/globus/tmp/gram-service
[mkdir] Created dir: /usr/local/globus/tmp/gram-service
[copy] Copying 1 file to /usr/local/globus/tmp/gram-service
[mkdir] Created dir: /usr/local/globus/tmp/gram-service/etc
[echo] Checking for existence of file /usr/local/globus/etc/gram-service/globus_gram_fs_map_config.xml

Among these, the lines

[delete] Deleting directory /usr/local/globus/tmp/gram-service
[mkdir] Created dir: /usr/local/globus/tmp/gram-service

show that the gram-service directory was freshly created, so it should be owned by root. Let me check:

# ls -la
total 32
drwxr-xr-x 4 root root 4096 Jul 13 22:00 .
drwxr-xr-x 5 globus globus 4096 Jul 13 22:01 ..
-rw-r--r-- 1 root root 11318 Jul 13 22:00 deploy-jndi-config-deploy.xml
drwxr-xr-x 2 root root 4096 Jul 13 22:00 etc
-rw-r--r-- 1 root root 1592 Jul 13 22:00 gram-service.gar
drwxr-xr-x 2 root root 4096 Jul 13 22:00 lib

Sure enough, it is owned by root, and the timestamp matches too: it was just after 10 p.m.

That explains it. When the globus user starts the container, it lacks permission to read some of these files.
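The `ls` check above can be scripted. The following minimal sketch defines a hypothetical function `not_owned_by` that uses `find ... ! -user` to list every entry under a directory that the container account does not own:

```shell
# Hypothetical helper: list everything under a directory that is NOT owned by
# the given user, i.e. files the container account may be unable to read.
not_owned_by() {
    dir="$1"         # e.g. /usr/local/globus/tmp/gram-service
    owner="$2"       # e.g. globus
    find "$dir" ! -user "$owner" -print
}
```

On a healthy node, `not_owned_by /usr/local/globus/tmp/gram-service globus` should print nothing. Instead of deleting the directory, root could presumably also hand it back with `chown -R globus:globus`. The author chose to delete it and rerun the setup as globus.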

Oh well, there was nothing to lose by trying. I first deleted /usr/local/globus/tmp/gram-service on the child node

and then ran the command again as the globus user:
[globus@xx2 ~]$ $GLOBUS_LOCATION/setup/globus/setup-gram-service-common --staging-host=xx1.com --staging-protocol=https

Once that succeeded, I started the container again. It no longer reported GRAM errors, but it still reported RFT errors.
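The lesson from this detour is that setup-gram-service-common has to run as globus, not as root. A small guard makes that explicit. `require_user` is a hypothetical helper, not part of Globus:

```shell
# Hypothetical guard: refuse to continue unless the current account is the
# expected one, so files generated by a setup script get the right owner.
require_user() {
    want="$1"                                   # e.g. globus
    have="$(id -un)"
    if [ "$have" != "$want" ]; then
        echo "run this as '$want', not '$have' (try: su - $want)" >&2
        return 1
    fi
}
```

Usage: `require_user globus && $GLOBUS_LOCATION/setup/globus/setup-gram-service-common --staging-host=xx1.com --staging-protocol=https`.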


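So we thought it through carefully. The grid relies on RFT (Reliable File Transfer) for some cross-platform jobs, and the child node xx2.com does not have PostgreSQL installed, which RFT depends on. In the previous step we set the staging host to xx1.com. However, just as when we configured xx1.com, the PostgreSQL database accepts connections only from the local host, 127.0.0.1, by default.
Every node has its own IP, so on the host where PostgreSQL is installed we must configure the database to accept connections from other hosts.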

Go to the data directory:
# pwd
/usr/local/pgsql/data

Edit pg_hba.conf and add xx2.com's IP as a trusted host:
# tail pg_hba.conf
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# IPv4 local connections:
host all all 211.67.141.181/32 trust
# IPv4 local connections:
host all all 211.67.141.189/32 trust
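The trust line can be added with a small idempotent helper. This is a minimal sketch under the hypothetical name `allow_pg_host`. It only recognizes an existing entry if it is written with exactly the same single spaces. Keep in mind that pg_hba.conf only decides who may authenticate. The server must also listen on TCP/IP, which is what the earlier error message ("the postmaster is accepting TCP/IP connections") points at. In PostgreSQL 8.x that is controlled by `listen_addresses` in postgresql.conf, and changing it requires a restart. Older releases used the postmaster's `-i` option instead.

```shell
# Hypothetical helper: append a "host all all <ip>/32 trust" line to
# pg_hba.conf unless an identical line is already present.
allow_pg_host() {
    hba="$1"          # e.g. /usr/local/pgsql/data/pg_hba.conf
    ip="$2"           # client IP, e.g. 211.67.141.189
    entry="host all all $ip/32 trust"
    if grep -qxF "$entry" "$hba"; then
        return 0                                # already allowed
    fi
    printf '# IPv4 connections from %s\n%s\n' "$ip" "$entry" >> "$hba"
}
```

On xx1.com, run `allow_pg_host /usr/local/pgsql/data/pg_hba.conf 211.67.141.189`. Then, as the account that runs the PostgreSQL server, run `pg_ctl reload -D /usr/local/pgsql/data` so the server rereads pg_hba.conf. With `trust`, connections from that address are accepted without a password check, matching the setup in this post.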

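I shut down the container on xx2.com and started it again, but there was still an error. The container on xx2.com still used its own RFT by default and reported that the service at
https://211.67.141.189:8443/wsrf/services/ReliableFileTransferFactoryService
had failed or could not be found.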

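We figured that some configuration file must be wrong, pointing at xx2.com when it should point at xx1.com.
Configuration files usually live under the etc directory. We went into $GLOBUS_LOCATION/etc and, sure enough, found a directory with rft in its name.
Without further ado, we went in: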

# pwd
/usr/local/globus/etc/globus_wsrf_rft

Let's see what is in it:
# ls
factory-security-config.xml registration.xml server-config.wsdd
jndi-config.xml security-config.xml

Going through the files one by one, I finally found the one that needed changing:
# vi jndi-config.xml
Find this section:

<parameter>
<name>
connectionString
</name>
<value>
jdbc:postgresql://xx2.com/rftDatabase
</value>
</parameter>
Ha, see? The database connection string was wrong. Just change it to xx1.
After the change it should read jdbc:postgresql://xx1.com/rftDatabase.
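The same edit can be done as a `sed` substitution, wrapped here in a hypothetical helper `set_rft_db_host`. This sketch assumes the connection string has the form jdbc:postgresql://HOST/rftDatabase shown above. Any `:port` after the old host is replaced as well:

```shell
# Hypothetical helper: point RFT's JDBC connection string at another host.
set_rft_db_host() {
    jndi="$1"          # e.g. $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml
    host="$2"          # database host, e.g. xx1.com
    cp "$jndi" "$jndi.bak"                      # keep a backup
    # Replace whatever sits between "//" and "/rftDatabase" (host[:port]).
    sed "s#jdbc:postgresql://[^/]*/rftDatabase#jdbc:postgresql://$host/rftDatabase#" \
        "$jndi.bak" > "$jndi"
}
```

For example, run `set_rft_db_host $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml xx1.com`. To find any other file that still names the child node, `grep -rl xx2.com "$GLOBUS_LOCATION/etc"` is quicker than opening the files one by one.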

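At this point, sharp-eyed readers will ask: doesn't a database connection need a username and password? Don't worry. Keep reading down the file and you will find two more sections: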
<parameter>
<name>
userName
</name>
<value>
globus
</value>
</parameter>
<parameter>
<name>
password
</name>
<value>
foo
</value>
</parameter>
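So the username and password are both there. Do you remember that when we configured the first server, we created the database rftDatabase for the user globus? Was its password really foo?
Why is this the default? I don't know PostgreSQL very well, so I'd be grateful if someone could explain.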
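One relevant detail: with the `trust` entries added to pg_hba.conf above, PostgreSQL does not ask those hosts for a password at all. The value foo therefore only comes into play if the method is changed to `md5` or `password`. To see which credentials RFT will actually use, the following minimal sketch defines a hypothetical `jndi_param` function. It assumes the one-element-per-line layout shown above:

```shell
# Hypothetical helper: print the <value> of a named <parameter> in a
# jndi-config.xml where <name> and <value> contents sit on their own lines.
jndi_param() {
    jndi="$1"          # path to jndi-config.xml
    want="$2"          # parameter name, e.g. userName or password
    awk -v want="$want" '
        { gsub(/^[ \t]+|[ \t]+$/, "") }          # trim surrounding whitespace
        $0 == want { found = 1; next }           # the text inside <name>
        found && $0 == "<value>" { invalue = 1; next }
        invalue && $0 != "" { print; exit }      # first line inside <value>
    ' "$jndi"
}
```

From a host that has the psql client, `psql -h xx1.com -U "$(jndi_param $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml userName)" rftDatabase` tries the same login by hand.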

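All right, restart the container on xx2.com, and this time there are no problems. I've been stuck in the lab all day and it's already past midnight, so I'll stop here. I can't take any more.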