mpich-discuss mailing list: unable to ping local mpd

Today's Topics:

2. Re:  mpdboot problem:unable to ping local mpd (Dave Goodell)
----------------------------------------------------------------------
Message: 2
Date: Wed, 29 Sep 2010 08:05:51 -0500
From: Dave Goodell <goodell@mcs.anl.gov>
Subject: Re: [mpich-discuss] mpdboot problem:unable to ping local mpd
To: mpich-discuss@mcs.anl.gov
Message-ID: <883B6895-DA80-43CD-8C06-F9B6A149A0EF@mcs.anl.gov>
Content-Type: text/plain; charset=us-ascii
Running mpd as root is tricky.  You shouldn't do it unless you really need to and really know what you are doing with it.
Better yet, just don't use mpd at all.  Use hydra instead; it's much more robust:
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
-Dave
On Sep 29, 2010, at 6:47 AM CDT, Albert wrote:
> I have a problem with MPICH2 on a Lenovo cluster when I start more than three nodes.

> The error info is as follows.
> Could anyone give me some advice? Thanks

> Albert

> [root@c0107 ~]# mpdboot -n 2 -f mpd.hosts
> mpdboot_c0107_0 (mpdboot 393): error trying to start mpd(boot) at 1 {'host': 'c0104', 'ncpus': 1, 'ifhn': ''}; output:
>    mpdboot_c0104_1 (err_exit 415): mpd failed to start correctly on c0104
>      reason: 1: unable to ping local mpd;
>    invalid msg from mpd :{}:
>    ** mpd may have disappeared, perhaps due to mismatched secretwords
>    ** see msgs logged in syslog and /tmp/mpd2.logfile* on c0104
>    last printed output from mpd before becoming a daemon:
>    37857
>   
>    mpdboot_c0104_1 (err_exit 421):   contents of mpd logfile in /tmp:
>         logfile for mpd with pid 32501
>         c0104_37857: conn error in connect_lhs: No route to host
>         c0104_37857 (connect_lhs 542): failed to connect to lhs at c0107 46288
>         c0104_37857 (enter_ring 500): lhs connect failed
>         c0104_37857 (run 215): failed to enter ring
> mpdboot_c0107_0 (err_exit 415): mpd failed to start correctly on c0107
> [root@c0107 ~]# ssh c0104
> Last login: Wed Sep 29 19:29:06 2010 from console
> [root@c0104 ~]# mpdboot -n 2 -f mpd.hosts
> [root@c0104 ~]# mpdtrace
> c0104
> c0107
> [root@c0104 ~]# mpdboot -n 3 -f mpd.hosts
> mpdboot_c0104_0 (mpdboot 406): error trying to start mpd(boot) at 2 {'host': 'c0108', 'ncpus': 1, 'ifhn': ''}; output:
>    mpdboot_c0108_2 (err_exit 415): mpd failed to start correctly on c0108
>      reason: 2: unable to ping local mpd;
>    invalid msg from mpd :{}:
>    ** mpd may have disappeared, perhaps due to mismatched secretwords
>    ** see msgs logged in syslog and /tmp/mpd2.logfile* on c0108
>    last printed output from mpd before becoming a daemon:
>    41819
>   
>    mpdboot_c0108_2 (err_exit 421):   contents of mpd logfile in /tmp:
>         logfile for mpd with pid 4894
>         c0108_41819: conn error in connect_rhs: No route to host
>         c0108_41819 (connect_rhs 602): failed to connect to rhs at 192.168.1.7 49518
>         c0108_41819 (enter_ring 513): rhs connect failed
>         c0108_41819 (run 215): failed to enter ring
> mpdboot_c0104_0 (err_exit 415): mpd failed to start correctly on c0104
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss@mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
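
The "No route to host" lines in the mpd logfiles above mean that the TCP connection between the two mpd daemons failed at the network level, before any secretword check could happen. A common cause is a host firewall rejecting the port, but that is an assumption; the logs do not confirm it. Below is a minimal sketch for checking whether one node can open a TCP connection to another. The function name can_connect and the script's command-line arguments are hypothetical, introduced only for illustration.

```python
import errno
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, reason) on failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # EHOSTUNREACH is the errno behind the "No route to host" message.
        if exc.errno == errno.EHOSTUNREACH:
            return False, "no route to host"
        return False, str(exc)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_conn.py <host> <port>
    ok, reason = can_connect(sys.argv[1], int(sys.argv[2]))
    print("reachable" if ok else "unreachable: %s" % reason)
```

The ports in the logs differ between runs (46288, 49518), so test against a port that something is actually listening on at the time, for example sshd on port 22 or a temporary listener started on the target node. A "connection refused" result means the host was reached but nothing was listening on that port. A "no route to host" result reproduces the failure shown in the logs.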
