Apologies, it seems that YAST does weird things with /etc/hosts after
you make changes, so this is where the 127.0.0.2 appears.

mpdboot works fine now.

Cheers,

Tom

On Sun, 2006-05-21 at 18:43 +0100, Tom Crick wrote:
> Hello,
>
> I've been having an issue using mpdboot (MPICH2 1.0.2) on a beowulf
> cluster of 20 nodes running SuSE 9.2. After following the mpd
> troubleshooting guide in the MPICH2 install doc, I can still find no
> obvious answer why I am unable to start mpds on the 20 nodes using
> mpdboot.
>
> mpdboot gives a message like:
>
> mpdboot_grendel13_11 (err_exit 415): mpd failed to start correctly on
> grendel13
> reason: 11: unable to ping local mpd;
> invalid msg from mpd :{}:
> ** mpd may have disappeared, perhaps due to mismatched secretwords
> ** see msgs logged in syslog and /tmp/mpd2.logfile* on grendel13
> last printed output from mpd before becoming a daemon: 32838
>
> mpdboot_grendel13_11 (err_exit 421): contents of mpd logfile in /tmp:
> logfile for mpd with pid 3828
> grendel13_32838: conn error in connect_rhs: Connection refused
> grendel13_32838 (connect_rhs 602): failed to connect to rhs at
> 127.0.0.2 32849
> grendel13_32838 (enter_ring 513): rhs connect failed
> grendel13_32838 (run 215): failed to enter ring
>
>
> Even if you start an mpd manually on the head node and then on each work
> node e.g. "mpd -h <host> -p <port> &", it fails like above. Is it
> something to do with the "failed to connect to rhs at 127.0.0.2 32849"?
>
> It is possible to ssh from every machine to every other and running
> "mpdcheck -v -f /etc/mpd.hosts -ssh" from the head node gives no errors
> or problems. Checking the log files on the failing machines gives no
> more info than above and the secretwords on all machines are the same.
>
> Any ideas for next step of debugging? Should mpd be run as root?
>
> Thanks and regards,
>
> Tom
>
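
One plausible reading of the log above is that a node's own hostname resolved to a loopback address (127.0.0.2), so the address it handed to its neighbours in the ring was unreachable from other machines. The following is a minimal sketch for checking this on a node; it is not part of MPICH2 or mpd, and the helper names `is_loopback` and `resolve_own_hostname` are made up for illustration.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address is in the loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def resolve_own_hostname() -> tuple[str, str]:
    """Return (hostname, IPv4 address the hostname resolves to) for this machine."""
    name = socket.gethostname()
    return name, socket.gethostbyname(name)


if __name__ == "__main__":
    try:
        name, addr = resolve_own_hostname()
    except OSError as exc:
        # Raised when the hostname cannot be resolved at all.
        print(f"could not resolve own hostname: {exc}")
    else:
        if is_loopback(addr):
            print(f"{name} -> {addr} (loopback: other nodes cannot connect to this)")
        else:
            print(f"{name} -> {addr}")
```

Running it on each node should show whether the hostname maps to a loopback address, which would point to /etc/hosts as the place to look.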
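Put simply: comment out the line in /etc/hosts that contains "127.0.0.1 ". Doing so may cause the system to stall at "starting sendmail" during boot,
possibly for 2 to 5 minutes,
so it is advisable to disable the sendmail service as well.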
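
The edit described above can be sketched as a small text transformation. This is only an illustration under stated assumptions: `comment_out_address` is a hypothetical helper, the sample hosts content is invented, and the address is a parameter because the note above names 127.0.0.1 while the email reports 127.0.0.2. It works on a string and does not write to /etc/hosts itself.

```python
def comment_out_address(hosts_text: str, address: str) -> str:
    """Prefix with '#' every hosts line whose first field equals `address`."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Lines that are already comments start with '#', so they never match.
        if fields and fields[0] == address:
            out.append("#" + line)
        else:
            out.append(line)
    trailing_newline = "\n" if hosts_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Hypothetical /etc/hosts content, for illustration only.
sample = (
    "127.0.0.1\tlocalhost\n"
    "127.0.0.2\tgrendel13.example.org grendel13\n"
    "192.168.0.13\tgrendel13\n"
)

if __name__ == "__main__":
    print(comment_out_address(sample, "127.0.0.2"), end="")
```

Since the email mentions that YAST modifies /etc/hosts after changes, it may be worth rechecking the file after using YAST again.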
If the method above does not solve the problem, see: http://blog.csdn.net/zhuliting/archive/2010/09/29/5915224.aspx
from:http://lists.mcs.anl.gov/pipermail/mpich-discuss/2006-May/001388.html