Linux kill daemon: a foolproof cross-platform process kill daemon

江啾

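I have some Python automation that spawns telnet sessions, which I log with the Linux script command. Each logging session has two script process IDs (a parent and a child).

I need to solve a problem: if the Python automation script dies, the script sessions never close on their own. For some reason this is much harder than it should be.

So far I have implemented watchdog.py (see the bottom of the question), which daemonizes itself and polls the Python automation script's PID in a loop. When it sees the automation PID disappear from the server's process table, it attempts to kill the script sessions.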
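The polling step described above boils down to asking the OS whether a PID is still present. A minimal sketch of that check, assuming POSIX semantics for signal 0 (the helper name pid_exists is mine, not part of watchdog.py, and the code should not be used on Windows as written):

```python
import errno
import os


def pid_exists(pid):
    """Return True if a process with this PID is present (POSIX only)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # process exists, but we may not signal it
            return True
        raise
    return True
```

Unlike a bare "except OSError: return False", this version treats EPERM as "alive", so a process owned by another user is not mistaken for a dead one.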

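My problem is:

script sessions always spawn two separate processes, and one of the script processes is the parent of the other.

watchdog.py does not kill the child script sessions if I start the script sessions from the automation script (see the automation example below).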

Automation example (reproduce_bug.py)

import pexpect as px
from subprocess import Popen
import code
import time
import sys
import os

def read_pid_and_telnet(_child, addr):
    time.sleep(0.1) # Give the OS time to write the PIDFILE
    # Read the PID in the PIDFILE
    fh = open('PIDFILE', 'r')
    pid = int(''.join(fh.readlines()))
    fh.close()
    time.sleep(0.1)
    # Clean up the PIDFILE
    os.remove('PIDFILE')
    _child.expect(['#', '\$'], timeout=3)
    _child.sendline('telnet %s' % addr)
    return str(pid)

pidlist = list()
child1 = px.spawn("""bash -c 'echo $$ > PIDFILE """
                  """&& exec /usr/bin/script -f LOGFILE1.txt'""")
pidlist.append(read_pid_and_telnet(child1, '10.1.1.1'))
child2 = px.spawn("""bash -c 'echo $$ > PIDFILE """
                  """&& exec /usr/bin/script -f LOGFILE2.txt'""")
pidlist.append(read_pid_and_telnet(child2, '10.1.1.2'))

cmd = "python watchdog.py -o %s -k %s" % (os.getpid(), ','.join(pidlist))
Popen(cmd.split(' '))
print "I started the watchdog with:\n %s" % cmd

time.sleep(0.5)
raise RuntimeError, "Simulated script crash. Note that script child sessions are hung"
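The example above relies on a fixed time.sleep(0.1) before reading PIDFILE, which can lose the race on a loaded machine. One possible alternative is to poll until the file holds a complete line. This is only a sketch: the helper name read_pid_file and the timeout values are my own assumptions, and it relies on echo terminating the PID with a newline.

```python
import os
import time


def read_pid_file(path, timeout=3.0, interval=0.05):
    """Poll until `path` holds a newline-terminated integer PID, then return it."""
    deadline = time.time() + timeout
    while True:
        try:
            with open(path) as fh:
                raw = fh.read()
            # `echo $$ > PIDFILE` ends the PID with a newline, so a missing
            # newline means the file is still empty or only partly written.
            if raw.endswith("\n") and raw.strip():
                return int(raw.strip())
        except (IOError, OSError):  # file not created yet
            pass
        if time.time() >= deadline:
            raise RuntimeError("no PID found in %r within %.1f s" % (path, timeout))
        time.sleep(interval)
```

In read_pid_and_telnet, a call such as pid = read_pid_file('PIDFILE') could stand in for the first sleep and the open/readlines/close sequence.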

Here is what happens when I run the automation above. Note that PID 30017 spawns 30018, and PID 30020 spawns 30021. All of these PIDs are script sessions.

[mpenning@Hotcoffee Network]$ python reproduce_bug.py
I started the watchdog with:
 python watchdog.py -o 30016 -k 30017,30020
Traceback (most recent call last):
  File "reproduce_bug.py", line 35, in <module>
    raise RuntimeError, "Simulated script crash. Note that script child sessions are hung"
RuntimeError: Simulated script crash. Note that script child sessions are hung
[mpenning@Hotcoffee Network]$

After I run the automation above, all of the child script sessions are still running.

[mpenning@Hotcoffee Models]$ ps auxw | grep script
mpenning 30018 0.0 0.0 15832 508 ? S 12:08 0:00 /usr/bin/script -f LOGFILE1.txt
mpenning 30021 0.0 0.0 15832 516 ? S 12:08 0:00 /usr/bin/script -f LOGFILE2.txt
mpenning 30050 0.0 0.0 7548 880 pts/8 S+ 12:08 0:00 grep script
[mpenning@Hotcoffee Models]$

I am running the automation under Python 2.6.6 on a Debian Squeeze Linux system (uname -a: Linux Hotcoffee 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux).

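Question:

It seems that the daemon does not survive the crash of the process that spawned it. How can I fix watchdog.py so that it closes all script sessions if the automation dies (as in the example above)?

A watchdog.py log that illustrates the problem (sadly, the PIDs do not match those in the original question)...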

[mpenning@Hotcoffee ~]$ cat watchdog.log
2012-02-22,15:17:20.356313 Start watchdog.watch_process
2012-02-22,15:17:20.356541 observe pid = 31339
2012-02-22,15:17:20.356643 kill pids = 31352,31356
2012-02-22,15:17:20.356730 seconds = 2
[mpenning@Hotcoffee ~]$

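Resolution

The problem was essentially a race condition. By the time I tried to kill the "parent" script processes, they had already died along with the automation...

To solve it, the watchdog daemon first needs to identify the entire list of children to be killed before it starts polling the observed PID (my original script tried to identify the children after the observed PID had crashed). Next, I had to modify the watchdog daemon to allow for the possibility that some script processes die together with the observed PID.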
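The ordering described above (snapshot first, then poll, then kill only what is still alive) can be sketched independently of psutil. This is a simplified illustration, not the author's code: the function name watch_and_kill and the injected callbacks is_alive, list_children and kill are hypothetical stand-ins for whatever process API is in use.

```python
import time


def watch_and_kill(observe_pid, kill_pids, is_alive, list_children, kill,
                   interval=2.0, sleep=time.sleep):
    """Snapshot targets, poll the observed PID, then kill the survivors.

    Returns the list of PIDs that were actually passed to `kill`.
    """
    # 1. Build the complete target list *before* polling starts, while the
    #    parent/child relationships can still be looked up.
    targets = []
    for pid in kill_pids:
        targets.append(pid)
        targets.extend(list_children(pid))

    # 2. Poll the observed PID until it disappears.
    while is_alive(observe_pid):
        sleep(interval)

    # 3. Kill survivors; skip PIDs that died along with the observed one.
    killed = []
    for pid in targets:
        if is_alive(pid):
            kill(pid)
            killed.append(pid)
    return killed
```

A process can still exit between the is_alive check and the kill call, so the kill callback should itself tolerate a PID that has vanished, as kill_process does in the listing that follows.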

watchdog.py:

#!/usr/bin/python
"""
Implement a cross-platform watchdog daemon, which observes a PID and kills
other PIDs if the observed PID dies.

Example:
--------

watchdog.py -o 29322 -k 29345,29346,29348 -s 2

The command checks PID 29322 every 2 seconds and kills PIDs 29345, 29346, 29348
and their children, if PID 29322 dies.

Requires:
----------

* https://github.com/giampaolo/psutil
* http://pypi.python.org/pypi/python-daemon
"""
from optparse import OptionParser
import datetime as dt
import signal
import daemon
import logging
import psutil
import time
import sys
import os

class MyFormatter(logging.Formatter):
    converter=dt.datetime.fromtimestamp
    def formatTime(self, record, datefmt=None):
        ct = self.converter(record.created)
        if datefmt:
            s = ct.strftime(datefmt)
        else:
            t = ct.strftime("%Y-%m-%d %H:%M:%S")
            s = "%s,%03d" % (t, record.msecs)
        return s

def check_pid(pid):
    """ Check For the existence of a unix / windows pid."""
    try:
        os.kill(pid, 0) # Kill 0 raises OSError, if pid isn't there...
    except OSError:
        return False
    else:
        return True

def kill_process(logger, pid):
    try:
        psu_proc = psutil.Process(pid)
    except Exception, e:
        logger.debug('Caught Exception ["%s"] while looking up PID %s' % (e, pid))
        return False

    logger.debug('Sending SIGTERM to %s' % repr(psu_proc))
    psu_proc.send_signal(signal.SIGTERM)
    psu_proc.wait(timeout=None)
    return True

def watch_process(observe, kill, seconds=2):
    """Kill the process IDs listed in 'kill', when 'observe' dies."""
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    logfile = logging.FileHandler('%s/watchdog.log' % os.getcwd())
    logger.addHandler(logfile)
    formatter = MyFormatter(fmt='%(asctime)s %(message)s',datefmt='%Y-%m-%d,%H:%M:%S.%f')
    logfile.setFormatter(formatter)

    logger.debug('Start watchdog.watch_process')
    logger.debug(' observe pid = %s' % observe)
    logger.debug(' kill pids = %s' % kill)
    logger.debug(' seconds = %s' % seconds)
    children = list()

    # Get PIDs of all child processes...
    for childpid in kill.split(','):
        children.append(childpid)
        p = psutil.Process(int(childpid))
        # Note: psutil 2.0 and later renamed get_children() to children()
        for subpsu in p.get_children():
            children.append(str(subpsu.pid))

    # Poll observed PID...
    while check_pid(int(observe)):
        logger.debug('Poll PID: %s is alive.' % observe)
        time.sleep(seconds)
    logger.debug('Poll PID: %s is *dead*, starting kills of %s' % (observe, ', '.join(children)))

    for pid in children:
        # kill all child processes...
        kill_process(logger, int(pid))
    sys.exit(0) # Exit gracefully

def run(observe, kill, seconds):
    with daemon.DaemonContext(detach_process=True,
                              stdout=sys.stdout,
                              working_directory=os.getcwd()):
        watch_process(observe=observe, kill=kill, seconds=seconds)

if __name__=='__main__':
    parser = OptionParser()
    parser.add_option("-o", "--observe", dest="observe", type="int",
                      help="PID to be observed", metavar="INT")
    parser.add_option("-k", "--kill", dest="kill",
                      help="Comma separated list of PIDs to be killed",
                      metavar="TEXT")
    parser.add_option("-s", "--seconds", dest="seconds", default=2, type="int",
                      help="Seconds to wait between observations (default = 2)",
                      metavar="INT")
    (options, args) = parser.parse_args()
    run(options.observe, options.kill, options.seconds)
