Python: reading subprocess output in threads

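I have an executable that I call using subprocess.Popen. I intend to feed it data via stdin from a thread that reads its values from a Queue, which is later populated by another thread. The output should be read from the stdout pipe in a further thread and again be stored in a Queue.

As far as I understand from my previous research, using threads together with Queue is good practice.

Unfortunately, the external executable does not promptly give me an answer for every line that is piped in, so a simple write/readline cycle is not an option. The executable implements some internal multithreading and I want the output as soon as it becomes available, hence the additional reader thread.

As a stand-in for the executable while testing, this script just shuffles the characters of each line (shuffleline.py):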

    #!/usr/bin/python -u
    import sys
    from random import shuffle

    for line in sys.stdin:
        line = line.strip()

        # shuffle line
        line = list(line)
        shuffle(line)
        line = "".join(line)

        sys.stdout.write("%s\n" % (line))
        sys.stdout.flush()  # avoid buffers

Please note that this is already as unbuffered as it can be. Or isn't it? This is my stripped-down test program:

    #!/usr/bin/python -u
    import sys
    import time
    import Queue
    import threading
    import subprocess


    class WriteThread(threading.Thread):
        def __init__(self, p_in, source_queue):
            threading.Thread.__init__(self)
            self.pipe = p_in
            self.source_queue = source_queue

        def run(self):
            while True:
                source = self.source_queue.get()
                print "writing to process:", repr(source)
                self.pipe.write(source)
                self.pipe.flush()
                self.source_queue.task_done()


    class ReadThread(threading.Thread):
        def __init__(self, p_out, target_queue):
            threading.Thread.__init__(self)
            self.pipe = p_out
            self.target_queue = target_queue

        def run(self):
            while True:
                line = self.pipe.readline()  # blocking read
                if line == '':
                    break
                print "reader read:", line.rstrip()
                self.target_queue.put(line)


    if __name__ == "__main__":
        cmd = ["python", "-u", "./shuffleline.py"]  # unbuffered
        proc = subprocess.Popen(cmd, bufsize=0, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

        source_queue = Queue.Queue()
        target_queue = Queue.Queue()

        writer = WriteThread(proc.stdin, source_queue)
        writer.setDaemon(True)
        writer.start()

        reader = ReadThread(proc.stdout, target_queue)
        reader.setDaemon(True)
        reader.start()

        # populate queue
        for i in range(10):
            source_queue.put("string %s\n" % i)
        source_queue.put("")

        print "source_queue empty:", source_queue.empty()
        print "target_queue empty:", target_queue.empty()

        time.sleep(2)  # expect some output from reader thread

        source_queue.join()  # wait until all items in source_queue are processed
        proc.stdin.close()   # should end the subprocess
        proc.wait()

This gives the following output (python2.7):

    writing to process: 'string 0\n'
    writing to process: 'string 1\n'
    writing to process: 'string 2\n'
    writing to process: 'string 3\n'
    writing to process: 'string 4\n'
    writing to process: 'string 5\n'
    writing to process: 'string 6\n'
    source_queue empty: writing to process: 'string 7\n'
    writing to process: 'string 8\n'
    writing to process: 'string 9\n'
    writing to process: ''
    True
    target_queue empty: True

Then nothing for two seconds...

    reader read: rgsn0i t
    reader read: nrg1sti
    reader read: tis n2rg
    reader read: snt gri3
    reader read: nsri4 tg
    reader read: stir5 gn
    reader read: gnri6ts
    reader read: ngrits7
    reader read: 8nsrt ig
    reader read: sg9 nitr

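The interleaving at the beginning is expected. However, the output of the subprocess does not appear until after the subprocess ends. With more lines piped in I do get some output, so I assume a caching problem in the stdout pipe. According to other questions posted here, flushing stdout (in the subprocess) should work, at least on Linux.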

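Your problem has nothing to do with the subprocess module, or with threads (problematic as they are), or even with mixing subprocesses and threads (a very bad idea, even worse than using threads to begin with, unless you use the backport of Python 3.2's subprocess module that you can get from code.google.com/p/python-subprocess32), or with accessing the same things from multiple threads (as your print statements do).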

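What happens is that your shuffleline.py program buffers. Not its output, but its input. Although it is not very obvious, when you iterate over a file object, Python reads in blocks, usually 8k bytes. Since sys.stdin is a file object, your for loop will buffer until EOF or until a full block has arrived: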

    for line in sys.stdin:
        line = line.strip()
        ....

If you do not want this, either use a while loop that calls sys.stdin.readline() (which returns '' at EOF):

    while True:
        line = sys.stdin.readline()
        if not line:
            break
        line = line.strip()
        ...

or use the two-argument form of iter(), which creates an iterator that calls the first argument until it returns the second argument (the "sentinel"):

    for line in iter(sys.stdin.readline, ''):
        line = line.strip()
        ...
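To make the sentinel behaviour concrete, here is a minimal, self-contained sketch. It assumes Python 3 and uses io.StringIO as a stand-in for sys.stdin (both are assumptions for illustration, as is the helper name read_lines). Note that a blank input line is "\n", not '', so it does not stop the iteration; only the empty string returned at EOF does.

```python
import io


def read_lines(stream):
    """Collect stripped lines, calling readline() until it returns '' (EOF)."""
    lines = []
    for line in iter(stream.readline, ''):
        lines.append(line.strip())
    return lines


# io.StringIO stands in for sys.stdin so the example runs without a pipe.
fake_stdin = io.StringIO("string 0\nstring 1\n\nstring 2\n")
result = read_lines(fake_stdin)
print(result)  # ['string 0', 'string 1', '', 'string 2']
```

In shuffleline.py the same loop header would simply replace "for line in sys.stdin:".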

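I would also be remiss if I did not suggest dropping threads for this altogether and using non-blocking I/O on the subprocess's pipes instead, or even something like twisted.reactor.spawnProcess, which has many ways of hooking processes and other things together as consumers and producers.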
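One possible shape of the thread-free approach is sketched below. This is not the answer's own code: it assumes Python 3 on a POSIX system, uses readiness notification from the standard selectors module rather than setting O_NONBLOCK, and the child script CHILD and the helper pump are hypothetical names made up for the example.

```python
import os
import selectors
import subprocess
import sys

# Hypothetical stand-in for the external executable: upper-cases each line.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def pump(lines, timeout=5.0):
    """Feed lines to the child and collect its output from a single thread."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0,
    )
    sel = selectors.DefaultSelector()
    sel.register(proc.stdin, selectors.EVENT_WRITE)
    sel.register(proc.stdout, selectors.EVENT_READ)
    to_send = [line.encode() for line in lines]
    chunks = []
    try:
        while sel.get_map():  # loop until both pipes are unregistered
            events = sel.select(timeout)
            if not events:
                break  # nothing became ready in time; give up
            for key, _mask in events:
                if key.fileobj is proc.stdin:
                    if to_send:
                        # Assumes short lines that fit in the pipe buffer.
                        os.write(key.fd, to_send.pop(0))
                    else:
                        sel.unregister(proc.stdin)
                        proc.stdin.close()  # EOF tells the child to finish
                else:
                    data = os.read(key.fd, 4096)
                    if data:
                        chunks.append(data)
                    else:
                        sel.unregister(proc.stdout)  # EOF from the child
    finally:
        sel.close()
        proc.stdin.close()
        proc.stdout.close()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return b"".join(chunks).decode().splitlines()


output = pump(["string 0\n", "string 1\n", "string 2\n"])
print(output)
```

The loop only wakes up when a pipe is actually ready, so it does not poll repeatedly while nothing happens, and the timeout gives a way out if the child goes silent.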

Thanks, that was the solution!

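May I ask why mixing subprocesses and threads is such a bad approach? It seems more elegant than calling non-blocking I/O again and again while nothing happens. Obviously the threads should not access any data structures that are not thread-safe, but just reading from and writing to a Queue seems to be safe. Are the changes in the Python 3.2 backport important for simple cases like mine?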

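The problem with threads and subprocesses is specifically the problem of mixing threads and forks. See linuxprogrammingblog.com/ and other such articles. The Python 3.2 subprocess backport works around those issues. As for threads in general, the main problem is that they are hard to control and debug. For instance, you cannot kill them from "outside" the thread, so if a thread is stuck in a read or a write, there is nothing you can do about it.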
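The "stuck in a read" situation can be reproduced with a small sketch, assuming Python 3 and a hypothetical child (SILENT_CHILD) that never writes anything. The reader thread cannot be stopped directly. In this setup it only returns once the other end of the pipe goes away, which here is forced by killing the child process rather than the thread.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in child that never writes, so readline() blocks.
SILENT_CHILD = "import time; time.sleep(60)"

proc = subprocess.Popen([sys.executable, "-c", SILENT_CHILD],
                        stdout=subprocess.PIPE)

reader = threading.Thread(target=proc.stdout.readline)
reader.daemon = True  # do not keep the interpreter alive if it stays stuck
reader.start()

# join() with a timeout can only give up waiting; it cannot stop the thread.
reader.join(0.5)
stuck = reader.is_alive()

# Killing the child closes its end of the pipe, so readline() sees EOF.
proc.kill()
proc.wait()
reader.join(5)
finished = not reader.is_alive()
proc.stdout.close()

print("stuck while child was silent:", stuck)
print("finished after child was killed:", finished)
```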