With Python's subprocess module, if we wanted to run the shell command
foo | grep bar
from within Python, we might use
from subprocess import Popen, PIPE
p1 = Popen(["foo"], stdout=PIPE)
p2 = Popen(["grep", "bar"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
output = p2.communicate()[0]
I'm confused about the line p1.stdout.close(). If you'll forgive me, I'll trace through how I think the program works, and the error will hopefully reveal itself.
It seems to me that when Python executes the line output = p2.communicate()[0], it tries to call p2 and recognizes that it needs output from p1. So it calls p1, which executes foo and puts the output on the stack so that p2 can finish executing. And then p2 finishes.
But nowhere in this trace does p1.stdout.close() actually happen. So what is actually happening? It seems to me that this ordering of lines might matter too, so that the following wouldn't work:
p1 = Popen(["foo"], stdout=PIPE)
p1.stdout.close()
p2 = Popen(["grep", "bar"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
And that's the status of my understanding.
Solution
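p1.stdout.close() is necessary for foo to detect when the pipe is broken, e.g., when p2 exits prematurely.
Note that each Popen() call starts its process immediately; communicate() only reads the output and waits for the process to exit. By the time p1.stdout.close() runs, p2 already holds its own copy of the read end of the pipe, so the call closes only the parent's copy.
If p1.stdout.close() is not called, p1.stdout remains open in the parent process, so even if p2 exits, p1 won't know that nobody is reading p1.stdout. That is, p1 will continue to write to p1.stdout until the corresponding OS pipe buffer is full, and then it will block forever. With the parent's copy closed, p2 is the only reader, so once p2 exits, the next write by foo fails (SIGPIPE or EPIPE) instead of blocking.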
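The difference can be observed with a small experiment. This is only a sketch under stated assumptions: PRODUCER and CONSUMER are hypothetical stand-ins for foo and grep bar (an endless writer and a reader that quits after one line), and producer_exit_status is a helper name made up for illustration. It assumes a POSIX-like system where a full pipe blocks the writer.

```python
import sys
from subprocess import DEVNULL, PIPE, Popen, TimeoutExpired

# Hypothetical stand-ins for `foo` and `grep bar`: a producer that never
# stops writing and a consumer that exits after reading a single line.
PRODUCER = [sys.executable, "-c", "while True: print('y')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.readline())"]


def producer_exit_status(close_parent_copy, timeout=3):
    """Run PRODUCER | CONSUMER and return the producer's exit status, or
    None if it is still running `timeout` seconds after the consumer exited."""
    # Both children start running as soon as Popen() returns.
    p1 = Popen(PRODUCER, stdout=PIPE, stderr=DEVNULL)
    p2 = Popen(CONSUMER, stdin=p1.stdout, stdout=PIPE)
    if close_parent_copy:
        p1.stdout.close()  # p2 is now the only reader of the pipe
    p2.communicate()       # returns once the consumer has exited
    try:
        return p1.wait(timeout=timeout)
    except TimeoutExpired:
        # The producer is blocked writing into a full pipe that the parent
        # still holds open, so it has to be cleaned up by hand.
        p1.kill()
        p1.wait()
        return None
    finally:
        if not close_parent_copy:
            p1.stdout.close()


if __name__ == "__main__":
    print("with close():   ", producer_exit_status(True))
    print("without close():", producer_exit_status(False))
```

With close_parent_copy=True the producer should exit with a nonzero status shortly after the consumer does, because its write fails once no reader is left. With close_parent_copy=False the helper should return None, since the producer is still blocked when the timeout expires.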
To emulate the foo | grep bar shell command without a shell:
#!/usr/bin/env python3
from subprocess import Popen, PIPE
with Popen(['grep', 'bar'], stdin=PIPE) as grep, \
     Popen(['foo'], stdout=grep.stdin):
    grep.communicate()
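As for the reordered variant in the question (closing p1.stdout before creating p2), the ordering does matter: Popen needs the file descriptor behind p1.stdout, and a closed file object no longer has one. A minimal sketch to check this, assuming hypothetical stand-in commands PRODUCER and CONSUMER in place of foo and grep bar, and a helper name reordered_variant_raises made up for illustration:

```python
import sys
from subprocess import DEVNULL, PIPE, Popen

# Hypothetical stand-ins for `foo` and `grep bar`.
PRODUCER = [sys.executable, "-c", "print('bar')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]


def reordered_variant_raises():
    """Return True if creating p2 from an already-closed p1.stdout fails."""
    p1 = Popen(PRODUCER, stdout=PIPE, stderr=DEVNULL)
    p1.stdout.close()  # closed before p2 exists, as in the question's variant
    try:
        p2 = Popen(CONSUMER, stdin=p1.stdout, stdout=PIPE)
    except ValueError:
        # Popen asks for p1.stdout.fileno(), which fails on a closed file.
        return True
    else:
        p2.communicate()
        return False
    finally:
        p1.wait()


if __name__ == "__main__":
    print("reordered variant raises:", reordered_variant_raises())
```

In the original ordering this problem does not arise, because p2 has already received its own copy of the descriptor by the time the parent closes p1.stdout.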