I'm using import fileinput in a Python script running on an Ubuntu box.
I run the script from the command line with something like python myscript.py firstinputfile.txt secondinputfile.txt, and inside myscript.py I use for line in fileinput.input() to iterate over the lines. The problem is that firstinputfile.txt and secondinputfile.txt both use classic Macintosh (\r) line endings, and fileinput.input() does not seem to recognize \r as a line delimiter.
Is there any way to force fileinput to recognize \r as a line delimiter?
I've considered preprocessing firstinputfile.txt and secondinputfile.txt to use \n line endings, but am hesitant for two reasons: i) I don't really want to emit additional files to manage and ii) I still want the input to fileinput to come from file arguments (not stdin after piping commands) so I can use fileinput.filename() and fileinput.filelineno().
Any suggestions?
Solution
It turns out fileinput.input() supports an optional openhook parameter:
You can control how files are opened by providing an opening hook via
the openhook parameter to fileinput.input() or FileInput(). The hook
must be a function that takes two arguments, filename and mode, and
returns an accordingly opened file-like object. Two useful hooks are
already provided by this module.
Furthermore, the universal newline support documentation says that a file can be opened with support for Windows/Unix/Macintosh newlines by using the rU mode:
Opening a file with the mode 'U' or 'rU' will open a file for reading
in universal newline mode. All three line ending conventions will be
translated to a "\n" in the strings returned by the various file
methods such as read() and readline().
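As a rough illustration of that translation, here is a minimal sketch for Python 3. There the 'U' flag is no longer accepted (it was removed in Python 3.11), and text mode with newline=None, the default for open(), gives the same universal newline behavior. The helper names read_lines_universal and demo and the file name mixed.txt are made up for this example.

```python
import os
import tempfile


def read_lines_universal(path):
    # newline=None (the default in text mode) translates "\r", "\r\n"
    # and "\n" to "\n" in the strings returned to the caller.
    with open(path, "r", newline=None) as handle:
        return handle.readlines()


def demo():
    # Write a small file that mixes all three line ending conventions,
    # then read it back through the helper above.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mixed.txt")  # hypothetical file name
        with open(path, "wb") as handle:
            handle.write(b"mac\rwindows\r\nunix\n")
        return read_lines_universal(path)


if __name__ == "__main__":
    print(demo())
```

Every line comes back ending in "\n", whichever convention the file used on disk.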
So, you can write a little function to pass as the openhook argument that will open the file in a manner which supports universal newlines:
    def univ_file_read(name, mode):
        # WARNING: ignores mode argument passed to this function
        return open(name, 'rU')
Then, instead of:
for line in fileinput.input():
Use:
for line in fileinput.input(openhook=univ_file_read):
This seems to have done the trick for me, and \r is being recognized as a line delimiter now.
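For anyone trying this on Python 3, where the 'rU' mode is no longer accepted (it was removed in Python 3.11), the same idea can be sketched with newline=None instead. This is an untested-in-production sketch, not a drop-in replacement: the names universal_newline_hook, numbered_lines and demo and the files a.txt and b.txt are invented for the example, and the hook assumes text mode and ignores the mode argument, just like the function above.

```python
import fileinput
import os
import tempfile


def universal_newline_hook(filename, mode):
    # Assumption: always open in text mode and ignore the mode argument.
    # newline=None makes "\r", "\r\n" and "\n" all act as line delimiters.
    return open(filename, "r", newline=None)


def numbered_lines(paths):
    # Collect (file name, line number within that file, line text) tuples,
    # using the same per-file bookkeeping as fileinput.filename() and
    # fileinput.filelineno().
    results = []
    with fileinput.input(files=paths, openhook=universal_newline_hook) as stream:
        for line in stream:
            name = os.path.basename(stream.filename())
            results.append((name, stream.filelineno(), line.rstrip("\n")))
    return results


def demo():
    # Create two files with classic Macintosh (\r) line endings and read
    # them back through fileinput with the hook installed.
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.txt")   # hypothetical input files
        second = os.path.join(tmp, "b.txt")
        with open(first, "wb") as handle:
            handle.write(b"one\rtwo\r")
        with open(second, "wb") as handle:
            handle.write(b"three\rfour")
        return numbered_lines([first, second])


if __name__ == "__main__":
    for name, lineno, text in demo():
        print(name, lineno, text)
```

Because the input still comes from file arguments rather than stdin, the file name and per-file line number remain available for each line.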