tl;dr ANSWER: Don't use numpy. Use csv.writer instead of numpy.savetxt.
I'm new to Python and NumPy. It seems like it shouldn't be so difficult to save a 2D array of strings (that contain commas) to a CSV file, but I can't get it to work the way I want.
Let's say I have an array that looks like this (made from a list of lists):
[['text1, text2', 'text3'],
['text4', 'text5']]
I want a CSV file that looks like this (or without quote characters) in Excel (pipe = cell separator):
'text1, text2' | 'text3'
'text4' | 'text5'
I'm using numpy.savetxt(filename, array, fmt="%s"), and I get the following CSV output (with square brackets):
['text1, text2','text3']
['text4','text5']
Which displays in Excel like this:
['text1 | text2' | 'text3']
['text4' | 'text5']
I tried fussing with the savetxt delimiter argument, but no change in output.
Do I need to do this manually? If so, let me know if there are any shortcuts I should be aware of.
Ultimately, I need to import the CSV into a Postgresql database. I'm not completely clear on exactly what the CSV formatting needs to be for this to work as expected, but I'm assuming if it looks wrong in Excel, it will probably end up messed up in Postgres. The Postgres documentation says:
The values in each record are separated by the DELIMITER character. If
the value contains the delimiter character, the QUOTE character, the
NULL string, a carriage return, or line feed character, then the whole
value is prefixed and suffixed by the QUOTE character, and any
occurrence within the value of a QUOTE character or the ESCAPE
character is preceded by the escape character. You can also use
FORCE_QUOTE to force quotes when outputting non-NULL values in
specific columns.
Thanks!
++++++++++++++++++++++++++++
Real input and output, in case it's relevantly different:
array:
[['8908232', 'Plant Growth Chamber Facility at the Department of Botany, University of Wisconsin-Madison', 'DBI', 'INSTRUMENTAT & INSTRUMENT DEVP', '1/1/90', '12/19/89', 'WI', 'Standard Grant', 'Joann P. Roskoski', '12/31/91', '$94,914.00 ', 'BIO', '1108', '', '$0.00 ']]
CSV output:
['8908232', 'Plant Growth Chamber Facility at the Department of Botany, University of Wisconsin-Madison', 'DBI', 'INSTRUMENTAT & INSTRUMENT DEVP', '1/1/90', '12/19/89', 'WI', 'Standard Grant', 'Joann P. Roskoski', '12/31/91', '$94,914.00 ', 'BIO', '1108', '', '$0.00 ']
Excel's version:
['8908232' 'Plant Growth Chamber Facility at the Department of Botany University of Wisconsin-Madison' 'DBI' 'INSTRUMENTAT & INSTRUMENT DEVP' '1/1/90' '12/19/89' 'WI' 'Standard Grant' 'Joann P. Roskoski' '12/31/91' '$94 914.00 ' 'BIO' '1108' '' '$0.00 ']
解决方案
Adding fmt="%s" doesn't put quotes around each field—the quotes are part of the Python string literal for the string %s, and %s just says that any value should be formatted as a string. If you want to force quotes around everything, you need to have quotes in the format string, like fmt='"%s"'.
However, even if you don't do that, the line you showed can't possibly produce the output you showed. There is no way that NumPy is changing your commas into pipe characters, or using pipe characters as delimiters. The only you can get that is by adding delimiter=' |'. And if you add that… it works with no changes, and you get this:
text1, text2 | text3
text4 | text5
So whatever your actual problem is, it can't be the one you described.
Meanwhile, if you're trying to write CSV files for non-numeric data as flexibly as possible, the standard library's csv module is much more powerful than NumPy. The advantage of NumPy—as the name implies—is in dealing with numeric data. Here's how to do it with csv:
with open(filename, 'wb') as f:
csv.writer(f).writerows(array)
This will default to , as a delimiter. Since some of your strings have , characters in them, by default, it will quote those strings. But you can configure the quoting/escaping behavior, the quote character, the delimiter, and all kinds of other things that NumPy can't.