import csv

def csv_split():
    raw = [
        '"1,2,3" , "4,5,6" , "456,789"',
        '"text":"a,b,c,d", "gate":"456,789"'
    ]
    # skipinitialspace=True ignores whitespace right after each delimiter
    cr = csv.reader(raw, skipinitialspace=True)
    for l in cr:
        print(len(l), l)
This function outputs the following:
3 ['1,2,3 ', '4,5,6 ', '456,789']
6 ['text:"a', 'b', 'c', 'd"', 'gate:"456', '789"']
As you can tell, the first line is correctly split into 3 entries.
But the second line is NOT. I would expect the csv reader to split it
into two, but instead we get 6 here. I have also thought about regex
approaches, but they assume some specific quoting dialect.
Basically what I want is:
just split the string wherever there is a "," that is not enclosed in a pair
of "".
Is there any quick and general way to do this? I have seen some regex hacks which
assume that every field is ALWAYS quoted etc. I think I can write a small loop
that does this very inefficiently, but would definitely appreciate some more
expert advice. Thanks a lot!
Solution
CSV isn't a standardized format, but it's common to escape quotation marks that appear inside the text by doubling them (e.g. "text"":""a,b,c,d"). Python's CSV reader is doing the right thing here, because it assumes this convention, and your second line does not follow it. I'm not quite sure what you expect as output, but here is my try at a very simple CSV reader which might suit your format. Feel free to adapt it accordingly.
raw = [
    '"1,2,3" , "4,5,6" , "456,789"',
    '"text":"a,b,c,d", "gate":"456,789"',
    '1,2, 3,'
]
for line in raw:
    # i marks the start of the current field; quoted is True while inside "..."
    i, quoted, row = 0, False, []
    for j, c in enumerate(line):
        if c == ',' and not quoted:
            row.append(line[i:j].strip())
            i = j + 1
        elif c == '"':
            quoted = not quoted
    row.append(line[i:].strip())  # last field (also safe for an empty line)
    for k in range(len(row)):
        if len(row[k]) >= 2 and row[k][0] == '"' and row[k][-1] == '"':
            row[k] = row[k][1:-1]  # remove the outer quotation marks
    print(row)
Output:
['1,2,3', '4,5,6', '456,789']
['text":"a,b,c,d', 'gate":"456,789']
['1', '2', '3', '']
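For comparison, here is a minimal sketch of the doubled-quote convention mentioned above, using the standard csv module. The input string is an assumption: it is the question's second line rewritten so that every inner quotation mark is doubled. If you control how the data is produced, writing it this way should let csv.reader split it into two fields without any custom loop.

```python
import csv

# Assumed input: the second line from the question, with every inner
# quotation mark doubled so that it follows the common CSV escaping rule.
escaped = ['"text"":""a,b,c,d", "gate"":""456,789"']

# skipinitialspace=True drops the blank after the separating comma.
rows = list(csv.reader(escaped, skipinitialspace=True))
for row in rows:
    print(len(row), row)
```

This should print 2 followed by ['text":"a,b,c,d', 'gate":"456,789'], the same two fields the hand-written loop produces for that line.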