How can I split a line in Python at a non-printing character (such as the long minus sign, hex 0x97, octal 227)?
I won't need the character itself. The information after it will be saved as a variable.
Solution
You can use re.split.
>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
The pattern describes the separator, so adjust it to match only the characters you want to split on. Whatever it does not match is kept in the resulting pieces.
Example with the long minus:
>>> # \xe2\x80\x93 is the UTF-8 byte sequence for the en dash U+2013 (a long dash or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']
Or the same with a unicode string:
>>> # \u2013 is the en dash (a long dash or long minus)
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']
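For the byte 0x97 from the question specifically, a regular expression may not be needed at all. The following is a minimal Python 3 sketch, not a definitive solution: the sample data and variable names are made up, and the second option assumes the input is Windows-1252 encoded, where 0x97 decodes to the em dash U+2014.

```python
# Hypothetical sample line; byte 0x97 (octal 227) acts as the separator.
raw = b"key \x97 value"

# Option 1: split the raw bytes directly on the separator byte.
# partition() returns (text before, separator, text after).
before, sep, after = raw.partition(b"\x97")
value_bytes = after.strip()

# Option 2: decode first, then split on the decoded character.
# Assumption: the data is Windows-1252, where 0x97 is the em dash U+2014.
text = raw.decode("cp1252")
before_text, sep_text, after_text = text.partition("\u2014")
value = after_text.strip()
```

partition splits only at the first occurrence, which fits the case where everything after the character is saved as one variable. If the separator can occur several times, split(b"\x97") returns all the pieces instead.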