Python: Removing non-digit (or non-letter) characters from a string
>>> crazystring = 'dade142.;!0142f[.,]ad'
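Keep only digits. In Python 3, filter() returns an iterator, so the result is joined back into a string; in Python 2, filter() on a str returned a str directly.
>>> ''.join(filter(str.isdigit, crazystring))
'1420142'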
Keep only letters
>>> ''.join(filter(str.isalpha, crazystring))
'dadefad'
Keep only letters and digits
>>> ''.join(filter(str.isalnum, crazystring))
'dade1420142fad'
Keep only digits and periods
>>> ''.join(filter(lambda ch: ch in '0123456789.', crazystring))
'142.0142.'
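The filtering shown above can be wrapped in small helpers. The following is a minimal sketch, assuming Python 3; the names keep_chars and keep_ascii_digits are hypothetical and do not come from the original snippets. One caveat worth knowing: str.isdigit() also accepts non-ASCII digits (for example the fullwidth digit U+FF11), so an explicit character set may be preferable when only 0-9 should survive.

```python
import re
import string


def keep_chars(text, allowed):
    """Return text with every character not in `allowed` removed."""
    allowed_set = set(allowed)  # set lookup keeps the membership test cheap
    return ''.join(ch for ch in text if ch in allowed_set)


def keep_ascii_digits(text):
    """Drop everything except the ASCII digits 0-9, using a regex."""
    return re.sub(r'[^0-9]', '', text)


crazystring = 'dade142.;!0142f[.,]ad'
print(keep_chars(crazystring, string.digits))         # 1420142
print(keep_chars(crazystring, string.ascii_letters))  # dadefad
print(keep_chars(crazystring, string.digits + '.'))   # 142.0142.
print(keep_ascii_digits(crazystring))                 # 1420142
```

Passing the allowed characters explicitly makes the behaviour independent of Unicode character classes, which is usually what is wanted when cleaning numeric input.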
Python: Removing all Chinese, non-ASCII, or non-English characters, and checking whether a string contains non-ASCII characters
Your ''.join() expression is filtering, removing anything non-ASCII; you could use a conditional expression instead:
return ''.join([i if ord(i) < 128 else ' ' for i in text])
This handles characters one by one and would still use one space per character replaced.
Your regular expression should just replace consecutive non-ASCII characters with a space:
re.sub(r'[^\x00-\x7F]+', ' ', text)
Note the + there. Without it, each non-ASCII character is replaced by its own space:
re.sub(r'[^\x00-\x7F]', ' ', text)
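To make the difference between the per-character and the regex approach concrete, here is a minimal sketch, assuming Python 3. The function names (replace_each, replace_runs, drop_non_ascii) are hypothetical, and the third variant, which deletes non-ASCII characters instead of replacing them, is an addition that is not part of the answer quoted above.

```python
import re


def replace_each(text):
    """Replace every non-ASCII character with its own space."""
    return ''.join(ch if ord(ch) < 128 else ' ' for ch in text)


def replace_runs(text):
    """Replace each run of consecutive non-ASCII characters with one space."""
    return re.sub(r'[^\x00-\x7F]+', ' ', text)


def drop_non_ascii(text):
    """Delete non-ASCII characters without leaving a placeholder."""
    return text.encode('ascii', 'ignore').decode('ascii')


sample = 'abc\u5386\u53f2def'  # 'abc', two Chinese characters, 'def'
print(repr(replace_each(sample)))    # 'abc  def' (two spaces)
print(repr(replace_runs(sample)))    # 'abc def' (one space)
print(repr(drop_non_ascii(sample)))  # 'abcdef'
```

Which variant fits depends on whether word boundaries should be preserved: replacing runs with a single space keeps neighbouring ASCII words apart without introducing long gaps.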
Check whether a string contains non-ASCII (for example, non-English) characters:
import re
a = "ds dl,;sd!@)~`09历史s"
regexp = re.compile(r'[^\x00-\x7f]')
if regexp.search(a):
    print('matched')
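The check above can be turned into a reusable predicate. This is a minimal sketch, assuming Python 3.7 or later for str.isascii(); the function names are hypothetical, and the built-in variant is offered only as an alternative to the regex used in the original.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def contains_non_ascii(text):
    """Return True if text has at least one non-ASCII character (regex)."""
    return NON_ASCII.search(text) is not None


def contains_non_ascii_builtin(text):
    """Same check using str.isascii(), available from Python 3.7."""
    return not text.isascii()


a = 'ds dl,;sd!@)~`09\u5386\u53f2s'
print(contains_non_ascii(a))               # True
print(contains_non_ascii('plain ascii'))   # False
print(contains_non_ascii_builtin(a))       # True
```

Compiling the pattern once at module level avoids recompiling it when the check runs on many strings.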