Extracting words from a string in Python: remove punctuation and return a list of the separated words...-CSDN博客

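With Python's regular-expression library re, you can easily extract words from a string while ignoring punctuation. Compiling the pattern `\w+` finds every word made up of letters and digits (`\w` also matches the underscore). If you only want letters, replace `\w` with `[A-Za-z]`. Alternative methods that do not rely on regular expressions are also given.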


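This has nothing to do with splitting or punctuation; you only care about the letters (and digits), so all you need is a regular expression: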

import re

def getWords(text):
    # \w+ matches each maximal run of letters, digits and underscores
    return re.compile(r'\w+').findall(text)

Demo:

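>>> re.compile(r'\w+').findall('Hello world, my name is...James the 2nd!')
['Hello', 'world', 'my', 'name', 'is', 'James', 'the', '2nd']

If you don't care about digits, replace \w with [A-Za-z] to match letters only, or with [A-Za-z'] to also keep contractions and the like. There are probably better ways to cover alphabetic, non-numeric character classes (for example letters with accents) using other regular expressions.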
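The character-class variants mentioned above can be compared side by side. This is a minimal sketch; the function names are hypothetical, and the `[^\W\d_]` pattern is one common way to approximate "Unicode letters only" with the standard re module, not something taken from the original answer.

```python
import re

# Hypothetical helper names, introduced only for this illustration.
def words_alnum(text):
    # \w+ : letters, digits and underscore (Unicode-aware for str in Python 3)
    return re.findall(r"\w+", text)

def words_ascii_letters(text):
    # ASCII letters only; digits and accented letters are dropped
    return re.findall(r"[A-Za-z]+", text)

def words_with_apostrophes(text):
    # ASCII letters plus the apostrophe, so contractions stay whole
    return re.findall(r"[A-Za-z']+", text)

def words_unicode_letters(text):
    # [^\W\d_] = word characters minus decimal digits and underscore,
    # which is roughly "any letter", accented ones included
    return re.findall(r"[^\W\d_]+", text)

sample = "Don't stop, it's the 2nd try!"
print(words_alnum(sample))             # ['Don', 't', 'stop', 'it', 's', 'the', '2nd', 'try']
print(words_with_apostrophes(sample))  # ["Don't", 'stop', "it's", 'the', 'nd', 'try']
```

Which pattern is appropriate depends on what should count as a word: `\w+` keeps '2nd' intact but splits "Don't", while the apostrophe variant does the opposite.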

I practically answered this question here: Split Strings with Multiple Delimiters?

But your question is underspecified: do you want 'this is: an example' to be split into:

['this', 'is', 'an', 'example']

or ['this', 'is', '', 'an', 'example']?

I am assuming it is the first case.
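To make the two cases concrete, here is a small sketch of how each result can arise with the standard re module. The variable names are illustrative only.

```python
import re

text = 'this is: an example'

# Splitting on each single delimiter character leaves an empty string
# wherever two delimiters are adjacent (here ':' followed by ' ').
split_each = re.split(r'[ :]', text)

# Treating a run of delimiters as one separator avoids the empty string.
split_runs = re.split(r'[ :]+', text)

# Matching the words themselves never yields empty strings.
found = re.findall(r'\w+', text)

print(split_each)  # ['this', 'is', '', 'an', 'example']
print(split_runs)  # ['this', 'is', 'an', 'example']
print(found)       # ['this', 'is', 'an', 'example']
```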

['this', 'is', 'an', 'example'] is what I want. Is there a method without importing regex? If we can just replace the non-ascii_letters characters with '', then split the string into a list of words, would that work? – James Smith 2 mins ago

Regex is the most elegant option, but yes, you can also do it as follows:

def getWords(text):
    """
    Returns a list of words, where a word is defined as a
    maximally connected substring of letters or digits,
    as defined by "a".isalnum()

    >>> getWords('Hello world, my name is... Élise!') # works in python3
    ['Hello', 'world', 'my', 'name', 'is', 'Élise']
    """
    # Replace every non-alphanumeric character with a space, then split
    return ''.join((c if c.isalnum() else ' ') for c in text).split()

Use .isalpha() in place of .isalnum() if you want letters only.
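As a sketch of the regex-free route raised in the comment, the block below shows a letters-only variant and a `str.translate` variant, and why replacing non-letters with '' (rather than a space) does not quite work. The function names are hypothetical and not part of the original answer.

```python
import string

def get_words_alpha(text):
    # Letters-only variant: every non-letter becomes a space,
    # then split() collapses runs of whitespace.
    return ''.join(c if c.isalpha() else ' ' for c in text).split()

def get_words_translate(text):
    # Map each ASCII punctuation character to a space, then split.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return text.translate(table).split()

sample = 'Hello world, my name is...James the 2nd!'

# Deleting punctuation instead of replacing it with a space glues
# neighbouring words together: 'is...James' becomes 'isJames'.
glued = ''.join(c for c in sample if c.isalnum() or c.isspace()).split()

print(get_words_alpha(sample))      # ['Hello', 'world', 'my', 'name', 'is', 'James', 'the', 'nd']
print(get_words_translate(sample))  # ['Hello', 'world', 'my', 'name', 'is', 'James', 'the', '2nd']
print(glued)                        # ['Hello', 'world', 'my', 'name', 'isJames', 'the', '2nd']
```

Note that `string.punctuation` covers ASCII punctuation only, so the translate variant leaves characters such as curly quotes in place.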

Side note: you could also do the following, but it requires importing another standard-library module:

from itertools import *
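The text stops at the import, so what originally followed is not shown here. Purely as an assumption, one way itertools can be applied to this task is `itertools.groupby`, grouping consecutive characters by whether they are alphanumeric. The function name below is hypothetical.

```python
from itertools import groupby

def get_words_groupby(text):
    # groupby splits the text into consecutive runs of characters that
    # share the same key; keep only the runs where isalnum() is True.
    return [
        ''.join(chars)
        for is_word, chars in groupby(text, key=str.isalnum)
        if is_word
    ]

print(get_words_groupby('My name, is test!'))  # ['My', 'name', 'is', 'test']
```

This is arguably harder to read than the regex or the join/split version, so it is best seen as a curiosity rather than a recommendation.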