Punctuation in Python: removing punctuation from a list in Python

Latest recommended article published on 2024-03-16 17:48:55

weixin_39937635

Tags: Python punctuation

Looking at get_text(), it seems a few things need to change before any punctuation can be removed. I have added some comments here.

    import requests
    from bs4 import BeautifulSoup

    def get_text():
        str_lines = []  # create an empty list
        url = 'http://www.gutenberg.org/files/1155/1155-h/1155-h.htm'
        r = requests.get(url)
        data = r.text
        soup = BeautifulSoup(data, 'html.parser')
        text = soup.find_all('p')  # find all <p> (paragraph) elements
        i = 0
        for p in text:
            i += 1
            line = p.get_text()
            if i < 10:  # skip the first nine paragraphs
                continue
            str_lines.append(line)  # append the current line to the list
        return str_lines  # return the list of lines

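First, I uncommented your str_lines variable and set it to an empty list. Next, I replaced the print statement with code that appends the line to the list of lines. Finally, I changed the return statement so that it returns that list.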
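
The counter in get_text() exists only to skip the first nine paragraphs. As a minimal sketch of the same skip-and-accumulate logic, the function below works on a plain list of strings instead of parsed HTML, so it runs without a network request. The name collect_lines, its skip parameter, and the sample data are assumptions made for illustration; they are not part of the original code.

```python
def collect_lines(paragraphs, skip=9):
    """Return the paragraphs as a list, leaving out the first `skip` items."""
    str_lines = []
    for i, line in enumerate(paragraphs, start=1):
        if i <= skip:  # same effect as `if i < 10: continue` when skip is 9
            continue
        str_lines.append(line)
    return str_lines


# Hypothetical sample data standing in for the <p> texts of a page
sample = ["paragraph %d" % n for n in range(1, 13)]
kept = collect_lines(sample)
print(kept)
```

For a list that is already in memory, the slice sample[9:] would give the same result; the explicit loop is kept here only because it mirrors get_text().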

As for strip_text(), it can be reduced to a few lines of code:

    import re

    def strip_text():
        list_words = get_text()
        # lowercase each line and replace every non-letter character with a space
        list_words = [re.sub("[^a-zA-Z]", " ", s.lower()) for s in list_words]
        return list_words

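There is no need to work word by word, because we can look at a whole line and remove all of its punctuation at once, so I removed the split(). With a list comprehension we can change every element of the list in a single line, and I put the lower() call in there as well to condense the code. Note that this pattern replaces every character that is not an ASCII letter, digits included, with a space rather than deleting it.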
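
To make the effect of the pattern concrete, here is a small sketch that applies it to two sample lines. The sample strings and the whitespace-collapsing step are assumptions added for illustration; the original code does not collapse the spaces it introduces.

```python
import re

# Hypothetical sample lines standing in for the output of get_text()
lines = ["Hello, World!", "It's 3 o'clock."]

# The pattern from strip_text(): every non-letter character becomes a space
replaced = [re.sub("[^a-zA-Z]", " ", s.lower()) for s in lines]

# Optional follow-up (an assumption, not in the original): collapse runs of spaces
tidied = [" ".join(s.split()) for s in replaced]

print(replaced)  # ['hello  world ', 'it s   o clock ']
print(tidied)    # ['hello world', 'it s o clock']
```

Because apostrophes turn into spaces, "it's" becomes "it s". Whether that is acceptable depends on what is done with the words afterwards.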

To use the answer provided by @AhsanulHaque instead, you only need to replace the second line of strip_text() with it, like so:

    import string

    def strip_text():
        list_words = get_text()
        # keep only the characters that are not in string.punctuation
        list_words = ["".join(j.lower() for j in i if j not in string.punctuation)
                      for i in list_words]
        return list_words
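
One thing worth knowing about this variant is that string.punctuation contains only ASCII punctuation. The sketch below, which uses made-up sample lines, shows that typographic quotes survive the filter. Whether the Gutenberg text contains such characters is not something this example checks.

```python
import string

# Hypothetical sample lines; the second one contains typographic (non-ASCII) quotes
lines = ["Hello, World!", "\u201cQuoted\u201d text... done."]

stripped = ["".join(ch.lower() for ch in line if ch not in string.punctuation)
            for line in lines]

print(stripped)
```

Unlike the regular-expression version, this one deletes punctuation instead of replacing it with a space, and it leaves digits in place.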

For fun, here is the translate method I mentioned earlier, implemented for Python 3.x:

    import string

    def strip_text():
        list_words = get_text()
        # map every punctuation character to None so that translate() deletes it
        translator = str.maketrans({key: None for key in string.punctuation})
        list_words = [s.lower().translate(translator) for s in list_words]
        return list_words
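
As a side note, str.maketrans also accepts three string arguments, where the third lists the characters to delete. The sketch below builds the table both ways and applies each to a made-up sample line; the variable names are chosen for illustration only.

```python
import string

# Dict form: each punctuation character maps to None, which means "delete"
dict_table = str.maketrans({key: None for key in string.punctuation})

# Three-argument form: the third argument lists the characters to delete
delete_table = str.maketrans("", "", string.punctuation)

sample = "Wait... what?! (Really.)"  # hypothetical sample line
from_dict = sample.lower().translate(dict_table)
from_args = sample.lower().translate(delete_table)

print(from_dict)  # wait what really
print(from_args)  # wait what really
```

Both forms produce the same translation table, so the choice between them is a matter of taste.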

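Unfortunately, I could not time any of this against your specific code, because Gutenberg has temporarily blocked me (I suspect the code was sending requests too quickly).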
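
One way around the block is to time the three approaches on data held in memory, so that no request is sent at all. The following is only a sketch under that assumption: the sample text, the function names with_regex, with_join and with_translate, and the repeat count are all made up for illustration, and no timings are claimed here because they depend on the machine and the input.

```python
import re
import string
import timeit

# Hypothetical in-memory sample, so no request to Gutenberg is needed
LINES = ["It was the best of times, it was the worst of times; truly!"] * 1000

# Translation table that deletes every ASCII punctuation character
TRANSLATOR = str.maketrans("", "", string.punctuation)


def with_regex(lines):
    """Replace every non-letter character with a space."""
    return [re.sub("[^a-zA-Z]", " ", s.lower()) for s in lines]


def with_join(lines):
    """Drop characters found in string.punctuation, one character at a time."""
    return ["".join(c.lower() for c in s if c not in string.punctuation)
            for s in lines]


def with_translate(lines):
    """Drop punctuation with a precomputed translation table."""
    return [s.lower().translate(TRANSLATOR) for s in lines]


for func in (with_regex, with_join, with_translate):
    seconds = timeit.timeit(lambda: func(LINES), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Keep in mind that the regular-expression version does not produce the same output as the other two, since it inserts spaces and also removes digits, so the comparison is of speed only.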
