Removing All Punctuation from a String

sophie1314179

Published 2023-08-16 10:04:17

Tags: python

Article link: https://blog.csdn.net/sophie1314179/article/details/132312486

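When comparing data, you sometimes need to compare whole articles or paragraphs, and the punctuation ends up being compared along with the text. This post shows how to strip the punctuation first, so that only the remaining characters are compared.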

import re
def remove_punctuation(sentence):
    # Match every character that is neither a word character nor whitespace, and remove it (replace it with an empty string)
    sentence = re.sub(r'[^\w\s]', '', sentence)
    return sentence

ss = "hello~ world!"
print(remove_punctuation(ss))

Result: hello world
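
One detail worth knowing: in Python 3, \w is Unicode-aware and also matches the underscore. So the pattern above removes full-width (Chinese) punctuation as well as ASCII punctuation, but it keeps "_". The following is a minimal sketch of that behavior, assuming you may also want underscores removed; the helper name remove_punctuation_strict is hypothetical and not part of the original post.

```python
import re

def remove_punctuation(sentence):
    # Same pattern as above: drop anything that is not a word character or whitespace
    return re.sub(r'[^\w\s]', '', sentence)

def remove_punctuation_strict(sentence):
    # Hypothetical variant: additionally strip underscores, which \w would keep
    return re.sub(r'[^\w\s]|_', '', sentence)

print(remove_punctuation("你好，世界！"))        # 你好世界
print(remove_punctuation("snake_case!"))         # snake_case
print(remove_punctuation_strict("snake_case!"))  # snakecase
```

Whether underscores count as punctuation depends on the text being compared, so treat the strict variant as an option rather than a default.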

With the punctuation removed, two strings can be compared on their text alone:

import re
def remove_punctuation(sentence):
    # Match every character that is neither a word character nor whitespace, and remove it (replace it with an empty string)
    sentence = re.sub(r'[^\w\s]', '', sentence)
    return sentence

ss = "hello~ world!"
ss1 = "hello world~~~"
ss2 = "hello word!"

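import Levenshtein  # third-party package (pip install Levenshtein)
def levenshtein_similarity(text1, text2):
    # Normalized similarity: 1 minus the edit distance divided by the longer length
    distance = Levenshtein.distance(text1, text2)
    max_length = max(len(text1), len(text2))
    if max_length == 0:
        return 1.0  # both strings are empty; avoid division by zero
    similarity = 1 - distance / max_length
    return similarity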

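# ss vs ss1: raw strings, then with punctuation removed
print(levenshtein_similarity(ss, ss1))
print(levenshtein_similarity(remove_punctuation(ss), remove_punctuation(ss1)))
# ss vs ss2: raw strings, then with punctuation removed
print(levenshtein_similarity(ss, ss2))
print(levenshtein_similarity(remove_punctuation(ss), remove_punctuation(ss2)))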
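
The similarity used here is 1 minus the edit distance divided by the length of the longer string. As a rough cross-check that does not depend on the third-party Levenshtein package, the same computation can be sketched in pure Python. The names edit_distance and similarity are illustrative only, and treating two empty strings as fully similar is an assumption.

```python
import re

def remove_punctuation(sentence):
    # Drop every character that is neither a word character nor whitespace
    return re.sub(r'[^\w\s]', '', sentence)

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, keeping one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - edit_distance(a, b) / longest

ss, ss1, ss2 = "hello~ world!", "hello world~~~", "hello word!"

print(similarity(ss, ss1))                                          # about 0.714 (distance 4 over length 14)
print(similarity(remove_punctuation(ss), remove_punctuation(ss1)))  # 1.0
print(similarity(ss, ss2))                                          # about 0.846 (distance 2 over length 13)
print(similarity(remove_punctuation(ss), remove_punctuation(ss2)))  # about 0.909 (distance 1 over length 11)
```

With punctuation stripped, ss and ss1 become identical, while ss and ss2 still differ by the one missing letter in "word". That difference is the one the comparison is meant to surface.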