Fixing the re.error: bad escape \x at position xxx (line xz, column xz) exception

Latest recommended article published 2024-07-05 08:50:36

北房有佳人

Category: Daily bugs. Tags: regular expressions, re.sub

Original link: https://www.cnblogs.com/ercilan/p/14640436.html

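Problem

For reference, the signature is: re.sub(pattern, repl, string, count=0, flags=0).
When I ran into this error, I searched around and compared the results against my own case: my pattern contained no invalid escapes, so the likely culprit was repl. Looking at the source code: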

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

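Analysis

re.sub(pattern, repl, string, count=0, flags=0) processes backslash escapes inside repl. This is easy to overlook and worth keeping in mind.

On the one hand, sub processes backslash escapes in repl in order to support backreferences. On the other hand, this also affects backslashes that are meant literally: \\ in repl ends up as a single \ in the result. If the text being processed is source code, that causes problems, for example: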

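import re

# Replace the entire string
origin = "/*  好的  */"
to = r"('\\', 0)"
print("Wanted:", to)
# The asterisks must be escaped so the pattern matches the literal "/*" and "*/"
print("Actual:", re.sub(r"/\*  好的  \*/", to, origin))
Output:
Wanted: ('\\', 0)
Actual: ('\', 0)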
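
The two roles of the backslash in repl can be seen side by side in a small sketch. The sample strings below are made up for illustration and are not from the original case:

```python
import re

text = "2024-07-05"

# Backreferences: \1, \2, \3 in repl refer to the captured groups.
swapped = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", text)
print(swapped)  # 05/07/2024

# The same escape processing collapses a doubled backslash into one.
collapsed = re.sub(r"-", r"\\", text)
print(collapsed)  # 2024\07\05

# A backslash followed by "n" in repl becomes a real newline character.
newline = re.sub(r"-", r"\n", "a-b")
print(repr(newline))  # 'a\nb'
```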

If the replacement string repl is more complex, it may contain sequences that are not valid escapes, and an exception like this is raised:

re.error: bad escape \x at position 86013 (line 1575, column 7966)
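
A minimal way to reproduce this kind of error, assuming Python 3.7 or later (where an unknown escape made of a backslash and an ASCII letter in repl is an error). The path and placeholder text are hypothetical:

```python
import re

# A Windows-style path: "\x" is not a valid escape in a replacement template.
repl = r"C:\xampp\htdocs"

try:
    re.sub(r"PLACEHOLDER", repl, "root = PLACEHOLDER")
    message = None
except re.error as exc:
    # The message names the offending escape and its position inside repl.
    message = str(exc)

print(message)
```

The position reported in the message is an offset into repl, not into the pattern or the input string, which is a quick way to confirm that repl is the source of the problem.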

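One workaround I came across is import regex as re. It avoids the exception, but it still does not produce the replacement we originally wanted. (It also requires installing another library, which is a hassle.)

Solution

You could manually escape the repl string once more, but in practice this is more trouble than it is worth.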
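
For completeness, manual escaping amounts to doubling every backslash in repl before handing it to re.sub. A minimal sketch, with a made-up input string and a simplified pattern:

```python
import re

origin = "/*  placeholder  */"
to = r"('\\', 0)"

# Double each backslash so that escape processing restores the original text.
escaped = to.replace("\\", "\\\\")
result = re.sub(r"/\*.*?\*/", escaped, origin)

print(result)        # ('\\', 0)
print(result == to)  # True
```

Note that re.escape is not a substitute here: the re documentation states that it must not be used for the replacement string, where only backslashes should be escaped.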

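According to the source code, repl can be either a string or a callable, such as a function. If it is a string, its backslash escapes are processed. If it is a function, it is passed the Match object and must return the replacement string.

So we can pass a custom function to work around the automatic escape processing in re.sub: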

import re

origin = "/*dfe1683替换区  \n好的   */"
to = r"('\\', 0)"
print("Wanted:", to)
# The callable's return value is inserted verbatim, with no escape processing
print("Actual:", re.sub(r"/\*dfe1683替换区[\s\S]+?\*/", lambda m: to, origin))
Output:
Wanted: ('\\', 0)
Actual: ('\\', 0)
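
If this comes up often, the lambda trick can be wrapped in a small helper. This is only a sketch: sub_literal is a hypothetical name, not part of the re module, and the sample strings are made up:

```python
import re

def sub_literal(pattern, replacement, string, count=0, flags=0):
    """Like re.sub, but insert `replacement` verbatim (hypothetical helper)."""
    # Passing a callable skips escape processing of the replacement text.
    return re.sub(pattern, lambda _match: replacement, string,
                  count=count, flags=flags)

source = "path = /*PATH*/;"
result = sub_literal(r"/\*PATH\*/", r"'C:\xampp\new'", source)
print(result)  # path = 'C:\xampp\new';
```

The trade-off is that backreferences such as \1 no longer work in the replacement, since the text is never parsed as a template.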