Why do Python regexes use r? Python regular expressions and the r prefix

Can anyone explain why example 1 below works, when the r prefix is not used?

I thought the r prefix must be used whenever escape sequences are used.

Example 2 and example 3 demonstrate this.

# example 1

import re

print (re.sub('\s+', ' ', 'hello     there      there'))

# prints 'hello there there' - not expected as r prefix is not used

# example 2

import re

print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))

# prints 'hello there' - as expected as r prefix is used

# example 3

import re

print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello there there'))

# prints 'hello there there' - as expected as r prefix is not used

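Solution

Because a backslash begins an escape sequence only when the character after it forms a valid escape sequence. \s is not one, so '\s' keeps its backslash and is the same string as r'\s', which is why example 1 works. \b and \1 are valid escapes, so without the r prefix they turn into a backspace and the character with octal value 1 before the regex engine ever sees them, which is why example 3 fails.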

>>> '\n'

'\n'

>>> r'\n'

'\\n'

>>> print('\n')

>>> print(r'\n')

\n

>>> '\s'

'\\s'

>>> r'\s'

'\\s'

>>> print('\s')

\s

>>> print(r'\s')

\s
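The session above can also be checked programmatically. This is only a minimal sketch: `literal_value` is a hypothetical helper (not part of the original answer) that evaluates the source text of a string literal with `ast.literal_eval`, and it silences the warning that newer Python versions emit for unrecognized escapes such as \s.

```python
import ast
import warnings


def literal_value(source):
    # Hypothetical helper: evaluate the source text of a string literal
    # the same way the interpreter would when it reads the literal.
    with warnings.catch_warnings():
        # Newer Python versions warn about unrecognized escapes; silence that here.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# Each entry is the source text of a literal, kept intact inside a raw string.
for source in (r"'\s'", r"r'\s'", r"'\b'", r"r'\b'", r"'\n'", r"r'\n'"):
    value = literal_value(source)
    print(source, "->", len(value), "character(s):", [hex(ord(c)) for c in value])

plain_s = literal_value(r"'\s'")   # unrecognized escape: backslash is kept
raw_s = literal_value(r"r'\s'")    # raw string: backslash is kept
plain_b = literal_value(r"'\b'")   # recognized escape: becomes backspace
raw_b = literal_value(r"r'\b'")    # raw string: backslash is kept
```

With and without the prefix, \s gives the same two characters, while \b only survives as backslash plus b when the r prefix is present.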

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence   Meaning
\newline          Ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\N{name}          Character named name in the Unicode database (Unicode only)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\uxxxx            Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx (Unicode only)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh
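Two entries in that table, \b and \ooo, collide with regex syntax, and that is what breaks example 3 from the question. The sketch below rebuilds example 3 next to example 2. As an assumption made only to keep newer Python versions from warning, \w and \s are spelled with doubled backslashes in the non-raw pattern; that produces the same characters as the original '\w' and '\s'. The variable names are illustrative.

```python
import re

text = 'hello there there'

# In a non-raw literal the string parser converts these before re sees them.
backspace = '\b'    # recognized escape: ASCII backspace, not a word boundary
octal_one = '\1'    # recognized octal escape: chr(1), not a backreference

# Example 3 from the question: \b and \1 have already been converted, so the
# pattern looks for backspace characters that the text does not contain.
broken_pattern = '(\b\\w+)(\\s+\1\b)+'
broken_result = re.sub(broken_pattern, '\1', text)

# Example 2 from the question: raw strings pass \b and \1 through to re.
fixed_result = re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', text)

print(repr(broken_result))
print(repr(fixed_result))
```

The non-raw version leaves the text unchanged because nothing matches, while the raw version removes the repeated word.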

Never rely on raw strings for path literals, as raw strings have some rather peculiar inner workings, known to have bitten people in the ass:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

To better illustrate this last point:

>>> r'\'

SyntaxError: EOL while scanning string literal

>>> r'\''

"\\'"

>>> '\'

SyntaxError: EOL while scanning string literal

>>> '\''

"'"

>>>

>>> r'\\'

'\\\\'

>>> '\\'

'\\'

>>> print(r'\\')

\\

>>> print(r'\')

SyntaxError: EOL while scanning string literal

>>> print('\\')

\
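The trailing-backslash rule can be tested without crashing the interpreter by compiling the literal's source text. This is a rough sketch: `is_valid_literal` is a hypothetical helper, and the Windows-style path is made up for illustration. It also shows two ways to write a path that ends in a backslash without relying on a raw string alone.

```python
def is_valid_literal(source):
    # Hypothetical helper: True if the source text compiles as an expression.
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# The candidate literals are wrapped in double-quoted raw strings so that
# their backslashes reach compile() untouched.
candidates = [r"r'\'", r"r'\\'", r"r'\\\'", r"'\'", r"'\\'"]
validity = {source: is_valid_literal(source) for source in candidates}
for source, ok in validity.items():
    print(source, "is valid" if ok else "is a SyntaxError")

# Illustrative path literals (the directory name is made up).
with_tab = 'C:\temp'                # pitfall: \t became a tab character
escaped = 'C:\\temp\\'              # every backslash doubled, no raw string
concatenated = r'C:\temp' + '\\'    # raw body plus an escaped trailing backslash
```

Any literal, raw or not, that ends in an odd number of backslashes fails to compile, so a trailing backslash has to come from a doubled backslash in a non-raw literal.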
