Introduction to Regular Expressions, Based on Python

Latest recommended article published 2024-11-14 15:49:21

瑞小球

Article link: https://blog.csdn.net/weixin_42470516/article/details/124495890

This post is based on the UC Berkeley course CS61A.

Introduction to regular expressions

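Regular expressions are a general-purpose declarative language that does not depend on any particular programming language: a pattern describes what text should match, not the steps for finding it. Individual regex engines differ in some details, but the core syntax below is widely shared.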
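
To make "declarative" concrete, here is a minimal sketch that checks the same rule, "exactly two ASCII digits", once with a hand-written loop and once with a regex. The function names two_digits_imperative and two_digits_declarative are hypothetical, chosen only for this illustration.

```python
import re


def two_digits_imperative(s: str) -> bool:
    # Imperative: spell out HOW to check (length first, then every character).
    return len(s) == 2 and all(ch in "0123456789" for ch in s)


def two_digits_declarative(s: str) -> bool:
    # Declarative: describe WHAT a valid string looks like; the regex engine does the rest.
    return re.fullmatch(r"[0-9]{2}", s) is not None


for s in ["12", "1a", "123", ""]:
    print(repr(s), two_digits_imperative(s), two_digits_declarative(s))
```

Both functions agree on every input; the regex version simply states the rule.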

Main syntax

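Symbol	Meaning	Example
.(dot)	Matches any single character except a newline	e.g. banana matches r'.a.a.a'
[]	Matches any one character from the set inside the brackets	e.g. t matches r'[top]'
\d	Matches any digit character; for ASCII text, equivalent to [0-9]	e.g. 12 matches r'\d\d'
\w	Matches any word character (letter, digit, or underscore); for ASCII text, equivalent to [0-9A-Za-z_]	e.g. 4Z matches r'\d\w'
\s	Matches any whitespace character, such as spaces, tabs, and line breaks	e.g. 9 a matches r'\d\s\w'
*	Matches the preceding item 0 or more times	e.g. aaa matches r'a*'
+	Matches the preceding item 1 or more times	e.g. lool matches r'lo+l'
?	Matches the preceding item 0 or 1 times	e.g. lol matches r'lo?l'
{}	Written {min,max}: matches the preceding item at least min and at most max times; {n} means exactly n times, and {min,} has no upper limit	e.g. aa matches r'a{2}', aaaaa matches r'a{2,}', aaa matches r'a{2,4}'
|	Matches either the expression on its left or the one on its right	e.g. Inf matches r'\d+|Inf'
()	Groups the enclosed pattern into one unit, so a following quantifier applies to the whole group	e.g. <3<3<3 matches r'(<3)+'
^	Matches the start of a string	e.g. aww matches r'^aw+'
$	Matches the end of a string	e.g. stay matches r'\w+y$'
\b	Matches a word boundary (between a word character and a non-word character or the edge of the string)	e.g. bridge matches r'\w+e\b'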
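
One way to check the table is to run each example through re.fullmatch, which succeeds only when the pattern covers the whole string. This sketch just restates the table's examples as data; the two extra lines at the end illustrate that \w also matches the underscore and that . does not match a newline by default.

```python
import re

# (text, pattern) pairs: the examples from the syntax table
examples = [
    ("banana", r".a.a.a"),
    ("t", r"[top]"),
    ("12", r"\d\d"),
    ("4Z", r"\d\w"),
    ("9 a", r"\d\s\w"),
    ("aaa", r"a*"),
    ("lool", r"lo+l"),
    ("lol", r"lo?l"),
    ("aa", r"a{2}"),
    ("aaaaa", r"a{2,}"),
    ("aaa", r"a{2,4}"),
    ("Inf", r"\d+|Inf"),
    ("<3<3<3", r"(<3)+"),
    ("aww", r"^aw+"),
    ("stay", r"\w+y$"),
    ("bridge", r"\w+e\b"),
]

for text, pattern in examples:
    # fullmatch returns a Match object only if the ENTIRE text is matched, else None
    matched = re.fullmatch(pattern, text) is not None
    print(f"{text!r:10} {pattern:10} {matched}")

print(re.fullmatch(r"\w", "_") is not None)   # True: \w includes the underscore
print(re.fullmatch(r".", "\n") is not None)   # False: . skips newlines unless re.DOTALL is set
```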

Python examples

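In Python, the re module provides regular-expression operations on strings. The course includes three small exercises, implemented here with re.search (the code assumes import re).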
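
The functions below all call re.search and wrap the result in bool(...). As a minimal sketch of how that call differs from the other common entry points of the re module (re.match, re.fullmatch, re.findall), consider a pattern for an abbreviated genus such as "T.". The sample text is made up for illustration.

```python
import re

text = "I saw a T. rex and an F. peregrinus"
pattern = r"[A-Z]\."  # one capital letter followed by a literal period

first = re.search(pattern, text)     # scans the whole string; first match or None
at_start = re.match(pattern, text)   # only tries position 0
whole = re.fullmatch(pattern, text)  # pattern must cover the entire string
every = re.findall(pattern, text)    # list of all non-overlapping matches

print(first.group())  # T.
print(at_start)       # None: the text starts with "I " rather than "I."
print(whole)          # None
print(every)          # ['T.', 'F.']
print(bool(first))    # True: a Match object is truthy, None is falsy
```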

Recognizing valid scientific names

import re


def scientific_name(name):
    """
    Returns True for strings that are in the correct notation for scientific names;
    i.e. contains a capital letter followed by a period or lowercase letters, 
    followed by a space, followed by more lowercase letters. Returns False for 
    invalid strings.

    >>> scientific_name("T. rex")
    True
    >>> scientific_name("t. rex")
    False
    >>> scientific_name("tyrannosaurus rex")
    False
    >>> scientific_name("t rex")
    False
    >>> scientific_name("Falco peregrinus")
    True
    >>> scientific_name("F peregrinus")
    False
    >>> scientific_name("Annie the F. peregrinus")
    False
    >>> scientific_name("I want a pet T. rex right now")
    False
    """
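    # ^...$ anchor the whole string: a capital letter, then lowercase letters or a period,
    # then one whitespace character, then the lowercase species name.
    return bool(re.search(r'^[A-Z]([a-z]+|[.])\s[a-z]+$', name))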
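
To see what each part of the pattern does, and why the ^ and $ anchors matter, here is the same pattern rewritten with the re.VERBOSE flag (which ignores whitespace and allows # comments inside the pattern), compared against a copy with the anchors removed. The names SCIENTIFIC_NAME and UNANCHORED are illustrative only.

```python
import re

# Same pattern as in scientific_name, split up and commented via re.VERBOSE
SCIENTIFIC_NAME = re.compile(r"""
    ^               # start of the string
    [A-Z]           # genus begins with a capital letter
    ([a-z]+|[.])    # then either lowercase letters or a single period (abbreviation)
    \s              # one whitespace character
    [a-z]+          # species name: lowercase letters only
    $               # end of the string
""", re.VERBOSE)

# The same pattern without ^ and $: search may now match any substring
UNANCHORED = re.compile(r"[A-Z]([a-z]+|[.])\s[a-z]+")

for s in ["T. rex", "Annie the F. peregrinus", "I want a pet T. rex right now"]:
    print(repr(s), bool(SCIENTIFIC_NAME.search(s)), bool(UNANCHORED.search(s)))
```

Without the anchors, "Annie the" already looks like a genus followed by a species, and "T. rex" is found in the middle of a sentence, so both invalid inputs would be accepted.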

Recognizing valid calculator operations

def calculator_ops(calc_str):
    """
    Returns True if an expression from the Calculator language that has two
    numeric operands exists in calc_str, False otherwise.

    >>> calculator_ops("(* 2 4)")
    True
    >>> calculator_ops("(+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6))")
    True
    >>> calculator_ops("(* 2)")
    False
    >>> calculator_ops("(/ 8 4 2)")
    False
    >>> calculator_ops("(- 8 3)")
    True
    >>> calculator_ops("+ 3 23")
    False
    """
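    # No ^ anchor: the docstring asks whether such an operation exists anywhere,
    # including nested inside a larger expression.
    return bool(re.search(r'\([-+*/]\s+\d+\s+\d+\)', calc_str))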
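
The docstring asks whether a two-operand operation exists anywhere in calc_str, so a match must be allowed to start inside a nested expression. This sketch contrasts a ^-anchored search with re.findall on the nested example from the docstring; the names TWO_OPERAND and expr are illustrative.

```python
import re

# A two-operand Calculator call such as "(* 2 4)"; "-" is first inside [...] so it is literal
TWO_OPERAND = r"\([-+*/]\s+\d+\s+\d+\)"
expr = "(+ (* 3 (+ (* 2 4) (+ 3 5))) (+ (- 10 7) 6))"

# With ^ the search can only succeed at index 0, where "(+ (" is not a two-operand call
print(re.search("^" + TWO_OPERAND, expr))  # None

# Without an anchor, the engine tries every starting position
print(re.findall(TWO_OPERAND, expr))  # ['(* 2 4)', '(+ 3 5)', '(- 10 7)']
```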

Recognizing Roman numerals

def roman_numerals(text):
    """
    Returns True if any string of letters that could be a Roman numeral
    (made up of the letters I, V, X, L, C, D, M) is found. Returns False otherwise.

    >>> roman_numerals("Sir Richard IIV, can you tell Richard VI that Richard IV is on the phone?")
    True
    >>> roman_numerals("My TODOs: I. Groceries II. Learn how to count in Roman IV. Profit")
    True
    >>> roman_numerals("I. Act 1 II. Act 2 III. Act 3 IV. Act 4 V. Act 5")
    True
    >>> roman_numerals("Let's play Civ VII")
    True
    >>> roman_numerals("i love vi so much more than emacs.")
    False
    >>> roman_numerals("she loves ALL editors equally.")
    False
    """
    # \b on each side makes the letters a whole word, so the LL inside "ALL" does not count.
    return bool(re.search(r'\b([IVXLCDM]+)\b', text))
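
The \b anchors are what stop this check from firing on capital letters inside ordinary words. Below is a short comparison with re.findall on two of the docstring inputs. Note that the course check only looks at which letters appear, so a string like IIV is accepted even though it is not a well-formed Roman numeral. The names LETTERS, WORD, quiet, and phone are illustrative.

```python
import re

LETTERS = r"[IVXLCDM]+"      # one or more Roman-numeral letters
WORD = r"\b[IVXLCDM]+\b"     # ...that also form a whole word

quiet = "she loves ALL editors equally."
phone = "Sir Richard IIV, can you tell Richard VI that Richard IV is on the phone?"

print(re.findall(LETTERS, quiet))  # ['LL']: the tail of "ALL" would be a false positive
print(re.findall(WORD, quiet))     # []
print(re.findall(WORD, phone))     # ['IIV', 'VI', 'IV']
```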