A Simple Example of Parsing Regular Expressions (Udacity Study Notes)

Latest recommended article published on 2023-02-10 23:13:39

tomdyq625


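I worked through the regular expression matcher from the Udacity course. The main lessons: break the program into small functions, reuse them, and build up step by step. I am writing it down here so it sticks.

The simplified regular expressions contain only 5 kinds of special symbols: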

Special   Example   Matches

*         a*        '', a, aa, ...
?         a?        '', a
.         .         a, b, c, 1, 2, 3, ...
^         ^b        b, ba, bb, ...   (starts with b)
$         a$        ba, bba, ...     (ends with a)
''        ''        ''
a         a         a
ba        ba        ba
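
As a quick sanity check of the table, the same five symbols carry the same meaning in Python's standard `re` module, so the rows can be verified with it. This is only an illustration with the standard library, not the matcher built in these notes; the helper names `full` and `found` are made up for this sketch.

```python
import re

def full(pattern, text):
    """True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """True if pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# Each list keeps only the candidate strings that the pattern accepts.
star   = [s for s in ['', 'a', 'aa', 'b'] if full('a*', s)]   # zero or more 'a'
opt    = [s for s in ['', 'a', 'aa'] if full('a?', s)]        # zero or one 'a'
dot    = [s for s in ['a', '1', 'ab', ''] if full('.', s)]    # exactly one character
caret  = [s for s in ['b', 'ba', 'ab'] if found('^b', s)]     # starts with 'b'
dollar = [s for s in ['a', 'ba', 'ab'] if found('a$', s)]     # ends with 'a'

print(star, opt, dot, caret, dollar)
```

Python's `re` supports far more syntax than these five symbols, so it agrees with the simplified language only on patterns that stay inside the table.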

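Problem: write two functions, search and match.

search(pattern, text) finds a match for the pattern anywhere in text.

match(pattern, text) finds a match for the pattern only at the start of text.

First, turn the search problem into a match problem. The pattern is matched from left to right, one element at a time.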

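def search(pattern,  text):
    '''Return True if pattern appears anywhere in text'''
    # search is implemented in terms of match

    if pattern.startswith('^'):
        # Leading '^': the rest of the pattern must match at the start of text
        return match(pattern[1:],  text)
    else:
        # No leading '^': prepend '.*', which absorbs any prefix of text, so a
        # match at any position becomes a match at the start. This is the neat
        # trick that reduces search to match.
        return match('.*' + pattern,  text)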
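
Prepending '.*' in search has the same effect as trying match at every starting position of text. The sketch below spells that out. It is an alternative formulation for illustration, not the course's code: `search_by_scanning` is a hypothetical name, the matcher is passed in as a parameter, and `literal_match` is a stand-in that only understands plain characters so the block runs on its own.

```python
def search_by_scanning(pattern, text, match):
    """Return True if match(pattern, suffix) holds for some suffix of text."""
    if pattern.startswith('^'):
        # Anchored pattern: only the very start of text is allowed
        return match(pattern[1:], text)
    # range(len(text) + 1) also tries the empty suffix at the end of text
    return any(match(pattern, text[i:]) for i in range(len(text) + 1))

def literal_match(pattern, text):
    """Stand-in matcher for this demo: plain prefix test, no special symbols."""
    return text.startswith(pattern)

print(search_by_scanning('def', 'abcdefg', literal_match))   # True
print(search_by_scanning('^def', 'abcdefg', literal_match))  # False
```

The '.*' version keeps search to a single call and lets match do the scanning through its '*' handling, which is why it is the shorter of the two.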

Since search has already handled the ^ symbol, match only needs to handle the remaining 4 special symbols.

def match(pattern,  text):
    '''
    Return True if pattern appears at the start of text.
    '''
    if pattern == '':  # The empty pattern matches at the start of any text
        return True
    elif pattern == '$':  # '$' alone matches only when no text is left
        return (text ==  '')
    elif len(pattern) > 1 and  pattern[1] in '*?':  # A character followed by '*' or '?'
        # Split the pattern into the character p, the operator op, and the rest pat
        p,  op,  pat =  pattern[0],  pattern[1],  pattern[2:]
        if op ==  '*':  # '*': zero or more occurrences of p
            return match_star(p, pat, text)
        elif op == '?':  # '?': zero or one occurrence of p
            # One occurrence: p matches the first character of text and pat
            # matches the remaining text (recursive call)
            if match1(p, text) and  match(pat, text[1:]):
                return True
            else:  # Zero occurrences: match pat against text directly
                return match(pat, text)
    else:  # Ordinary case: first characters match, then recurse on both remainders
        return (match1(pattern[0],  text) and
                match(pattern[1:],  text[1:]))

def match1(p,  text): # Match a single leading character
    '''
    Return true if first character of text matches pattern character p.
    '''
    if not text: return False  # Empty text has no first character to match
    # '.' matches any character; otherwise p must equal text[0]
    return p == '.' or  p == text[0]

def match_star(p, pattern, text): # Handle 'p*'
    '''
    Return true if any number of char p,
    followed by pattern, matches text.
    '''
    return (match(pattern,  text) or  # Zero occurrences of p: match the rest of the pattern directly
            (match1(p, text) and      # Otherwise consume one p from text ...
             match_star(p, pattern, text[1:])))  # ... and apply the '*' rule to the remaining text
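
The recursion in match_star can also be read as a loop: try the rest of the pattern after 0, 1, 2, ... copies of p, and stop as soon as p can no longer consume the next character. The sketch below is one way to write that, under stated assumptions: `match_star_iter` is a hypothetical name, the matcher for the rest of the pattern is passed in as a parameter, and `literal_match` is a stand-in that only handles plain characters so the block is self-contained.

```python
def match_star_iter(p, pattern, text, match):
    """Loop form of 'p*': True if some number of p, then pattern, matches text."""
    i = 0  # number of characters consumed by p so far
    while True:
        if match(pattern, text[i:]):
            return True
        # Stop when text is exhausted or the next character is not matched by p
        if i >= len(text) or not (p == '.' or p == text[i]):
            return False
        i += 1

def literal_match(pattern, text):
    """Stand-in matcher for this demo: plain prefix test, no special symbols."""
    return text.startswith(pattern)

print(match_star_iter('a', '!', 'aaa!', literal_match))  # True
print(match_star_iter('a', '!', 'aab!', literal_match))  # False
print(match_star_iter('.', 'c', 'abc', literal_match))   # True
```

Both forms try the shortest run of p first, so they accept the same inputs; the recursive one is shorter, while the loop avoids one stack frame per consumed character.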

def test():
    assert  search('baa*!',  'Sheep said baaaa!') ==  True
    assert  search('baa*!', 'Sheep said baaaa humbug') == False
    assert  match('baa*!', 'Sheep said baaaa!') == False
    assert  match('baa*!',  'baaaaaaaaa! said the sheep') == True
    assert  search('def', 'abcdefg') == True
    assert  search('def$',  'abcdef') == True
    assert  search('def$',  'abcdefg') == False
    assert  search('^start',  'not the start') == False
    assert  match('start',  'not the start') == False
    assert  match('a*b*c*', 'just anything') == True
    assert  match('x?', 'text') == True
    assert  match('text?', 'text') == True
    assert  match('text?', 'tex') == True
    def words(text): return text.split()
    assert  all(match('aa*bb*cc*$', s)
                for s in words('abc aaabbccc aaaabcccc'))
    assert  not any(match('aa*bb*cc*$', s)
                    for s in words('ac aaabbcccd aaaa-b-cccc'))
    assert  all(search('^ab.*aca.*a$', s)
                for s in words('abracadabra abacaa about-acacia-fa'))
    assert  all(search('t.p', s)
                for s in words('tip top tap atypical tepid stop'))
    assert  not any(search('t.p', s)
                    for s in words('TYPE teepee tp'))
    return 'test passes'
