A Worked Example with Regular Expressions

司空昆颉

Last modified 2022-04-09 11:44:46


Tags: python, regular expressions

First published 2021-04-16 19:11:15

Article link: https://blog.csdn.net/weixin_53657630/article/details/115769651



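Hello, reader. The exercise below is one I came across while learning Python, and I found it quite instructive. It needs very little in the way of algorithms (beginners, relax); a rough understanding of regular expressions is enough. I hope that after reading this article you will understand regular expressions better and feel more comfortable using them.
Without further ado, here is the problem: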

A grammar lesson

Petya got interested in grammar on his third year in school. He invented his own language called Petya’s. Petya wanted to create a maximally simple language that would be enough to chat with friends, that’s why all the language’s grammar can be described with the following set of rules:

There are three parts of speech: the adjective, the noun, the verb. Each word in his language is an adjective, noun or verb.

There are two genders: masculine and feminine. Each word in his language has gender either masculine or feminine.

*Masculine adjectives end with -lios, and feminine adjectives end with -liala.*

*Masculine nouns end with -etr, and feminine nouns end with -etra.*

*Masculine verbs end with -initis, and feminine verbs end with -inites.*

Thus, each word in the Petya’s language has one of the six endings, given above. There are no other endings in Petya’s language.

It is accepted that the whole word consists of an ending. That is, words “lios”, “liala”, “etr” and so on belong to the Petya’s language.
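
The six endings above can be captured in a small lookup table. Here is a minimal sketch, assuming a table named `ENDINGS` and a helper named `classify` (both are illustrative names, not part of the problem statement), that maps a word to its part of speech and gender using plain `str.endswith`:

```python
# Hypothetical lookup table: ending -> (part of speech, gender).
ENDINGS = {
    "lios": ("adjective", "masculine"),
    "liala": ("adjective", "feminine"),
    "etr": ("noun", "masculine"),
    "etra": ("noun", "feminine"),
    "initis": ("verb", "masculine"),
    "inites": ("verb", "feminine"),
}


def classify(word):
    """Return (part_of_speech, gender) for a valid word, or None otherwise."""
    for ending, kind in ENDINGS.items():
        if word.endswith(ending):
            return kind
    return None
```

No ending is a suffix of another, so at most one entry can match a given word and the iteration order does not matter.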

There aren’t any punctuation marks, grammatical tenses, singular/plural forms or other language complications.

A sentence is either exactly one valid language word or exactly one statement.

A statement is any sequence of words of Petya’s language that satisfies both conditions:

Words in statement follow in the following order (from the left to the right): zero or more adjectives followed by exactly one noun followed by zero or more verbs.

All words in the statement should have the same gender.

After Petya’s friend Vasya wrote instant messenger (an instant messaging program) that supported the Petya’s language, Petya wanted to add spelling and grammar checking to the program. As Vasya was in the country and Petya didn’t feel like waiting, he asked you to help him with this problem. Your task is to define by a given sequence of words, whether it is true that the given text represents exactly one sentence in Petya’s language.

## Input

The first line contains one or more words consisting of lowercase Latin letters. The overall number of characters (including letters and spaces) does not exceed 10^5.

It is guaranteed that any two consecutive words are separated by exactly one space and the input data do not contain any other spaces. It is possible that given words do not belong to the Petya’s language.

## Output

If some word of the given text does not belong to the Petya’s language or if the text contains more than one sentence, print “NO” (without the quotes). Otherwise, print “YES” (without the quotes).

### input

petr

### output

YES

### input

etis atis animatis etis atis amatis

### output

NO

### input

nataliala kataliala vetra feinites

### output

YES
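(If you would like the 12 concrete test cases, feel free to message me privately.)
After reading all that, do you feel the urge to relearn English? ^ v ^ (Hopefully not.)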

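Before you look at my code, I would like you to think about the following questions:
1. Does this problem really need regular expressions? If not, how else could I solve it?
2. What advantages do regular expressions offer for this problem?
3. If I were writing the solution, how would I use regular expressions?
4. Does the problem contain any traps? Where might things go wrong?
The code is as follows: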

import re


def isPetyaLanguage():
    s = input().split()
    join_words = "".join(s)
    if s[0] == join_words and re.fullmatch(
            ".*(?:lios|liala|etr|etra|initis|inites)", join_words) is not None:
        print("YES")
        return
    join1 = re.search("(.*lios)*.*?etr(.*[^a]initis)*", join_words)
    join2 = re.search("(.*liala)*.*?etra(.*inites)*", join_words)
    if join1 is None and join2 is None:
        print("NO")
    else:
        if join1 is not None and join1.group() == join_words:
            print("YES")
        elif join2 is not None and join2.group() == join_words:
            print("YES")
        else:
            print("NO")


if __name__ == '__main__':
    isPetyaLanguage()

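I don't know how you feel after reading my source code. If you are an expert, you will surely see room for improvement (discussion in the comments is welcome); if you already have some background in regular expressions, you may smile knowingly.
Now take a look at the annotated version below and see whether you thought of the issues I mention in it.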

import re


def isPetyaLanguage():
    s = input().split()
    join_words = "".join(s)
    # Check whether a single word is valid. Trap: a lone word may be any
    # part of speech; it does not have to be a noun.
    if s[0] == join_words and re.fullmatch(
            ".*(?:lios|liala|etr|etra|initis|inites)", join_words) is not None:
        print("YES")
        return
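    join1 = re.search("(.*lios)*.*?etr(.*[^a]initis)*", join_words)
    join2 = re.search("(.*liala)*.*?etra(.*inites)*", join_words)
    # The middle part uses lazy matching. Think: why does greedy matching not work?
    # Answer: an invalid word sandwiched between lios and etr would be treated
    # as part of etr, which leads to a wrong verdict.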
    if join1 is None and join2 is None:
        print("NO")
    else:
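        # We still have to check join1 and join2 against None separately, because
        # the if above only guarantees that they are not both None. Otherwise
        # Python raises AttributeError: 'NoneType' object has no attribute 'group'.
        if join1 is not None and join1.group() == join_words: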
            print("YES")
        elif join2 is not None and join2.group() == join_words:
            print("YES")
        else:
            print("NO")


if __name__ == '__main__':
    isPetyaLanguage()
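
One more trap related to question 4: joining the words with `"".join(...)` discards the word boundaries, so the pattern can no longer tell `petr initis` (a noun followed by a verb) from the single string `petrinitis`. Below is a minimal sketch of an alternative that matches the raw input line with its spaces kept. The names `SINGLE_WORD`, `MASCULINE`, `FEMININE`, and `is_sentence` are illustrative and do not come from the code above, and this is one possible approach rather than a reference solution:

```python
import re

# Hypothetical patterns. A word is any run of lowercase letters followed by
# one of the endings; consecutive words are separated by exactly one space.
SINGLE_WORD = re.compile(r"[a-z]*(?:lios|liala|etr|etra|initis|inites)")
MASCULINE = re.compile(r"(?:[a-z]*lios )*[a-z]*etr(?: [a-z]*initis)*")
FEMININE = re.compile(r"(?:[a-z]*liala )*[a-z]*etra(?: [a-z]*inites)*")


def is_sentence(line):
    """Return True if `line` is exactly one sentence in Petya's language."""
    # fullmatch anchors the pattern at both ends, so no comparison of
    # match.group() against the whole input is needed afterwards.
    return any(p.fullmatch(line) is not None
               for p in (SINGLE_WORD, MASCULINE, FEMININE))
```

Because `[a-z]*` cannot cross a space, each word has to carry its own ending, and `re.fullmatch` replaces the manual `group() == join_words` check. Printing `"YES" if is_sentence(input()) else "NO"` would turn the sketch into a complete program.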