# Regular Expressions, other languages and interpreters

• Define a language for regular expressions; interpret that language.
• Define the set of strings matched by a regular expression.
• Other languages.

## Office Hours 3

### Course Syllabus

Lesson 3: Regular Expressions, Other Languages and Interpreters

### 1. Question One

• Design of Computer Programs - YouTube

Hi, and welcome to the third office hours. We’ve got, again, more good questions. Let’s get right at them.

All right.

The first one comes from Voythos, and Voythos is taking CS262 as well with Wes Weimer. In that class they talked about finite state machines as a representation of the underlying engine for regular expressions. In our implementation, is that actually what we did, or did we do something different?

(Using a finite state machine as the engine for regular expressions)
That’s a great question. We didn’t explicitly show that, but there is a one-to-one correspondence between a regular expression and a finite state machine. And we can post some references, some supplementary material for that. But you can make up a little table: here is a regular expression, say the alt regular expression, a or b, and then here’s a finite state machine. I guess one way to implement that would be you’d have a start state, you’d have two epsilon transitions to the a or b, and then two epsilon transitions coming back. So any regular expression corresponds one-to-one to a little collection of nodes in the finite state machine. That means you can go in either direction.

Does that mean that we’ve actually implemented that? Well, it kind of depends on how you look at it. Yes, you could say that we’ve implemented it, because you can show what they look like, but we aren’t really creating objects that correspond to these individual states. They sort of exist ephemerally when we execute the program rather than being defined exactly ahead of time. In that sense, no.

I guess, maybe in the more important sense, the types of manipulations we can do with what we implemented are slightly different from what you would do if you explicitly created these finite state machines, particularly this transition between deterministic and nondeterministic machines. That’s a little hard to do exactly in the representation that we had, although you could do something very much like it by starting to memoize your functions and so on. You could arrive at the same place as you would by doing that transition with machines.
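To make the a-or-b machine described in this answer concrete, here is a minimal sketch of it as an explicit table of states, which is the kind of object the course implementation never builds. The state numbers, the `TRANSITIONS` table, and the function names are assumptions made for illustration; they are not taken from the course code.

```python
# EPSILON marks a transition that consumes no input character.
EPSILON = None

# Hypothetical NFA for the regular expression a|b:
#   0 --eps--> 1 --'a'--> 3 --eps--> 5
#   0 --eps--> 2 --'b'--> 4 --eps--> 5
TRANSITIONS = {
    (0, EPSILON): {1, 2},
    (1, "a"): {3},
    (2, "b"): {4},
    (3, EPSILON): {5},
    (4, EPSILON): {5},
}
START, ACCEPTING = 0, {5}


def epsilon_closure(states):
    """Return every state reachable from `states` using only epsilon moves."""
    closure = set(states)
    frontier = list(states)
    while frontier:
        state = frontier.pop()
        for nxt in TRANSITIONS.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                frontier.append(nxt)
    return closure


def accepts(text):
    """Simulate the NFA on `text`, tracking the set of states it could be in."""
    current = epsilon_closure({START})
    for ch in text:
        moved = set()
        for state in current:
            moved |= TRANSITIONS.get((state, ch), set())
        current = epsilon_closure(moved)
    return bool(current & ACCEPTING)
```

Tracking a set of possible states at each step is the same idea that underlies converting a nondeterministic machine into a deterministic one: each distinct set of states plays the role of one deterministic state.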

Great. If you want an in-depth discussion on finite state machines, you can check out the link below this video to Wes’s course.

### 2. Question Two

• Design of Computer Programs - YouTube

(How do you handle a very large regular expression?)
The next question comes from Luca. Luca wants to know: when the work done by a compiler is really, really heavy and you have a really long regular expression, is there any way of dumping out the final set of these low-level machine instructions so they can be called later?

Yep. Okay. That’s a great question. A couple answers to that.

(Use a function in the re module to compile the regular expression first)
One is that within the regular expression module, the re module in Python, there is a compile function that takes a string in and returns a compiled version of that regular expression. So if you’re running your program once, you can compile that regular expression once at the very top of your program and then use the compiled expression each time. Now, if you do that explicitly, then you’re all set. If you don’t, the regular expression module does most of the work for you, because it keeps a little cache. It does something like memoize: it keeps the last few regular expressions it has compiled and says, I’ve seen this string before. I know what it compiles to. I’ll just fetch that compiled object. It does that automatically. That’s within one run of your program.
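As a small sketch of compiling once at the top of a program, the snippet below uses the standard `re.compile` function. The pattern, the `WORD` name, and the `count_words` helper are hypothetical examples, not something from the lecture.

```python
import re

# Compile once at module level; every call below reuses this object.
WORD = re.compile(r"[A-Za-z]+")


def count_words(lines):
    """Count alphabetic words across lines using the precompiled pattern."""
    return sum(len(WORD.findall(line)) for line in lines)


total = count_words(["regular expressions", "finite state machines"])
```

Calling `re.findall(r"[A-Za-z]+", line)` directly would also work and would normally hit the module's internal cache, but holding the compiled object makes the reuse explicit and does not depend on the cache.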

(Using the pickle module)
Now, another thing to think about is between runs of your program. What if you’ve compiled everything, and then you don’t want to have the startup time of compiling it over again? There’s another module called pickle. Well, what are pickles? They’re ways of storing cucumbers for a long time so they don’t go rotten. That’s what the pickle module does. It takes an object that exists within the running Python interpreter and writes it out to disk in a form that can be read back in.
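The round trip through disk might look like the sketch below, which uses the standard `pickle.dump` and `pickle.load` calls. The `save_object` and `load_object` helpers, the file name, and the sample data are assumptions for illustration. One caveat worth checking against your Python version: CPython appears to pickle a compiled pattern by saving its pattern string and flags, so loading it recompiles the expression rather than restoring the compiled internals.

```python
import os
import pickle
import re
import tempfile


def save_object(obj, path):
    """Write any picklable object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Read back an object written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical data: a compiled pattern stored alongside ordinary values.
table = {"word": re.compile(r"[a-z]+"), "counts": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.pickle")
    save_object(table, path)
    restored = load_object(path)
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.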

### 3. Question Three

• Design of Computer Programs - YouTube

Next question comes from Thomas, and his question has to do with mistakes. Basically, you never seem to make any, at least in the videos. Maybe you can talk a little bit about your design process in coming up with these lectures and how mistakes figure in there.

There is always a trade-off of how many mistakes we want to leave in, and believe me, I make lots of them. And I make them at various points in time. So as I’m first thinking up the questions we’re going to do, I’m coding up answers. And I make mistakes there in a couple of ways. One is, I just make errors. I write something. It computes the wrong answer. It generates an error. I swapped the order of two arguments, or I pass the wrong thing in. I’m making those mistakes all the time. Those aren’t very interesting ones, so I don’t show them. And then, when I’m recording the videos, I make mistakes there too. And sometimes errors have crept into the program that I didn’t notice when I was developing it. I didn’t write enough test cases, and then I have to decide what to do. And mostly, those have ended up on the cutting room floor. So, just as when you watch a movie or a TV show, you don’t see most of the outtakes. Sometimes after the credits, they roll a few of the outtakes. So believe me, the errors are there; we’re just not showing most of them.

### 4. Question Four

• Design of Computer Programs - YouTube

(How do you choose a programming paradigm when solving a problem?)
Next question comes from Eduardo Lopez. He points out that Python is a flexible language. We can use many different programming paradigms. We can do functional, procedural, or object-oriented programming. How do you decide which of these paradigms to use when approaching a new problem?

Yeah, I guess I try to think of things as how can I get as close to the problem as possible? And so I want to program at the level of the problem. And then, incidentally, I have to program with a particular language. And so I start analyzing the problem and saying, what are the pieces of this problem, what are the objects I’m going to be manipulating, what are the ways I’m going to manipulate them, and try to do most of the analysis at that level. And then once that analysis is done, then I can say, well, what do I have in my programming language? And there might be some differences between languages. So if you’re using Python, you might have more functions. If you’re using Java, you might have more classes. But they’re still implementing the same basic set of ideas. And I like that approach because there is a more direct connection between the problem and the solution, rather than a multistep process of going from the problem to the language implementation and then back to the solution.
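As a rough illustration of the point that functions and classes can implement the same idea, the sketch below writes one small concept, a matcher for a literal string, in both styles. The names `lit` and `Lit`, and the convention of returning a set of remainders, are assumptions chosen for this example.

```python
def lit(s):
    """Function style: return a matcher that accepts text starting with s."""
    return lambda text: {text[len(s):]} if text.startswith(s) else set()


class Lit:
    """Class style: the same idea, with the literal stored on an object."""

    def __init__(self, s):
        self.s = s

    def match(self, text):
        return {text[len(self.s):]} if text.startswith(self.s) else set()


# Both versions return the set of possible remainders after a match.
function_result = lit("ab")("abc")
class_result = Lit("ab").match("abc")
```

The analysis (a matcher takes text and yields what is left over) is the same in both cases; only the language construct that carries it differs.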

All right. Thank you. That’s all we have for this week. See you next week. See you next week.
