(How to Write a (Lisp) Interpreter (in Python))

 

This page has two purposes: to describe how to implement computer language interpreters in general, and in particular to show how to implement a subset of the Scheme dialect of Lisp using Python. I call my language and interpreter Lispy (lis.py). Years ago, I showed how to write a Scheme interpreter in Java as well as one in Common Lisp. This time around the goal is to demonstrate, as concisely and accessibly as possible, what Alan Kay called "Maxwell's Equations of Software."

Why does this matter? As Steve Yegge said, "If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 problems that can be solved with compilers (or equally with interpreters, or with Yegge's typical heavy dosage of cynicism).

Syntax and Semantics of Scheme Programs

The syntax of a language is the arrangement of characters to form correct statements or expressions; the semantics is the meaning of those statements or expressions. For example, in the language of mathematical expressions (and in many programming languages), the syntax for adding one plus two is "1 + 2" and the semantics is the application of the addition operator to the two numbers, yielding the value 3. We say we are evaluating an expression when we determine its value; we would say that "1 + 2" evaluates to 3, and write that as "1 + 2" ⇒ 3.

Scheme syntax is different from most other languages you may be familiar with. Consider:

Java:

if (x.val() > 0) {
  fn(A[i] + 1,
     new String[] {"one", "two"});
}

Scheme:

(if (> (val x) 0)
    (fn (+ (aref A i) 1)
        (quote (one two))))

Java has a wide variety of syntactic conventions (keywords, infix operators, brackets, operator precedence, dot notation, quotes, commas, semicolons, etc.), but Scheme syntax is much simpler:

  • Scheme programs consist solely of expressions. There is no statement/expression distinction.
  • Numbers (e.g. 1) and symbols (e.g. A) are called atomic expressions; they cannot be broken into pieces. These are similar to their Java counterparts, except that in Scheme, + and < and the like are symbols, exactly like A.
  • Everything else is a list expression. A list is a "(", followed by zero or more expressions, followed by a ")". The first element of the list determines what it means.
  • A list expression starting with a keyword, e.g. (if ...), is known as a special form; we will see how each special form is interpreted.
  • A list starting with a non-keyword, e.g. (fn ...), is a function call.
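To make these rules concrete, here is a small sketch that names the syntactic category of an expression. It assumes expressions have already been turned into Python values (numbers as int or float, symbols as str, list expressions as Python lists); the function name classify and the set KEYWORDS are illustrative choices, and KEYWORDS lists only a few keywords rather than the full set.

```python
# Assumed representation: numbers are int/float, symbols are str,
# list expressions are Python lists. KEYWORDS is an illustrative subset.
KEYWORDS = {'if', 'define', 'quote'}

def classify(exp):
    "Name the syntactic category of an already-parsed expression."
    if isinstance(exp, (int, float)):
        return 'number'            # atomic expression
    if isinstance(exp, str):
        return 'symbol'            # atomic expression; '+' and '<' count too
    if not exp:
        return 'empty list'
    if isinstance(exp[0], str) and exp[0] in KEYWORDS:
        return 'special form'      # list starting with a keyword
    return 'function call'         # list starting with a non-keyword

print(classify('+'))                           # symbol
print(classify(['if', ['>', 'x', 0], 1, 2]))   # special form
print(classify(['fn', 'x']))                   # function call
```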

The beauty of Scheme is that the full language only needs six basic special forms. (In comparison, Python has 110 syntactic forms and Java has 133.) Using parentheses for everything may seem unfamiliar, but it has the virtues of simplicity and consistency. (Some have joked that "Lisp" stands for "Lots of Irritating Silly Parentheses"; I think it stands for "Lisp Is Syntactically Pure".)

In this page we will cover all the important points of Scheme (omitting some minor details), but we will take two steps to get there.

Language 1: Lispy Calculator

Step one is to define a language I call Lispy Calculator that is a subset of Scheme using only three of the six special forms. Lispy Calculator lets you do any computation you could do on a typical calculator—as long as you are comfortable with prefix notation. And you can do some things that are not offered in typical calculator languages: "if" expressions, and the definition of new variables, for example. Here is a table of all the allowable expressions in the Lispy Calculator language:

Expression | Syntax | Semantics and Example
variable reference | var | A symbol is interpreted as a variable name; its value is the variable's value. Example: r ⇒ 10 (assuming r was previously defined to be 10)
constant literal | number | A number evaluates to itself. Examples: 12 ⇒ 12 or -3.45e+6 ⇒ -3.45e+6
quotation | (quote exp) | Return the exp literally; do not evaluate it. Example: (quote (+ 1 2)) ⇒ (+ 1 2)
conditional | (if test conseq alt) | Evaluate test; if true, evaluate and return conseq; otherwise evaluate and return alt. Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6
definition | (define var exp) | Define a new variable and give it the value of evaluating the expression exp. Example: (define r 10)
procedure call | (proc arg...) | If proc is anything other than one of the symbols if, define, or quote, then it is treated as a procedure. Evaluate proc and all the args, and then apply the procedure to the list of arg values. Example: (sqrt (* 2 8)) ⇒ 4.0

In the Syntax column of this table, var must be a symbol, number must be an integer or floating point number, and the other italicized words can be any expression. The notation arg... means zero or more repetitions of arg.
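Each row of the table maps onto one branch of an evaluator. The following is a minimal sketch under some assumptions, not the Lispy implementation itself: expressions are taken to be parsed already into nested Python lists, the function is named evaluate here (a made-up name, chosen so that Python's built-in eval is left alone), and env is a hypothetical dictionary holding just enough procedures for the examples in the table.

```python
import math
import operator as op

# Hypothetical global environment: an ordinary dict from symbol names to values.
env = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv,
       '>': op.gt, '<': op.lt, '=': op.eq,
       'sqrt': math.sqrt, 'pi': math.pi}

def evaluate(x, env=env):
    "Evaluate a parsed Lispy Calculator expression (numbers, strs, nested lists)."
    if isinstance(x, str):                # variable reference
        return env[x]
    if not isinstance(x, list):           # constant literal number
        return x
    if x[0] == 'quote':                   # (quote exp)
        (_, exp) = x
        return exp
    if x[0] == 'if':                      # (if test conseq alt)
        (_, test, conseq, alt) = x
        return evaluate(conseq if evaluate(test, env) else alt, env)
    if x[0] == 'define':                  # (define var exp)
        (_, var, exp) = x
        env[var] = evaluate(exp, env)
        return None
    proc = evaluate(x[0], env)            # procedure call: (proc arg...)
    args = [evaluate(arg, env) for arg in x[1:]]
    return proc(*args)

evaluate(['define', 'r', 10])
print(evaluate('r'))                                            # 10
print(evaluate(['if', ['>', 10, 20], ['+', 1, 1], ['+', 3, 3]]))  # 6
print(evaluate(['sqrt', ['*', 2, 8]]))                          # 4.0
```

Note that only the chosen branch of an if is evaluated, and that quote returns its argument without evaluating it; every other list is treated as a procedure call.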

What A Language Interpreter Does

A language interpreter has two parts:

  1. Parsing: The parsing component takes an input program in the form of a sequence of characters, verifies it according to the syntactic rules of the language, and translates the program into an internal representation. In a simple interpreter the internal representation is a tree structure (often called an abstract syntax tree) that closely mirrors the nested structure of statements or expressions in the program. In a language translator called a compiler there is often a series of internal representations, starting with an abstract syntax tree, and progressing to a sequence of instructions that can be directly executed by the computer. The Lispy parser is implemented with the function parse.

     

  2. Execution: The internal representation is then processed according to the semantic rules of the language, thereby carrying out the computation. Lispy's execution function is called eval (note this shadows Python's built-in function of the same name).

Here is a picture of the interpretation process:

program (str) ➡ parse ➡ abstract syntax tree (list) ➡ eval ➡ result (object)

And here is a short example of what we want parse and eval to be able to do:

>>> program = "(begin (define r 10) (* pi (* r r)))"

>>> parse(program)
['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]

>>> eval(parse(program))
314.1592653589793

Parsing: parse, tokenize and read_from_tokens

Parsing is traditionally separated into two parts: lexical analysis, in which the input character string is broken up into a sequence of tokens, and syntactic analysis, in which the tokens are assembled into an abstract syntax tree. The Lispy tokens are parentheses, symbols, and numbers. There are many tools for lexical analysis (such as Mike Lesk and Eric Schmidt's lex), but we'll use a very simple tool: Python's str.split. The function tokenize takes as input a string of characters; it adds spaces around each paren, and then calls str.split to get a list of tokens:

def tokenize(chars):
    "Convert a string of characters into a list of tokens."
    return chars.replace('(',' ( ').replace(')',' ) ').split()

>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> tokenize(program)
['(', 'begin', '(', 'define', 'r', '10', ')', '(', '*', 'pi', '(', '*', 'r', 'r', ')', ')', ')']
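Because str.split with no arguments splits on any run of whitespace, the same tokenize also copes with programs spread over several lines and with empty input. A small illustration (the sample strings are made up for this example):

```python
def tokenize(chars):
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()

# Newlines and indentation are treated like any other whitespace.
source = """
(define r
        10)
"""
print(tokenize(source))   # ['(', 'define', 'r', '10', ')']

# An empty program yields no tokens at all.
print(tokenize(''))       # []
```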

Our function parse will take a string representation of a program as input, call tokenize to get a list of tokens, and then call read_from_tokens to assemble an abstract syntax tree. read_from_tokens looks at the first token; if it is a ')' that's a syntax error. If it is a '(', then we start building up a list of sub-expressions until we hit a matching ')'. Any non-parenthesis token must be a symbol or number. We'll let Python make the distinction between them: for each non-paren token, first try to interpret it as an int, then as a float, and finally as a symbol. Here is the parser:

def parse(program):
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

def read_from_tokens(tok
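The listing breaks off at this point in this copy. As a rough sketch based only on the description above, and not necessarily the original code, a complete parser could look like the following. The helper name atom, the local variable names, and the exact error messages are assumptions made for illustration.

```python
def tokenize(chars):
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()

def atom(token):
    "Hypothetical helper: try int, then float, else keep the token as a symbol (str)."
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return token

def read_from_tokens(tokens):
    "Read one expression from a list of tokens, consuming the tokens it uses."
    if not tokens:
        raise SyntaxError('unexpected EOF while reading')
    token = tokens.pop(0)
    if token == '(':
        expr = []
        # Collect sub-expressions until the matching ')' is reached.
        while tokens and tokens[0] != ')':
            expr.append(read_from_tokens(tokens))
        if not tokens:
            raise SyntaxError('missing )')
        tokens.pop(0)              # discard the closing ')'
        return expr
    if token == ')':
        raise SyntaxError('unexpected )')
    return atom(token)

def parse(program):
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

print(parse("(begin (define r 10) (* pi (* r r)))"))
# ['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]
```

The recursion mirrors the nesting of the parentheses: each '(' starts a new Python list, and the call that opened it returns when it sees the matching ')'.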