Hello,
I started to write a lexer in Python -- my first attempt to do something
useful with Python (rather than trying out snippets from tutorials). It
is not complete yet, but I would like some feedback -- I'm a Python
newbie and it seems that, with Python, there is always a simpler and
better way to do it than you think.
### Begin ###
import re

class Lexer(object):

    def __init__( self, source, tokens ):
        # Normalise line endings to "\n", then tokenise the whole source.
        self.source = re.sub( r"\r\n|\r|\n", "\n", source )
        self.tokens = tokens
        self.offset = 0
        self.result = []
        self.line = 1
        self._compile()
        self._tokenize()

    def _compile( self ):
        # Replace each regex string by its compiled pattern.
        for name, regex in self.tokens.iteritems():
            self.tokens[name] = re.compile( regex, re.M )

    def _tokenize( self ):
        while self.offset < len( self.source ):
            for name, regex in self.tokens.iteritems():
                match = regex.match( self.source, self.offset )
                if not match: continue
                self.offset += len( match.group(0) )
                self.result.append( ( name, match, self.line ) )
                self.line += match.group(0).count( "\n" )
                break
            else:
                raise Exception(
                    'Syntax error in source at offset %s' %
                    str( self.offset ) )

    def __str__( self ):
        return "\n".join(
            [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %
              ( str( line ), str( match.pos ), name, match.group(0) )
              for name, match, line in self.result ] )

# Test Example

source = r"""
Name: "Thomas", # just a comment
Age: 37
"""

tokens = {
    'T_IDENTIFIER' : r'[A-Za-z_][A-Za-z0-9_]*',
    'T_NUMBER'     : r'[+-]?\d+',
    'T_STRING'     : r'"(?:\\.|[^\\"])*"',
    'T_OPERATOR'   : r'[=:,;]',
    'T_NEWLINE'    : r'\n',
    'T_LWSP'       : r'[ \t]+',
    'T_COMMENT'    : r'(?:\#|//).*$' }

print Lexer( source, tokens )
### End ###
Greetings,
Thomas
--
Just because many people are wrong doesn't mean they are right!
(Coluche)
Thomas Mlynarczyk wrote:
> Hello,
> I started to write a lexer in Python -- my first attempt to do
> something useful with Python (rather than trying out snippets from
> tutorials). It is not complete yet, but I would like some feedback --
> I'm a Python newbie and it seems that, with Python, there is always a
> simpler and better way to do it than you think.
Hi,
Adding to John's comments, I wouldn't have source as a member of the
Lexer object but as an argument of the tokenise() method (which I would
make public). The tokenise method would return what you currently call
self.result. So it would be used like this.

>>> mylexer = Lexer(tokens)
>>> mylexer.tokenise(source)
# Later:
>>> mylexer.tokenise(another_source)
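
Roughly the shape I have in mind -- untested, and the names are of
course up to you; it just keeps your matching loop as it is and moves
the source handling into tokenise():

import re

class Lexer(object):

    def __init__( self, tokens ):
        # Compile the token regexes once, when the lexer is created.
        self.tokens = dict( ( name, re.compile( regex, re.M ) )
                            for name, regex in tokens.iteritems() )

    def tokenise( self, source ):
        # Return a list of (name, match, line) tuples for this source.
        source = re.sub( r"\r\n|\r|\n", "\n", source )
        offset, line, result = 0, 1, []
        while offset < len( source ):
            for name, regex in self.tokens.iteritems():
                match = regex.match( source, offset )
                if not match: continue
                offset += len( match.group(0) )
                result.append( ( name, match, line ) )
                line += match.group(0).count( "\n" )
                break
            else:
                raise Exception( 'Syntax error in source at offset %s' % offset )
        return result

That way the lexer holds only the compiled token table and can be
reused for any number of sources.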
--
Arnaud
Arnaud Delobelle wrote:
> Adding to John's comments, I wouldn't have source as a member of the
> Lexer object but as an argument of the tokenise() method (which I would
> make public). The tokenise method would return what you currently call
> self.result. So it would be used like this.
>
> >>> mylexer = Lexer(tokens)
> >>> mylexer.tokenise(source)
> >>> mylexer.tokenise(another_source)
At a later stage, I intend to have the source tokenised not all at once,
but token by token, "just in time" when the parser (yet to be written)
accesses the next token:
token = mylexer.next( 'FOO_TOKEN' )
if not token: raise Exception( 'FOO token expected.' )
# continue doing something useful with token
Where next() would return the next token (and advance an internal
pointer) *if* it is a FOO_TOKEN, otherwise it would return False. This
way, the total number of regex matchings would be reduced: Only that
which is expected is "tried out".
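
Roughly what I have in mind, as an untested sketch -- the interface is
still provisional, and I am not sure yet whether next() should return
the match object or a (name, match, line) tuple; here it returns the
match:

import re

class Lexer(object):

    def __init__( self, tokens, source ):
        # Compile the token regexes once; keep an internal pointer into source.
        self.tokens = dict( ( name, re.compile( regex, re.M ) )
                            for name, regex in tokens.iteritems() )
        self.source = source
        self.offset = 0
        self.line = 1

    def next( self, name ):
        # Try only the regex of the expected token type.  On success,
        # advance the internal pointer and return the match object;
        # otherwise return False and leave the pointer where it was.
        match = self.tokens[name].match( self.source, self.offset )
        if not match:
            return False
        self.offset += len( match.group(0) )
        self.line += match.group(0).count( "\n" )
        return match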
But otherwise, upon reflection, I think you are right and it would
indeed be more appropriate to do as you suggest.
Thanks for your feedback.
Greetings,
Thomas
--
Just because many people are wrong doesn't mean they are right!
(Coluche)