Removing comments in Python: a script to strip Python comments and docstrings

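I'm the author of the "my god, he has written a Python interpreter using regex..." project (i.e. pyminifier) mentioned at the link below =).

I just wanted to chime in and say that I've improved the code quite a bit using the tokenize module (which I discovered thanks to this question).

You'll be happy to note that the code no longer relies so much on regular expressions and uses the tokenizer to great effect. Anyway, here's the remove_comments_and_docstrings() function from pyminifier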

(Note: it correctly handles the edge cases that break the previously posted code):

    try:
        from cStringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    import tokenize

    def remove_comments_and_docstrings(source):
        """
        Returns 'source' minus comments and docstrings.
        """
        io_obj = StringIO(source)
        out = ""
        prev_toktype = tokenize.INDENT
        last_lineno = -1
        last_col = 0
        for tok in tokenize.generate_tokens(io_obj.readline):
            token_type = tok[0]
            token_string = tok[1]
            start_line, start_col = tok[2]
            end_line, end_col = tok[3]
            ltext = tok[4]
            # The following two conditionals preserve indentation.
            # This is necessary because we're not using tokenize.untokenize()
            # (because it spits out code with copious amounts of oddly-placed
            # whitespace).
            if start_line > last_lineno:
                last_col = 0
            if start_col > last_col:
                out += (" " * (start_col - last_col))
            # Remove comments:
            if token_type == tokenize.COMMENT:
                pass
            # This series of conditionals removes docstrings:
            elif token_type == tokenize.STRING:
                if prev_toktype != tokenize.INDENT:
                    # This is likely a docstring; double-check we're not
                    # inside an operator:
                    if prev_toktype != tokenize.NEWLINE:
                        # Note regarding NEWLINE vs NL: The tokenize module
                        # differentiates between newlines that end a statement
                        # and newlines inside of operators such as parens,
                        # brackets, and curly braces. Newlines inside of
                        # operators are NL and newlines that end a statement
                        # are NEWLINE.
                        # Catch whole-module docstrings:
                        if start_col > 0:
                            # Unlabelled indentation means we're inside an operator
                            out += token_string
                        # Note regarding the INDENT token: The tokenize module does
                        # not label indentation inside of an operator (parens,
                        # brackets, and curly braces) as actual indentation.
                        # For example:
                        #     def foo():
                        #         "The spaces before this docstring are tokenize.INDENT"
                        #         test = [
                        #             "The spaces before this string do not get a token"
                        #         ]
            else:
                out += token_string
            prev_toktype = token_type
            last_col = end_col
            last_lineno = end_line
        return out
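
To make the NEWLINE/NL and INDENT notes in the comments above a bit more concrete, here is a small sketch (not part of pyminifier) that prints the token stream for a snippet containing a docstring and a bracketed list. `SAMPLE` and `token_names` are names made up for this illustration, and the code assumes Python 3.

```python
import io
import tokenize

# Hypothetical sample: a function with a docstring and a multi-line list.
SAMPLE = (
    "def foo():\n"
    "    'docstring'\n"
    "    test = [\n"
    "        'item'\n"
    "    ]\n"
)

def token_names(source):
    """Return (token name, token text) pairs for `source`."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]

# Print one token per line so the order of INDENT, NEWLINE and NL is visible.
for name, text in token_names(SAMPLE):
    print(name, repr(text))
```

In the output, the docstring comes right after an INDENT token, while the string inside the brackets follows an NL token and gets no INDENT of its own. That difference is what the function leans on to tell docstrings apart from ordinary string literals.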

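For a sense of why moving from regular expressions to the tokenizer helps, the following sketch (again an illustration, not pyminifier code) compares a deliberately naive regex with a tokenize-based pass on a line whose string literal contains a `#`. The helper names and the example.com URL are hypothetical, and Python 3 is assumed.

```python
import io
import re
import tokenize

# Hypothetical input: the '#' inside the string is not a comment.
SOURCE = 'url = "http://example.com/#anchor"  # trailing comment\n'

def strip_comments_regex(source):
    """Naive approach: delete everything from '#' to the end of the line."""
    return re.sub(r"#.*", "", source)

def strip_comments_tokenize(source):
    """Cut COMMENT tokens out of their lines using the token start position."""
    lines = source.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            line = lines[row - 1]
            ending = "\n" if line.endswith("\n") else ""
            # A comment always runs to the end of its line, so keep only
            # the text before it and drop the trailing spaces.
            lines[row - 1] = line[:col].rstrip() + ending
    return "".join(lines)

print(repr(strip_comments_regex(SOURCE)))
print(repr(strip_comments_tokenize(SOURCE)))
```

The regex version cuts the string literal at the `#` and leaves broken code behind, while the token-based version removes only the real comment.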