Python 3
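Python 3 now provides 'foo'.isidentifier(), so that seems to be the best solution for recent Python versions. Note that str.isidentifier() does not check the string against the list of reserved keywords, so it has to be combined with keyword.iskeyword():

import keyword

def isidentifier(ident: str) -> bool:
    """Determines if string is valid Python identifier."""
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))
    if not ident.isidentifier():
        return False
    if keyword.iskeyword(ident):
        return False
    return True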
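To see why the keyword check above is needed, here is a minimal sketch that lists names which pass str.isidentifier() and are nevertheless reserved words. The helper name lexical_but_reserved and the sample list are made up for this illustration.

```python
import keyword

def lexical_but_reserved(names):
    """Return the names that pass str.isidentifier() but are keywords."""
    return [n for n in names if n.isidentifier() and keyword.iskeyword(n)]

# Hypothetical sample input, chosen only for illustration.
candidates = ["foo", "pass", "class", "foo1_23", "1234"]
reserved = lexical_but_reserved(candidates)
print(reserved)  # ['pass', 'class']
```

So 'pass' and 'class' look like identifiers lexically, but cannot be used as names, which is why both checks are combined.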
Python 2
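For Python 2, the simplest possible way to check whether a given string is a valid Python identifier is to let Python parse it itself.
The first approach is to use the tokenize module and check that the token stream consists of nothing but our name:

import keyword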
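import tokenize

def isidentifier(ident):
    """Determines if string is valid Python identifier."""
    # NOTE: tokenize.tokenize() with a bytes readline and an ENCODING token
    # is the Python 3 form of the API; on Python 2 the entry point is
    # tokenize.generate_tokens(), which yields plain tuples and no ENCODING.
    # Smoke test: if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))
    # Quick test: if string is in keyword list, it's definitely not an ident.
    if keyword.iskeyword(ident):
        return False
    readline = (lambda: (yield ident.encode('utf-8-sig')))().__next__
    try:
        tokens = list(tokenize.tokenize(readline))
    except (tokenize.TokenError, SyntaxError):
        # Some inputs make the tokenizer raise instead of yielding tokens.
        return False
    # Newer Python 3 releases add an empty NEWLINE token before ENDMARKER
    # when the input does not end with a newline; ignore it.
    tokens = [tok for tok in tokens
              if not (tok.type == tokenize.NEWLINE and tok.string == '')]
    # You should get exactly 3 tokens
    if len(tokens) != 3:
        return False
    # First one is ENCODING, it's always utf-8 because we explicitly passed in
    # UTF-8 BOM with ident.
    if tokens[0].type != tokenize.ENCODING:
        return False
    # Second is NAME, identifier.
    if tokens[1].type != tokenize.NAME:
        return False
    # Name should span all the string, so there would be no whitespace.
    if ident != tokens[1].string:
        return False
    # Third is ENDMARKER, ending stream
    if tokens[2].type != tokenize.ENDMARKER:
        return False
    return True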
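To get a feel for what the tokenize-based check above inspects, the following sketch prints the token stream for a string. It assumes Python 3's tokenize.tokenize(); the helper name token_names is made up for this example, and the exact stream depends on the interpreter version (newer releases emit an extra NEWLINE token), so no output is shown.

```python
import io
import tokenize

def token_names(source):
    """Return (token type name, token text) pairs for a source string."""
    # tokenize.tokenize() expects a readline callable that returns bytes.
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.tokenize(readline)]

# Print the stream for a plain identifier.
for name, text in token_names("foo"):
    print(name, repr(text))
```

A valid identifier gives a stream with exactly one NAME token between ENCODING and ENDMARKER, while something like "foo bar" gives two NAME tokens.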
The second approach is to use ast and check that the AST of the single expression has the desired shape:

import ast

def isidentifier(ident):
    """Determines if string is valid Python identifier."""
    # Smoke test: if it's not string, then it's not identifier, but we don't
    # want to just silence exception. It's better to fail fast.
    if not isinstance(ident, str):
        raise TypeError("expected str, but got {!r}".format(type(ident)))
    # Resulting AST of simple identifier is <Module [<Expr <Name "foo">>]>
    try:
        root = ast.parse(ident)
    except SyntaxError:
        return False
    if not isinstance(root, ast.Module):
        return False
    if len(root.body) != 1:
        return False
    if not isinstance(root.body[0], ast.Expr):
        return False
    if not isinstance(root.body[0].value, ast.Name):
        return False
    if root.body[0].value.id != ident:
        return False
    return True
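The shape that the ast-based check above relies on can be inspected directly. This is only a sketch, with a made-up helper name describe, that reports the node types along the Module, statement, value path for a few inputs.

```python
import ast

def describe(source):
    """Return node type names along the Module -> statement -> value path."""
    root = ast.parse(source)
    stmt = root.body[0]
    # Statements such as `pass` have no `value` attribute at all.
    value = getattr(stmt, "value", None)
    return [type(root).__name__, type(stmt).__name__, type(value).__name__]

print(describe("foo"))      # ['Module', 'Expr', 'Name']
print(describe("foo.bar"))  # ['Module', 'Expr', 'Attribute']
print(describe("pass"))     # ['Module', 'Pass', 'NoneType']
```

Only the first input matches the Module/Expr/Name shape, which is why the function also compares the Name's id against the original string to rule out surrounding whitespace.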
You can test it like this to check that it works:

assert isidentifier('foo')
assert isidentifier('foo1_23')
assert not isidentifier('pass')  # syntactically correct keyword
assert not isidentifier('foo ')  # trailing whitespace
assert not isidentifier(' foo')  # leading whitespace
assert not isidentifier('1234')  # number
assert not isidentifier('1234abc')  # number and letters
assert not isidentifier('👻')  # Unicode not from allowed range
assert not isidentifier('')  # empty string
assert not isidentifier(' ')  # whitespace only
assert not isidentifier('foo bar')  # several tokens
assert not isidentifier('no-dashed-names-for-you')  # no such thing in Python
# Unicode identifiers are only allowed in Python 3:
assert isidentifier('℘᧚')  # Unicode $Other_ID_Start and $Other_ID_Continue