In Python, how to check if a string only contains certain characters?

I need to check that a string contains only a..z, 0..9, and . (period), and no other character.

I could iterate over each character and check that it is a..z, 0..9, or ., but that would be slow.
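For reference, the character-by-character check described above can be written compactly with all() and a set of allowed characters. This is a minimal sketch: the names ALLOWED, only_allowed_chars and only_allowed_chars_subset are hypothetical, and whether it is actually slower than a regular expression depends on the input, so it is worth timing before ruling it out.

```python
import string

# Allowed characters: lowercase ASCII letters, digits, and the period.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + ".")

def only_allowed_chars(s):
    """Return True if every character of s is in ALLOWED (True for "")."""
    # all() stops at the first disallowed character it meets.
    return all(ch in ALLOWED for ch in s)

def only_allowed_chars_subset(s):
    """Same check: the set of distinct characters in s must be a subset of ALLOWED."""
    return set(s) <= ALLOWED

print(only_allowed_chars("abcde.1"))         # True
print(only_allowed_chars_subset("abcde.1#"))  # False
```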

I am not clear on how to do it with a regular expression.

Is this correct? Can you suggest a simpler regular expression or a more efficient approach?

#Valid chars . a-z 0-9
import re

def check(test_str):
    #http://docs.python.org/library/re.html
    #re.search returns None if no position in the string matches the pattern
    #pattern to search for any character other than . a-z 0-9
    pattern = r'[^\.a-z0-9]'
    if re.search(pattern, test_str):
        #Character other than . a-z 0-9 was found
        print('Invalid : %r' % (test_str,))
    else:
        #No character other than . a-z 0-9 was found
        print('Valid : %r' % (test_str,))

check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde')

'''
Output:
>>>
Valid : 'abcde.1'
Invalid : 'abcde.1#'
Invalid : 'ABCDE.12'
Invalid : '_-/>"!@#12345abcde'
'''

Solution

Final(?) edit

Answer, wrapped up in a function, with annotated interactive session:

>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
...
>>> special_match("")
True
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
# The above test case is to catch out any attempt to use re.match()
# with a `$` instead of `\Z` -- see point (6) below.
>>> special_match("az09.#")
False
>>> special_match("az09.X")
False
>>>
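On Python 3.4 and later, re.fullmatch() anchors the pattern at both ends of the string, which has the same effect as using match() with a trailing \Z. The sketch below is a swapped-in alternative, not the approach timed in this answer, and fullmatch_special is a hypothetical name. It uses * rather than + so that the empty string counts as valid, matching special_match("") above.

```python
import re

# fullmatch() succeeds only if the whole string matches, so no ^, $ or \Z is needed.
# '*' (not '+') lets the empty string match, like special_match("").
_VALID = re.compile(r'[a-z0-9.]*')

def fullmatch_special(strg):
    """Return True if strg consists only of a-z, 0-9 and '.'."""
    return _VALID.fullmatch(strg) is not None

for s in ("", "az09.", "az09.\n", "az09.#", "az09.X"):
    print(repr(s), fullmatch_special(s))
```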

Note: There is a comparison with using re.match() further down in this answer. Further timings show that match() would win with much longer strings. However, match() seems to have a much larger overhead than search() when the final answer is True; this is puzzling (perhaps it's the cost of returning a MatchObject instead of None) and may warrant further rummaging.

==== Earlier text ====

The [previously] accepted answer could use a few improvements:

(1) Presentation gives the appearance of being the result of an interactive Python session:

reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True

but match() doesn't return True; it returns a match object on success and None on failure.
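A small Python 3 sketch of what match() really returns. The pattern is the cleaned-up one from point (5), used here only for illustration; wrapping the result in bool() is what turns it into True or False.

```python
import re

reg = re.compile(r'[a-z0-9.]+\Z')

m = reg.match('jsdlfjdsf12324..3432jsdflsdf')
# On success match() returns a match object, never the value True.
print(m is True)         # False
print(m.group())         # jsdlfjdsf12324..3432jsdflsdf
# Match objects are always truthy, so bool() gives True here.
print(bool(m))           # True
# On failure match() returns None.
print(reg.match('ABC'))  # None
```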

(2) For use with match(), the ^ at the start of the pattern is redundant, because match() only tries to match at the start of the string, and it appears to be slightly slower than the same pattern without the ^.
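A quick check of point (2), written as a Python 3 sketch: with match() the two patterns agree on every input tried, while with search(), which scans every position, the ^ does change the result.

```python
import re

s = 'jsdlfjdsf12324..3432jsdflsdf'
with_caret = re.compile(r'^[a-z0-9.]+\Z')
without_caret = re.compile(r'[a-z0-9.]+\Z')

# match() only tries position 0, so a leading ^ adds nothing.
for text in (s, '#' + s, s + '#'):
    assert bool(with_caret.match(text)) == bool(without_caret.match(text))

# search() tries every position, so here ^ does make a difference.
print(bool(re.search(r'[a-z]+\Z', '#abc')))   # True  (matches 'abc' at index 1)
print(bool(re.search(r'^[a-z]+\Z', '#abc')))  # False
```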

(3) It should encourage using a raw string automatically, without a second thought, for any re pattern.
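Why raw strings matter for point (3), as a Python 3 sketch: in an ordinary string literal some backslash sequences are consumed by Python before the regex engine ever sees them. The word-boundary example is a standard illustration, not taken from the answer.

```python
import re

# In a normal string literal, '\b' is the backspace character (U+0008),
# so the regex engine never sees a word-boundary assertion.
print(re.search('\bcat\b', 'a cat sat'))            # None
# In a raw string the backslash reaches the regex engine intact.
print(re.search(r'\bcat\b', 'a cat sat').group())   # cat

# '\.' only "works" without r because Python keeps unknown escapes as-is;
# Python 3.6+ warns about such invalid escape sequences.
print('\\.' == r'\.')                               # True
```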

(4) The backslash in front of the dot/period is redundant: inside a character class, . already means a literal period.
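A Python 3 check of point (4): the escaped and unescaped character classes accept exactly the same characters on a handful of test inputs, and the unescaped . inside the class does not act as a wildcard.

```python
import re

# Inside [...], '.' is a literal period, so both patterns describe the same set.
escaped = re.compile(r'[a-z0-9\.]')
plain = re.compile(r'[a-z0-9.]')

for ch in 'a9.X#\n':
    assert bool(escaped.match(ch)) == bool(plain.match(ch))

print(bool(plain.match('.')))  # True: matches a literal period
print(bool(plain.match('X')))  # False: '.' in a class is not "any character"
```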

(5) Slower than the OP's code!

prompt>rem OP's version -- NOTE: OP used raw string!
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop

prompt>rem OP's version w/o backslash
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop

prompt>rem cleaned-up version of accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop

prompt>rem accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop
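The sessions above use the command-line timeit on Python 2.6. A rough Python 3 equivalent using the timeit module is sketched below; it is not the author's harness, the function names are hypothetical, and no reference numbers are given because results vary with machine and interpreter version. It also times a much longer string (the short one repeated 1000 times), the case where the note near the top says match() would win.

```python
import re
import timeit

search_bad = re.compile(r'[^a-z0-9.]').search  # finds any disallowed character
match_all = re.compile(r'[a-z0-9.]+\Z').match   # whole string must be allowed

def via_search(s):
    return not search_bad(s)

def via_match(s):
    # Note: '+' means via_match('') is False, unlike via_search('').
    return bool(match_all(s))

def compare(s, number=1000):
    """Return best-of-3 seconds per call for each approach on string s."""
    results = {}
    for name, func in (("search", via_search), ("match", via_match)):
        times = timeit.repeat(lambda: func(s), number=number, repeat=3)
        results[name] = min(times) / number
    return results

if __name__ == "__main__":
    short = 'jsdlfjdsf12324..3432jsdflsdf'
    for label, s in (("short", short), ("long", short * 1000)):
        for name, per_call in compare(s).items():
            print(f"{label:5} {name:6} {per_call * 1e6:.2f} usec per call")
```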

(6) Can produce the wrong answer!!

>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
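The reason for point (6), as a Python 3 sketch: $ is allowed to match just before a newline at the end of the string, so match() succeeds and simply leaves the trailing newline unconsumed, whereas \Z only matches at the very end.

```python
import re

# '$' may match just before a trailing newline.
m = re.match(r'[a-z0-9.]+$', '1234\n')
print(repr(m.group()))  # '1234'  (the newline is not part of the match)
print(m.end())          # 4, while len('1234\n') is 5

# '\Z' only matches at the very end of the string.
print(re.match(r'[a-z0-9.]+\Z', '1234\n'))  # None
```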
