Matching Chinese and English in Python

weixin_33873846

Original link: http://www.cnblogs.com/chybot/p/4665389.html

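Text processing often calls for matching Chinese names or English words. Python handles this easily with regular expressions, provided the text is a Unicode string (for example, decoded from UTF-8).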

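Unicode range for Chinese characters: [\u4e00-\u9fa5] (the commonly used CJK Unified Ideographs; Chinese punctuation and the extension blocks fall outside it)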

Range for English letters: [a-zA-Z]
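To make the two ranges concrete, here is a minimal sketch in Python 3 syntax that classifies single characters. The names `CJK_CHAR`, `ASCII_LETTER`, and `classify` are illustrative assumptions, not part of the original post.

```python
import re

# Hypothetical helper names, chosen for illustration only.
CJK_CHAR = re.compile(r'[\u4e00-\u9fa5]')   # common Chinese characters
ASCII_LETTER = re.compile(r'[a-zA-Z]')      # English letters

def classify(ch):
    """Return 'chinese', 'english', or 'other' for a single character."""
    if CJK_CHAR.fullmatch(ch):
        return 'chinese'
    if ASCII_LETTER.fullmatch(ch):
        return 'english'
    return 'other'

print(hex(ord('中')))                      # 0x4e2d, inside the Chinese range
print([classify(c) for c in '中a1，'])     # digit and fullwidth comma are 'other'
```

Note that digits and Chinese punctuation such as the fullwidth comma match neither range, so they act as separators when matching runs of characters.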

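With these ranges, matching runs of consecutive Chinese or English characters is straightforward. For example, in a Python 2 session: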

>>> import re
>>> strings = u'中国china美国American'
>>> print strings
中国china美国American
>>> ch_pat = re.compile(ur'[\u4e00-\u9fa5]+')
>>> en_pat = re.compile('[a-zA-Z]+')
>>> ch_words = ch_pat.findall(strings)
>>> en_words = en_pat.findall(strings)
>>> print ch_words
[u'\u4e2d\u56fd', u'\u7f8e\u56fd']
>>> print en_words
[u'china', u'American']
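
The session above uses Python 2 syntax (the `print` statement and the `ur''` prefix, which Python 3 rejects). A rough Python 3 equivalent is sketched below, assuming the same input string. The combined pattern at the end is an addition that is not in the original post; it returns both kinds of words in their order of appearance.

```python
import re

text = '中国china美国American'

# In Python 3 every str is Unicode, so no u'' or ur'' prefix is needed.
ch_pat = re.compile(r'[\u4e00-\u9fa5]+')   # runs of Chinese characters
en_pat = re.compile(r'[a-zA-Z]+')          # runs of English letters

ch_words = ch_pat.findall(text)
en_words = en_pat.findall(text)

print(ch_words)  # ['中国', '美国']
print(en_words)  # ['china', 'American']

# Not in the original post: one alternation keeps the words in source order.
mixed_pat = re.compile(r'[\u4e00-\u9fa5]+|[a-zA-Z]+')
mixed_words = mixed_pat.findall(text)
print(mixed_words)  # ['中国', 'china', '美国', 'American']
```

Python 3 prints the Chinese strings directly instead of the `\uXXXX` escapes that Python 2 shows when printing a list.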

Reposted from: https://www.cnblogs.com/chybot/p/4665389.html