Python regular expression interview question: a Python regex with a UTF-8 problem

Latest recommended article published 2022-05-03 15:11:02

weixin_39605905

Tags: Python regular expression interview questions

I have a file containing many lines of plain UTF-8 text, such as the line below (it is Chinese, by the way).

PROCESS：类型：关爱积分[NOTIFY] 交易号：2012022900000109 订单号：W12022910079166 交易金额：0.01元交易状态：true 2012-2-29 10:13:08

The file itself is saved as UTF-8, and its name is xx.txt.

Here is my Python code; the environment is Python 2.7.

#coding: utf-8
import re

pattern = re.compile(r'交易金额：(\d+)元')

for line in open('xx.txt'):
    match = pattern.match(line.decode('utf-8'))
    if match:
        print match.group()

The problem is that I get no results.

I want to extract the decimal string from 交易金额：0.01元, which here is 0.01.

Why doesn't this code work? Can anyone explain it to me? I have no clue.

Solution

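There are several issues with your code. First, the pattern should be a Unicode literal, re.compile(ur'...'), because you are matching against decoded (unicode) text; in Python 2 a byte-string pattern containing UTF-8 bytes will not match it. Adding the re.UNICODE flag is also reasonable, although it is not strictly needed here, since it only changes what classes such as \d and \w match. Second, pattern.match() only matches at the beginning of the string, and 交易金额 appears in the middle of the line, so use pattern.search() instead. Third, \d+ matches only a run of digits, not a decimal such as 0.01. \d+\.?\d+ handles that (digits, an optional dot, then more digits), but note that it requires at least two digits, so a single-digit amount such as 5 would not match; \d+(?:\.\d+)? is a safer choice. Example code: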

#coding: utf-8
import re

text = u"PROCESS：类型：关爱积分[NOTIFY] 交易号：2012022900000109 订单号：W12022910079166 交易金额：0.01元交易状态：true 2012-2-29 10:13:08"

pattern = re.compile(ur'交易金额：(\d+\.?\d+)元', re.UNICODE)

print pattern.search(text).group(1)
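
For anyone on Python 3, the same idea can be sketched roughly as follows. This is only an illustration under a few assumptions: strings are already Unicode, so no ur'' prefix or .decode() call is needed, and the names AMOUNT_RE and extract_amounts are hypothetical helpers, not part of the original code. The pattern uses \d+(?:\.\d+)? so that whole-number amounts are accepted too.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch. AMOUNT_RE and extract_amounts are hypothetical names
# introduced for illustration only.
import re

# Digits, optionally followed by a dot and more digits,
# so both "5" and "0.01" are accepted.
AMOUNT_RE = re.compile(r'交易金额：(\d+(?:\.\d+)?)元')


def extract_amounts(lines):
    """Return the amount string from every line that contains one."""
    amounts = []
    for line in lines:
        # search() scans the whole line; match() would only try position 0.
        found = AMOUNT_RE.search(line)
        if found:
            amounts.append(found.group(1))
    return amounts


sample = ("PROCESS：类型：关爱积分[NOTIFY] 交易号：2012022900000109 "
          "订单号：W12022910079166 交易金额：0.01元交易状态：true 2012-2-29 10:13:08")

print(AMOUNT_RE.match(sample))    # None: the line does not start with 交易金额
print(extract_amounts([sample]))  # ['0.01']
```

To run it over the file from the question, something like `with open('xx.txt', encoding='utf-8') as f: print(extract_amounts(f))` should work, assuming xx.txt really is UTF-8 encoded.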
