Matching years with regular expressions in Python: how do I match dates with a Python regex?

I know that there are similar questions to mine that have been answered, but after reading through them I still don't have the solution I'm looking for.

Using Python 3.2.2, I need to match "Month, Day, Year", where the month is a name and the day is two digits no greater than 31 or 30 depending on the month, 28 for February, and 29 for February in a leap year. (Basically, a real and valid date.)

This is what I have so far:

pattern = "(January|February|March|April|May|June|July|August|September|October|November|December)[,][ ](0[1-9]|[12][0-9]|3[01])[,][ ]((19|20)[0-9][0-9])"

expression = re.compile(pattern)

matches = expression.findall(sampleTextFile)

I'm still not too familiar with regex syntax so I may have characters in there that are unnecessary (the [,][ ] for the comma and spaces feels like the wrong way to go about it), but when I try to match "January, 26, 1991" in my sample text file, the printing out of the items in "matches" is ('January', '26', '1991', '19').

Why does the extra '19' appear at the end?

Also, what could I add to or change in my regex that would allow me to validate dates properly? My plan right now is to accept nearly all dates and weed out the invalid ones later using high-level constructs, comparing the day group with the month and year groups to see whether the day should be capped at 31, 30, 29, or 28.

Any help would be much appreciated including constructive criticism on how I am going about designing my regex.

Solution
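On the extra '19': when a pattern contains capturing groups, re.findall returns one tuple entry per group, and (19|20) is itself a capturing group nested inside the year group. Writing it as a non-capturing group, (?:19|20), drops the extra entry. A minimal sketch, using a simplified version of the pattern from the question (the variable names here are illustrative only):

```python
import re

months = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
text = "January, 26, 1991"

# (19|20) is a capturing group, so findall reports it as a fourth item.
capturing = re.compile(r'(%s), (\d\d), ((19|20)\d\d)' % months)

# (?:19|20) groups the alternation without capturing it.
non_capturing = re.compile(r'(%s), (\d\d), ((?:19|20)\d\d)' % months)

print(capturing.findall(text))      # [('January', '26', '1991', '19')]
print(non_capturing.findall(text))  # [('January', '26', '1991')]
```

The same reasoning makes [,][ ] unnecessary: a literal comma followed by a space matches the same text.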

Here's one way to make a regular expression that will match any date of your desired format (though you could obviously tweak whether commas are optional, add month abbreviations, and so on):

import re

years = r'((?:19|20)\d\d)'
pattern = r'(%%s) +(%%s), *%s' % years

thirties = pattern % (
    "September|April|June|November",
    r'0?[1-9]|[12]\d|30')

thirtyones = pattern % (
    "January|March|May|July|August|October|December",
    r'0?[1-9]|[12]\d|3[01]')

# Two-digit multiples of four: 04, 08, ..., 96
fours = '(?:%s)' % '|'.join('%02d' % x for x in range(4, 100, 4))

feb = r'(February) +(?:%s|%s)' % (
    r'(?:(0?[1-9]|1\d|2[0-8])), *%s' % years,  # 1-28 any year
    r'(?:(29), *((?:(?:19|20)%s)|2000))' % fours)  # 29 leap years only

result = '|'.join('(?:%s)' % x for x in (thirties, thirtyones, feb))
r = re.compile(result)
print(result)
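The February 29 branch encodes the leap-year rule as a list of two-digit multiples of four, plus a special case for 2000 (1900 is deliberately left out, since it was not a leap year). As a sanity check, the sketch below rebuilds just that year fragment and compares it with calendar.isleap for 1900 through 2099, the range the pattern covers. The names leap_year and mismatches are introduced here for illustration.

```python
import calendar
import re

# Same construction as in the answer: 04, 08, ..., 96
fours = '(?:%s)' % '|'.join('%02d' % x for x in range(4, 100, 4))

# Year fragment of the "February 29" branch, anchored at the end for this test.
leap_year = re.compile(r'(?:(?:19|20)%s|2000)$' % fours)

# Collect every year where the regex and the calendar module disagree.
mismatches = [y for y in range(1900, 2100)
              if bool(leap_year.match(str(y))) != calendar.isleap(y)]

print(mismatches)  # an empty list means the two agree on every year
```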

Then we have:

>>> r.match('January 30, 2001') is not None
True
>>> r.match('January 31, 2001') is not None
True
>>> r.match('January 32, 2001') is not None
False
>>> r.match('February 32, 2001') is not None
False
>>> r.match('February 29, 2001') is not None
False
>>> r.match('February 28, 2001') is not None
True
>>> r.match('February 29, 2000') is not None
True
>>> r.match('April 30, 1908') is not None
True
>>> r.match('April 31, 1908') is not None
False

And what is this glorious regexp, you may ask?

>>> print(result)

(?:(September|April|June|November) +(0?[1-9]|[12]\d|30), *((?:19|20)\d\d))|(?:(January|March|May|July|August|October|December) +(0?[1-9]|[12]\d|3[01]), *((?:19|20)\d\d))|(?:(February) +(?:(?:(0?[1-9]|1\d|2[0-8])), *((?:19|20)\d\d)|(?:(29), *((?:(?:19|20)(?:04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))|2000))))

(I initially intended to do a tongue-in-cheek enumeration of the possible dates, but I basically ended up hand-writing that whole gross thing except for the multiples of four, anyway.)
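The plan mentioned in the question, accepting nearly any date with a loose regex and weeding out the invalid ones afterwards, can also be sketched with the standard datetime module, which raises ValueError for impossible dates. This is only one possible approach, not part of the regex above; LOOSE_DATE, MONTH_NUMBER and find_valid_dates are hypothetical names, and the optional comma after the month is an assumption made so that both "January, 26, 1991" and "January 26, 1991" are accepted.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}

# Loose pattern: any month name, any one- or two-digit day, years 1900-2099.
LOOSE_DATE = re.compile(
    r'\b(%s),? +(\d{1,2}), *((?:19|20)\d\d)\b' % '|'.join(MONTHS))

def find_valid_dates(text):
    """Return a date object for every candidate that is a real calendar date."""
    valid = []
    for month, day, year in LOOSE_DATE.findall(text):
        try:
            valid.append(date(int(year), MONTH_NUMBER[month], int(day)))
        except ValueError:
            # date() rejects impossible days such as February 29, 2001.
            pass
    return valid

sample = "January, 26, 1991; February 29, 2001; February 29, 2000"
print(find_valid_dates(sample))
```

This keeps the regex short and leaves the month-length and leap-year rules to the library.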
