Python: finding dates in a string

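You can run a date parser on every sub-text of the text and pick the first date. Of course, such a solution will either catch things that are not dates, miss things that are dates, or most likely both.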

Here is an example that uses dateutil.parser to catch anything that looks like a date:

    import re
    from itertools import chain

    import dateutil.parser

    # Tokens that should never be parsed on their own.
    # Add more strings that confuse the parser to this set.
    UNINTERESTING = set(chain(dateutil.parser.parserinfo.JUMP,
                              dateutil.parser.parserinfo.PERTAIN,
                              ['a']))


    def _get_date(tokens):
        # Try the longest prefix of `tokens` first, then shrink it one token at a time.
        for end in range(len(tokens), 0, -1):
            region = tokens[:end]
            # Skip prefixes made only of whitespace and filler words.
            if all(token.isspace() or token in UNINTERESTING
                   for token in region):
                continue
            text = ''.join(region)
            try:
                date = dateutil.parser.parse(text)
                return end, date
            except (ValueError, OverflowError):
                # This prefix is not a date; try a shorter one.
                pass


    def find_dates(text, max_tokens=50, allow_overlapping=False):
        # Split into runs of non-whitespace and the whitespace between them,
        # dropping the empty strings that re.split leaves behind.
        tokens = [token for token in re.split(r'(\S+|\W+)', text) if token]
        skip_dates_ending_before = 0
        for start in range(len(tokens)):
            region = tokens[start:start + max_tokens]
            result = _get_date(region)
            if result is not None:
                # `end` counts tokens from `start`, not from the beginning of the text.
                end, date = result
                if allow_overlapping or end > skip_dates_ending_before:
                    skip_dates_ending_before = end
                    yield date


    test = """Adelaide was born in Finchley, North London on 12 May 1999. She was a
    child during the Daleks' abduction and invasion of Earth in 2009.
    On 1st July 2058, Bowie Base One became the first Human colony on Mars. It
    was commanded by Captain Adelaide Brooke, and initially seemed to prove that
    it was possible for Humans to live long term on Mars."""

    print("With no overlapping:")
    for date in find_dates(test, allow_overlapping=False):
        print(date)

    print("With overlapping:")
    for date in find_dates(test, allow_overlapping=True):
        print(date)

Not surprisingly, the result of this code is rubbish whether or not you allow overlapping. If overlapping is allowed, you get many dates that appear nowhere in the text; if it is not allowed, you miss an important date in the text.

    With no overlapping:
    1999-05-12 00:00:00
    2009-07-01 20:58:00
    With overlapping:
    1999-05-12 00:00:00
    1999-05-12 00:00:00
    1999-05-12 00:00:00
    1999-05-12 00:00:00
    1999-05-03 00:00:00
    1999-05-03 00:00:00
    1999-07-03 00:00:00
    1999-07-03 00:00:00
    2009-07-01 20:58:00
    2009-07-01 20:58:00
    2058-07-01 00:00:00
    2058-07-01 00:00:00
    2058-07-01 00:00:00
    2058-07-01 00:00:00
    2058-07-03 00:00:00
    2058-07-03 00:00:00
    2058-07-03 00:00:00
    2058-07-03 00:00:00

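Essentially, if overlapping is allowed:

"12 May 1999" is parsed as 1999-05-12 00:00:00

"May 1999" is parsed as 1999-05-03 00:00:00 (the text gives no day, so the parser takes it from the current date, and today is the 3rd day of the month)

If overlapping is not allowed, however, "2009. On 1st July 2058" is parsed as 2009-07-01 20:58:00, and no attempt is made to parse the date after the period.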
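
The "May 1999" case follows from how dateutil handles fields the text leaves out: they are copied from a default datetime, which is the current date at midnight unless a `default` argument is passed. The sketch below makes that visible by passing a fixed reference date instead of relying on the day the script is run. `REFERENCE` and `parse_with_reference` are names invented for this illustration and are not part of the code above.

```python
from datetime import datetime

import dateutil.parser

# Fixed stand-in for "today" (an assumption made for this sketch),
# chosen as the 3rd of a month to mirror the output shown above.
REFERENCE = datetime(2000, 1, 3)


def parse_with_reference(text, reference=REFERENCE):
    """Parse `text`, taking any field the text omits from `reference`."""
    return dateutil.parser.parse(text, default=reference)


full = parse_with_reference("12 May 1999")
partial = parse_with_reference("May 1999")

print(full)     # day, month and year all come from the text
print(partial)  # the day is missing, so it is copied from REFERENCE
```

Because the missing day depends on the default, the same partial string yields a different date on a different day, which is why some of the overlapping results above look arbitrary.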
