I need a regular expression that will extract the sentences containing a year from a text.
Example text:
Next, in 1988 the Bradys were back
again for a holiday celebration, “A
Very Brady Christmas”. Susan Olsen
(Cindy) would be missing from this
reunion, Jennifer Runyon took her
place. This was a two hour movie in
which the Bradys got together to
celebrate Christmas, introducing the
world to the spouses and children of
the Brady kids. This movie was the
highest rated TV-movie of 1988.
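If the example text is in the variable $string, I need it to return:
> $sentenceWithYear[0] = Next, in 1988 the Bradys were back again for a holiday celebration, “A Very Brady Christmas”.
> $sentenceWithYear[1] = This movie was the highest rated TV-movie of 1988.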
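If the regex can also keep the year, I will store the year together with the sentence and eventually insert both into a database, like:
INSERT INTO table_name (year, sentence) VALUES ('$year', '$sentenceWithYear[x]')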
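Interpolating `$year` and `$sentenceWithYear[x]` directly into the SQL string fails as soon as a sentence contains an apostrophe (for example "don't"), and it is open to SQL injection. A sketch using a PDO prepared statement instead, assuming a PDO connection is available: `insertYearSentences` is a hypothetical helper, `table_name`, `year` and `sentence` mirror the query above, and the demo uses an in-memory SQLite database (pdo_sqlite extension) only so the snippet runs on its own.

```php
<?php
// Sketch: bind the values as parameters instead of pasting them into the
// SQL text, so quotes inside a sentence cannot break the statement.
// insertYearSentences is a hypothetical helper name.

function insertYearSentences(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO table_name (year, sentence) VALUES (:year, :sentence)'
    );
    foreach ($rows as $row) {
        $stmt->execute([':year' => $row['year'], ':sentence' => $row['sentence']]);
    }
}

// Demo only: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE table_name (year INTEGER, sentence TEXT)');

$rows = [
    ['year' => '1988', 'sentence' => 'This movie was the highest rated TV-movie of 1988.'],
    ['year' => '2010', 'sentence' => 'I earned $42.00 yesterday that I don\'t have to report on my 2010 tax return.'],
];
insertYearSentences($pdo, $rows);
```

The same helper works unchanged with a MySQL connection, since only the DSN passed to `new PDO(...)` differs.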
Solution:
(This is not an answer, but a suggestion.)
I think you are making this too complicated. You really have two problems:
> Splitting a paragraph into sentences
> Determining which sentences contain a 4-digit number, probably in the range 1900-2100.
Point #1 is very difficult because of the ambiguous use of the `.` character. For example, how would you handle these sentences:
I was born in 1986. Mr. Smith was born in 1976.
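You need to be able to recognize that the period after "Mr." does not end a sentence, and that there are actually two sentences here. Most of the answers you received (including @Tatu's) would naively split on every period.
Edit, another use case: money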
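One way to avoid the naive split, sketched under the assumption that a short, hand-maintained list of abbreviations is good enough: split on whitespace that follows a terminator, but not when the period ends a listed abbreviation. The list (`Mr.`, `Mrs.`, `Ms.`, `Dr.`, `St.`) and the name `splitSentences` are illustrative, not exhaustive; real text needs a longer list or a dedicated sentence tokenizer.

```php
<?php
// Sketch: split after '.', '!' or '?' followed by whitespace, unless the
// period ends one of a few known abbreviations (illustrative list only).

function splitSentences(string $text): array
{
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    // Positive lookbehind: a terminator precedes the whitespace.
    // Negative lookbehind: that terminator is not the end of an abbreviation.
    $pattern = '/(?<=[.!?])(?<!\bMr\.|\bMrs\.|\bMs\.|\bDr\.|\bSt\.)\s+/u';
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$input = 'I was born in 1986. Mr. Smith was born in 1976.';

// For comparison: the naive split breaks after "Mr." as well.
$naive = preg_split('/(?<=[.!?])\s+/u', $input);

$sentences = splitSentences($input);
print_r($sentences);
```

PCRE accepts this lookbehind because each alternative has a fixed length, even though the lengths differ between alternatives.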
I earned $42.00 yesterday that I don’t have to report on my 2010 tax return.
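The money case shows why splitting on every `.` is not enough. A small sketch, assuming that a real sentence terminator is always followed by whitespace in the input, compares `explode()` with a split that requires whitespace after the terminator:

```php
<?php
// Sketch: why splitting on every '.' breaks amounts like $42.00.

$text = 'I earned $42.00 yesterday that I don\'t have to report on my 2010 tax return.';

// Naive: every period is treated as a sentence end, which cuts "$42.00"
// in two and leaves an empty string after the final period.
$naive = explode('.', $text);

// Only split where whitespace follows the terminator, so the period
// inside "42.00" is left alone.
$better = preg_split('/(?<=[.!?])\s+/u', $text);

print_r($naive);
print_r($better);
```

This rule on its own does not handle "Mr.", so it complements an abbreviation list rather than replacing it.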
Once you can reliably identify sentences, point #2 is trivial.
Tags: php, regex
Source: https://codeday.me/bug/20190726/1546547.html
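A sketch of point #2 in PHP, assuming the sentences have already been split correctly and that a "year" means a standalone four-digit number from 1900 to 2100. The function name `sentencesWithYear` is hypothetical; it keeps the first year found in each sentence, which is what the `year` column of the INSERT needs.

```php
<?php
// Sketch of step 2: keep only sentences that contain a standalone year
// between 1900 and 2100, and remember the first such year.

function sentencesWithYear(array $sentences): array
{
    $rows = [];
    foreach ($sentences as $sentence) {
        // \b prevents matches inside longer numbers such as 12019.
        if (preg_match('/\b(19\d{2}|20\d{2}|2100)\b/', $sentence, $m)) {
            $rows[] = ['year' => (int) $m[1], 'sentence' => $sentence];
        }
    }
    return $rows;
}

$rows = sentencesWithYear([
    'I was born in 1986.',
    'Mr. Smith was born in 1976.',
    'Susan Olsen (Cindy) would be missing from this reunion.',
    'I earned $42.00 yesterday.',
]);
print_r($rows);
```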