Java: split a paragraph by periods [duplicate]

I'm trying to build a regular expression that splits a paragraph into sentences separated by a period (.). This should work:

String str[] = text.split("\\.");

However, I need to add a minimum of robustness, for example checking that the period is followed by a space and an uppercase letter. So here's my next guess:

String text = "The pen is on the table. The table has a pen upon it.";
String[] arr = text.split("\\. [A-Z]");
for (String s : arr)
    System.out.println(s);

Output:

The pen is on the table
he table has a pen upon it.

Unfortunately, the first character after the period is lost. Can you see a way to fix this?

Source: https://stackoverflow.com/questions/48749256

Updated: 2020-02-15 02:27

Best answer

You can use a lookahead to see what is coming next in the string.

text.split("\\. (?=[A-Z])");

{ "The pen is on the table", "The table has a pen upon it." }
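The reason the lookahead helps is that it is zero-width: it checks for the capital letter without making it part of the delimiter that split discards. A minimal sketch of that difference, using a hypothetical helper named firstMatch (not part of the original answer) to print what each pattern actually matches:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Hypothetical helper: returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";

        // Without a lookahead, the capital letter is part of the match, so split() drops it.
        System.out.println("[" + firstMatch("\\. [A-Z]", text) + "]");     // prints "[. T]"

        // With a lookahead, only the period and the space are matched.
        System.out.println("[" + firstMatch("\\. (?=[A-Z])", text) + "]"); // prints "[. ]"
    }
}
```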

If you want to keep the periods as well, you can also use a lookbehind:

text.split("(?<=\\.) (?=[A-Z])");

{ "The pen is on the table.", "The table has a pen upon it." }
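For reference, here is a small runnable sketch that puts both patterns from this answer side by side. The class and method names (SentenceSplitDemo, splitDroppingPeriods, splitKeepingPeriods) are made up for illustration; only the two regular expressions come from the answer.

```java
import java.util.Arrays;

public class SentenceSplitDemo {
    // Splits on ". " when an uppercase letter follows; the period is part of the delimiter and is dropped.
    static String[] splitDroppingPeriods(String text) {
        return text.split("\\. (?=[A-Z])");
    }

    // Splits on a space that is preceded by a period and followed by an uppercase letter;
    // both checks are zero-width, so the period stays with its sentence.
    static String[] splitKeepingPeriods(String text) {
        return text.split("(?<=\\.) (?=[A-Z])");
    }

    public static void main(String[] args) {
        String text = "The pen is on the table. The table has a pen upon it.";

        // prints "[The pen is on the table, The table has a pen upon it.]"
        System.out.println(Arrays.toString(splitDroppingPeriods(text)));

        // prints "[The pen is on the table., The table has a pen upon it.]"
        System.out.println(Arrays.toString(splitKeepingPeriods(text)));
    }
}
```

Both patterns remain heuristics: any period followed by a space and a capital letter is treated as a sentence boundary, so an abbreviation such as "Mr. Smith" would also be split.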

2018-02-12

