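I need to parse a Sentence class into words and punctuation marks (whitespace counts as punctuation), and then add all of them to a general ArrayList<Sentence>.
An example sentence: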
A man, a plan, a canal — Panama!
A => word
whitespace => punctuation
man => word
, + space => punctuation
a => word
[…]
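I tried reading the whole sentence one character at a time, collecting consecutive characters of the same kind, and creating a new word or a new punctuation element from each collected run.
Here is my code: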
import java.util.LinkedList;

public class Sentence {
    private String sentence;
    private LinkedList elements;

    /**
     * Constructs a sentence.
     * @param aText a string containing all characters of the sentence
     */
    public Sentence(String aText) {
        sentence = aText.trim();
        splitSentence();
    }

    public String getSentence() {
        return sentence;
    }

    public LinkedList getElements() {
        return elements;
    }

    /**
     * Splits the sentence into words and punctuation.
     */
    private void splitSentence() {
        if (sentence == "" || sentence == null || sentence == "\n") {
            return;
        }
        StringBuilder builder = new StringBuilder();
        int j = 0;
        boolean mark = false;
        while (j < sentence.length()) {
            //char current = sentence.charAt(j);
            while (Character.isLetter(sentence.charAt(j))) {
                if (mark) {
                    elements.add(new Punctuation(builder.toString()));
                    builder.setLength(0);
                    mark = false;
                }
                builder.append(sentence.charAt(j));
                j++;
            }
            mark = true;
            while (!Character.isLetter(sentence.charAt(j))) {
                if (mark) {
                    elements.add(new Word(builder.toString()));
                    builder.setLength(0);
                    mark = false;
                }
                builder.append(sentence.charAt(j));
                j++;
            }
            mark = true;
        }
    }
}
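But the logic in splitSentence() does not work correctly, and I can't find a suitable fix.
What I want is this: read the first character and add it to builder => as long as the next character is of the same type (letter or punctuation), keep appending to builder => when the next character differs in type from what builder holds, create a new word or punctuation element and reset builder.
Then repeat the same logic.
How can I implement this check correctly?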
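
For reference, here is a minimal sketch of the run-grouping logic described above, not a definitive implementation. It assumes a single hypothetical Element class standing in for Word and Punctuation, since those classes are not shown in the question. The names SentenceSplitter and split are also made up for illustration. Three things in the posted code look likely to cause trouble: elements is never initialized, the inner while loops call charAt(j) without checking j against the length, and strings are compared with == rather than equals() or isEmpty(). The sketch avoids all three by tracking only the start index of the current run and its type.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitter {

    /** Hypothetical stand-in for the Word and Punctuation classes in the question. */
    static final class Element {
        final boolean word; // true for a run of letters, false for anything else
        final String text;

        Element(boolean word, String text) {
            this.word = word;
            this.text = text;
        }

        @Override
        public String toString() {
            return (word ? "Word(" : "Punctuation(") + text + ")";
        }
    }

    /** Splits the input into maximal runs of letters and maximal runs of non-letters. */
    static List<Element> split(String sentence) {
        List<Element> elements = new ArrayList<>();
        if (sentence == null || sentence.isEmpty()) {
            return elements;
        }
        int start = 0; // index where the current run begins
        boolean inWord = Character.isLetter(sentence.charAt(0));
        for (int i = 1; i <= sentence.length(); i++) {
            boolean atEnd = (i == sentence.length());
            // Close the current run at the end of input or when the character type changes.
            if (atEnd || Character.isLetter(sentence.charAt(i)) != inWord) {
                elements.add(new Element(inWord, sentence.substring(start, i)));
                start = i;
                inWord = !inWord;
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        for (Element e : split("A man, a plan")) {
            System.out.println(e);
        }
    }
}
```

Because the end of the input is handled by the same branch as a type change, the last run is emitted without a separate flush after the loop. If separate Word and Punctuation objects are needed, the Element constructor call is the one place to swap them in.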