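This problem is hard to describe in the abstract, so let me reconstruct the scenario.
Take an email that is a reply to an earlier message: its body also carries the content of that earlier message. How can we extract only the content written in the current email?
I could not find a good solution in either the JavaMail API or the mail protocols; readers with a deep understanding of the mail protocols are welcome to share a better way. This article arrives at a set of heuristics by analyzing the content and structure of real messages, but the approach is not perfect.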
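1. The original content is wrapped in a blockquote tag
Implemented as remove1 in the code.
2. The original content is wrapped in an includetail tag
Implemented as remove2 in the code.
3. The new content at the start of the email is plain text without any HTML tags
Implemented as remove0 in the code.
4. Keywords at the junction with the original content
Locate the junction by a keyword such as "发件人" ("From") and drop everything after it. Implemented as remove3 in the code.
The complete code is as follows: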
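// Fragment of a mail-message class: bodyText and logger are fields defined elsewhere.
// remove3 relies on the HTML Parser library (org.htmlparser):
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.util.SimpleNodeIterator;

// Returns the body with the quoted original message stripped, or null if there is no body.
public String getSimpleBodyText() {
    if (this.bodyText != null) {
        return remove(bodyText);
    }
    return bodyText;
}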
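// Applies the heuristics in order. remove0 runs only when neither tag-based
// rule changed anything; remove3 always runs last.
public static String remove(final String content) {
    String content0 = content;
    content0 = remove1(content0);
    content0 = remove2(content0);
    if (content.equals(content0)) {
        content0 = remove0(content0);
    }
    content0 = remove3(content0);
    return content0;
}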
// Case 1: cut everything from the first "<blockquote" to the last "blockquote>".
public static String remove1(String content) {
    int index1 = content.indexOf("<blockquote");
    int index2 = content.lastIndexOf("blockquote>");
    // index2 > index1 guards against a stray closing tag that precedes the opening tag.
    if (index1 != -1 && index2 > index1) {
        logger.debug("remove1-blockquote:" + index1 + "," + index2);
        return content.substring(0, index1) + content.substring(index2 + "blockquote>".length());
    }
    return content;
}
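// Case 3: the reply starts with plain text, so keep only the text before the first tag.
public static String remove0(String content) {
    int firstTag = content.indexOf("<");
    // Without the firstTag check, tag-free content would make substring(0, -1) throw.
    if (firstTag != -1 && !content.trim().startsWith("<")) {
        logger.debug("remove0:");
        return content.substring(0, firstTag);
    }
    return content;
}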
// Case 2: cut everything from the first "<includetail" to the last "includetail>".
public static String remove2(String content) {
    int index1 = content.indexOf("<includetail");
    int index2 = content.lastIndexOf("includetail>");
    // index2 > index1 guards against a stray closing tag that precedes the opening tag.
    if (index1 != -1 && index2 > index1) {
        logger.debug("remove2-includetail:" + index1 + "," + index2);
        return content.substring(0, index1) + content.substring(index2 + "includetail>".length());
    }
    return content;
}
// Case 4: find the div that holds the "发件人"/"From" header and cut from there.
public static String remove3(String content) {
    int index1 = -1; // start of the quoted part
    int index2 = -1; // end of the quoted part
    try {
        Parser parser = new Parser(content);
        NodeFilter pFilter = new TagNameFilter("div");
        NodeList nodeList = parser.parse(pFilter);
        SimpleNodeIterator elements = nodeList.elements();
        while (elements.hasMoreNodes()) {
            Node node = elements.nextNode();
            String html = node.toHtml();
            String text = node.toString();
            // Known wrapper divs: remember where the wrapper ends and keep scanning.
            // "Section1" also matches "WordSection1".
            if (text.contains("Section1") || text.contains("mailContentContainer")) {
                index2 = node.getStartPosition() + html.length();
                continue;
            }
            if (html.contains("发件人") || html.contains("From")) {
                if (node.getStartPosition() > 0) {
                    index1 = node.getStartPosition();
                    if (index2 == -1) {
                        // No wrapper seen: cut up to the end of the parent's last child.
                        if (node.getParent() != null && node.getParent().getLastChild() != null) {
                            Node lastChild = node.getParent().getLastChild();
                            index2 = lastChild.getStartPosition() + lastChild.toHtml().length();
                        }
                    }
                    break;
                }
            }
        }
    } catch (ParserException e) {
        logger.error("remove3: failed to parse HTML", e);
    }
    // Only cut when the range is valid; clamp the end because toHtml() may not
    // have the same length as the original source text.
    if (index1 != -1 && index2 >= index1) {
        logger.debug("remove3-发件人/From:" + index1 + "," + index2);
        return content.substring(0, index1) + content.substring(Math.min(index2, content.length()));
    }
    return content;
}
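
To try the tag-based and plain-text rules without the mail class or the HTML Parser dependency, here is a minimal, self-contained sketch. The names QuotedReplyDemo, stripTag and keepLeadingText are hypothetical and not part of the code above; stripTag only generalizes the remove1/remove2 idea to an arbitrary tag name, and keepLeadingText mirrors remove0. The sample inputs are made up for illustration.

```java
public class QuotedReplyDemo {

    /**
     * Removes everything from the first "<tag" to the last "tag>" (inclusive).
     * Hypothetical helper: same idea as remove1/remove2, with the tag name as a parameter.
     */
    public static String stripTag(String content, String tag) {
        int start = content.indexOf("<" + tag);
        int end = content.lastIndexOf(tag + ">");
        // Leave the content alone if the tag is missing or the range is inverted.
        if (start == -1 || end < start) {
            return content;
        }
        return content.substring(0, start) + content.substring(end + tag.length() + 1);
    }

    /**
     * Keeps only the leading plain text when the body does not start with a tag.
     * Hypothetical helper: same idea as remove0.
     */
    public static String keepLeadingText(String content) {
        int firstTag = content.indexOf('<');
        if (firstTag > 0 && !content.trim().startsWith("<")) {
            return content.substring(0, firstTag);
        }
        return content;
    }

    public static void main(String[] args) {
        // Made-up sample: a reply followed by the quoted original in a blockquote.
        String html = "<div>Sounds good.</div><blockquote><div>Old message</div></blockquote>";
        System.out.println(stripTag(html, "blockquote"));

        // Made-up sample: a plain-text reply followed by an HTML-formatted original.
        String plain = "See you at 10.<div>From: alice</div>";
        System.out.println(keepLeadingText(plain));
    }
}
```

Because the cut runs from the first opening tag to the last closing tag, any new text that happens to sit between two separate quoted blocks is removed as well. That is one of the ways in which this kind of heuristic stays imperfect.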