laz..
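It looks like Jsoup cannot handle the text of elements with mixed content. Here is a solution that uses the XPath you came up with, together with XOM and TagSoup: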
import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Nodes;
import nu.xom.ParsingException;
import nu.xom.ValidityException;
import nu.xom.XPathContext;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.SAXException;
public class HtmlTest {
    public static void main(final String[] args) throws SAXException, ValidityException, ParsingException, IOException {
        final String html = "
        // TagSoup acts as the SAX parser, so XOM can build a tree from non-well-formed HTML.
        final Parser parser = new Parser();
        final Builder builder = new Builder(parser);
        final Document document = builder.build(html, null);
        final nu.xom.Element root = document.getRootElement();
        // The elements are namespaced, so the XPath binds the "xhtml" prefix to the root's namespace URI.
        final Nodes textElements = root.query("//xhtml:div[@class='info']/xhtml:strong[1]/following::text()", new XPathContext("xhtml", root.getNamespaceURI()));
        // Print every text node that follows the first <strong> in document order.
        for (int textNumber = 0; textNumber < textElements.size(); ++textNumber) {
            System.out.println(textElements.get(textNumber).toXML());
        }
    }
}
This outputs:
some text 1
some text 2
Line 3:
some text 3
Without knowing more specifics about what you are trying to do, though, I'm not sure whether this is exactly what you want.
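In case it helps to see why a label such as "Line 3:" shows up in that output: the `following::text()` axis selects every later text node in document order, including text nested inside later elements. Below is a minimal sketch of that behavior using only the JDK's built-in DOM and XPath classes instead of XOM and TagSoup. The `FollowingTextDemo` class, the `select` helper, and the `SAMPLE` markup are all made up for illustration (the sample is well-formed XML, not the HTML from the question), so treat it as an assumption-laden demo rather than a drop-in replacement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FollowingTextDemo {
    // Hypothetical, well-formed sample markup (NOT the HTML from the question).
    static final String SAMPLE =
        "<div class='info'><strong>Line 1:</strong> some text 1<br/>"
        + "<strong>Line 2:</strong> some text 2</div>";

    // Hypothetical helper: evaluates an XPath that selects text nodes and
    // returns their trimmed, non-empty values in document order.
    static List<String> select(final String xml, final String xpath) throws Exception {
        final Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        final NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        final List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            final String text = nodes.item(i).getNodeValue().trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(final String[] args) throws Exception {
        // following:: also picks up text nested inside later elements ("Line 2:").
        System.out.println(select(SAMPLE, "//div[@class='info']/strong[1]/following::text()"));
        // following-sibling:: only picks up text nodes that are direct siblings.
        System.out.println(select(SAMPLE, "//div[@class='info']/strong[1]/following-sibling::text()"));
    }
}
```

If only the text that sits directly next to the labels is wanted, the `following-sibling::text()` variant in the second call may be closer to the goal, though that depends on how the real markup is nested.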