I have a string of XML data. I need to escape the values within the nodes, but not the nodes themselves.
Ex:
<node1>R&R</node1>
should escape to:
<node1>R&amp;R</node1>
and should not escape to:
&lt;node1&gt;R&amp;R&lt;/node1&gt;
I have been working on this for the last couple of days, but haven't had much success. I'm not an expert with Java, but the following are things that I have tried that will not work:
Parsing the XML string into a document. This does not work because the data within the nodes contains invalid XML.
Escaping all of the characters. This does not work because the program receiving the data will not accept it in that format.
Escaping all of the characters and then parsing into a document. This throws all sorts of errors.
Any help would be much appreciated.
Solution
You could use regular expression matching to find the text between each opening and closing tag, then loop through and process each match. In this example I've used Apache Commons Lang to do the XML escaping.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringEscapeUtils;

public String sanitiseXml(String xml)
{
    // Match <tag>text</tag>: group 1 is the opening tag, group 2 the text, group 3 the closing tag
    Pattern xmlCleanerPattern = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");
    StringBuilder xmlStringBuilder = new StringBuilder();
    Matcher matcher = xmlCleanerPattern.matcher(xml);
    int lastEnd = 0;
    while (matcher.find())
    {
        // Include any non-matching text between this result and the previous result
        if (matcher.start() > lastEnd) {
            xmlStringBuilder.append(xml.substring(lastEnd, matcher.start()));
        }
        lastEnd = matcher.end();
        // Sanitise the characters inside the tags and append the sanitised version
        String cleanText = StringEscapeUtils.escapeXml10(matcher.group(2));
        xmlStringBuilder.append(matcher.group(1)).append(cleanText).append(matcher.group(3));
    }
    // Include any leftover text after the last result
    xmlStringBuilder.append(xml.substring(lastEnd));
    return xmlStringBuilder.toString();
}
This looks for matches of <anything>text</anything>, captures the opening tag, the contained text and the closing tag, sanitises the contained text, and then puts it all back together.
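If you would rather not pull in Commons Lang, the same idea can be sketched with only the standard library. This is a minimal sketch, not a drop-in replacement: the class name XmlTextEscaper and the helper escapeText are made up for illustration, and escapeText is a hand-written stand-in that only covers the five predefined XML entities, so it does not reproduce everything escapeXml10 does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlTextEscaper {
    // <open>text</close>: group 1 = opening tag, group 2 = text, group 3 = closing tag
    private static final Pattern ELEMENT_TEXT =
            Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    // Hypothetical helper (assumption): escapes only the five predefined XML entities
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    static String sanitiseXml(String xml) {
        Matcher matcher = ELEMENT_TEXT.matcher(xml);
        StringBuilder result = new StringBuilder();
        int lastEnd = 0;
        while (matcher.find()) {
            // Copy whatever sits between the previous match and this one unchanged
            result.append(xml, lastEnd, matcher.start());
            // Keep the tags as they are and escape only the text between them
            result.append(matcher.group(1))
                  .append(escapeText(matcher.group(2)))
                  .append(matcher.group(3));
            lastEnd = matcher.end();
        }
        // Copy any trailing text after the last match
        result.append(xml, lastEnd, xml.length());
        return result.toString();
    }
}
```

Calling XmlTextEscaper.sanitiseXml("<node1>R&R</node1>") gives <node1>R&amp;R</node1>. A few limits to keep in mind with this pattern: text that is already escaped gets escaped again (&amp; becomes &amp;amp;), text containing a raw < or > is not matched at all, and an opening tag with a / inside it (for example a URL in an attribute) is skipped.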