I asked this question earlier but met harsh criticism, so here I pose it again, simpler and rephrased for those who may have been concerned about the way I asked it before.
BACKGROUND
I am parsing some HTML for information. I have stripped each line down to the content I wish to grab, except for a bunch of spaces after it. To get rid of the spaces, I opted to use trim(), but I have been having trouble. The last few lines of my code are tests:
System.out.println("'" + someString + "'\n'" + someString.trim() + "'");
The results were:
'Sophomore '
'Sophomore '
I was worried I might have a problem with the way I was calling trim(), since we all make mistakes from time to time, so I tested it like this:
String s = " hello ";
System.out.println("'" + s + "'\n'" + s.trim() + "'");
The results were:
' hello '
'hello'
MY QUESTION
What am I doing wrong? What I want is to get 'Sophomore', not 'Sophomore '.
I look forward to your excellent answers (thanks in advance!).
SOLUTION
String.trim() removes only leading and trailing characters whose code is less than or equal to \u0020 (the ASCII space): everything before the first character whose code exceeds \u0020, and everything after the last such character.
This is not enough to remove every possible whitespace character. Unicode defines several more (with code points above \u0020) that .trim() will not match.
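A minimal sketch (assuming Java 8 or later) of the rule described above. The class TrimDemo, the helper trimRemoves and the sample strings are hypothetical, made up for this illustration; the code only checks which padding characters trim() actually removes.

```java
public class TrimDemo {
    // String.trim() drops a char at either end only if its code is <= U+0020.
    // Returns true when trim() removes c from both sides of "x".
    static boolean trimRemoves(char c) {
        return (c + "x" + c).trim().equals("x");
    }

    public static void main(String[] args) {
        System.out.println(trimRemoves(' '));      // true: U+0020 (space)
        System.out.println(trimRemoves('\t'));     // true: U+0009 (tab)
        System.out.println(trimRemoves('\u00a0')); // false: U+00A0 (non-breaking space)
        System.out.println(trimRemoves('\u2003')); // false: U+2003 (em space)

        // Hypothetical input reproducing the symptom from the question.
        String someString = "Sophomore\u00a0\u00a0";
        System.out.println("'" + someString.trim() + "'"); // trailing padding survives
    }
}
```

To trim(), any character above \u0020 counts as content, so a trailing \u00a0 stops it just as a letter would.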
Perhaps your white space characters aren't the ones you think they are?
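One way to find out is to print the code point of every character in the string. This sketch assumes Java 8 or later (for String.codePoints()); CodePointDump, dump and the sample input are hypothetical names standing in for the real scraped string.

```java
import java.util.stream.Collectors;

public class CodePointDump {
    // Render every code point of s as U+XXXX so invisible characters become visible.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string extracted from the HTML.
        String someString = "Sophomore\u00a0";
        // Look only at the last two characters, where the "spaces" are.
        String tail = someString.substring(someString.length() - 2);
        System.out.println(dump(tail)); // prints U+0065 U+00A0
    }
}
```

An ordinary space would show up as U+0020; anything else in the padding explains why trim() leaves it alone.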
EDIT: Comments revealed that the extra characters were indeed "special" whitespace characters, specifically \u00a0, the Unicode "non-breaking space" (the character the HTML entity &nbsp; stands for). To replace those with normal spaces, which trim() can then remove, use:
str = str.replace('\u00a0', ' ');
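A minimal sketch (assuming Java 8 or later) putting the fix together. NbspTrim and its methods are hypothetical names for illustration. replaceThenTrim is the answer's one-liner followed by trim(); unicodeTrim is a broader alternative swapped in here, not part of the original answer, that also treats other Unicode space separators (as reported by Character.isSpaceChar) as padding.

```java
public class NbspTrim {
    // The answer's approach: turn U+00A0 into an ordinary space, then trim().
    static String replaceThenTrim(String s) {
        return s.replace('\u00a0', ' ').trim();
    }

    // Assumption, not from the original answer: treat a char as padding if trim()
    // would drop it (<= U+0020) or if it is a Unicode space separator
    // such as U+00A0 or U+2003.
    static boolean isPadding(char c) {
        return c <= ' ' || Character.isSpaceChar(c);
    }

    // Strip padding from both ends, mirroring trim()'s two-pointer scan.
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isPadding(s.charAt(start))) start++;
        while (end > start && isPadding(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String scraped = "Sophomore\u00a0\u00a0 "; // hypothetical scraped value
        System.out.println("'" + replaceThenTrim(scraped) + "'");              // prints 'Sophomore'
        System.out.println("'" + unicodeTrim("\u2003Sophomore\u00a0") + "'");  // prints 'Sophomore'
    }
}
```

Java 11's String.strip() would not help here either: it relies on Character.isWhitespace, which deliberately excludes the non-breaking spaces \u00a0, \u2007 and \u202f.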