I'm trying to resolve a relative link that starts with a question mark ? using Java's URL or URI classes.
HTML example:
Code examples (from Scala REPL):
import java.net._
scala> new URL(new URL("http://abc.com.br/index.php?hello=world"), "?test=xyz").toExternalForm()
res30: String = http://abc.com.br/?test=xyz
scala> (new URI("http://abc.com.br/index.php?hello=world")).resolve("?test=xyz").toString
res31: String = http://abc.com.br/?test=xyz
The problem is that browsers (tested in Chrome, Firefox and Safari) resolve the link to http://abc.com.br/index.php?test=xyz instead. They do not discard the path "index.php"; they only replace the query string.
And it seems that browsers are just following the specification, as explained in https://stackoverflow.com/a/7872230/40876.
The Jsoup library makes the same "mistake" when we use element.absUrl("href"), since it also depends on Java's URL resolution.
So what's up with the way Java's URL/URI classes resolve relative references? Is it wrong or incomplete?
How can I make it behave the same as the browsers' implementation?
Solution
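Apache HttpClient's URIUtils.resolve gives the browser-like result. The first two calls below are the JDK versions, shown for comparison: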
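import java.net.URI;
import java.net.URL;
import org.apache.http.client.utils.URIUtils;

public class ResolveDemo {
    public static void main(String[] args) throws Exception {
        String base = "http://abc.com.br/index.php?hello=world";
        String relative = "?test=xyz";
        // java.net.URL: drops the last path segment of the base
        System.out.println(new URL(new URL(base), relative).toExternalForm());
        // http://abc.com.br/?test=xyz
        // java.net.URI: same result as URL
        System.out.println((new URI(base)).resolve(relative).toString());
        // http://abc.com.br/?test=xyz
        // Apache HttpClient: keeps the base path and replaces only the query
        System.out.println(URIUtils.resolve(new URI(base), relative).toString());
        // http://abc.com.br/index.php?test=xyz
    }
}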
URIUtils lives in the org.apache.httpcomponents:httpclient artifact, version 4.0 or higher.
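If pulling in HttpClient is not an option, the query-only case can probably be special-cased with the JDK alone. The following is a minimal sketch, not the way HttpClient does it: it assumes the base is an absolute, hierarchical URI, and the names QueryOnlyResolve and resolveLikeBrowser are hypothetical helpers rather than part of any library. It only treats references that start with "?" specially and hands everything else to URI.resolve.

```java
import java.net.URI;

public class QueryOnlyResolve {

    // Hypothetical helper (not JDK or HttpClient API): resolves `relative`
    // against `base`, keeping the base path when the reference is query-only.
    public static URI resolveLikeBrowser(URI base, String relative) {
        if (!relative.startsWith("?")) {
            // Anything else goes through the standard JDK resolution.
            return base.resolve(relative);
        }
        String baseText = base.toString();
        // The first '?' or '#' marks the end of the path in a URI string.
        int end = baseText.length();
        int queryStart = baseText.indexOf('?');
        int fragmentStart = baseText.indexOf('#');
        if (queryStart >= 0) {
            end = Math.min(end, queryStart);
        }
        if (fragmentStart >= 0) {
            end = Math.min(end, fragmentStart);
        }
        // Keep scheme, authority and path; replace the query (and fragment).
        return URI.create(baseText.substring(0, end) + relative);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://abc.com.br/index.php?hello=world");
        System.out.println(resolveLikeBrowser(base, "?test=xyz"));
        // http://abc.com.br/index.php?test=xyz
        System.out.println(resolveLikeBrowser(base, "other.php"));
        // http://abc.com.br/other.php
    }
}
```

The helper works on the string form so that percent-encoded characters in the base are left untouched. Empty and fragment-only references are not covered by this sketch.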