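Long-time reader, first-time poster here.
I'm writing a bot for a Spanish-language wiki that I administer. I wanted to build it from scratch, since one of my goals is to practice Java. However, I've run into trouble when using HttpClient to make a GET request to a URI that contains non-ASCII characters (such as á, é, í, ó or ú).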
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras de las Botas";
method = new GetMethod(url);
client.executeMethod(method);
When I run the code above, GetMethod complains about the URI:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid uri 'http://es.pruebaloca.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras%20de%20las%20Botas&cmlimit=500&format=xml': Invalid query
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:69)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:120)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
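Note that in the URI shown in the stack trace, the spaces are encoded as %20 while the í is left as is. That exact same URI works perfectly in a browser, but I can't get GetMethod to accept it.
I have also tried the following: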
URI uri = new URI(url, false);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
This way the URI class does escape the í, but it double-escapes the spaces (%20 becomes %2520):
http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categor%C3%ADa:Mejoras%2520de%2520las%2520Botas&cmlimit=500&format=xml
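The mechanism behind that double escaping can be reproduced with the JDK alone. The sketch below is only an illustration under assumptions: it uses java.net.URI (whose multi-argument constructors always quote the percent character) rather than Commons HttpClient's URI class, and the class name DoubleEscapeDemo and the helper escapeQuery are made up for the example. It shows that an escaper which treats its input as raw text turns an existing %20 into %2520.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; it uses java.net.URI, not org.apache.commons.httpclient.URI.
final class DoubleEscapeDemo {

    // Treats `query` as raw (unescaped) text and returns the escaped, ASCII-only URI.
    static String escapeQuery(String query) {
        try {
            // The multi-argument constructor quotes illegal characters, including '%'.
            URI uri = new URI("http", "es.metroid.wikia.com", "/api.php", query, null);
            // toASCIIString() additionally percent-encodes non-ASCII characters as UTF-8.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw input: each space is escaped once.
        System.out.println(escapeQuery("cmtitle=Categor\u00eda:Mejoras de las Botas"));
        // Already-escaped input: the '%' of each %20 is escaped again.
        System.out.println(escapeQuery("cmtitle=Categor\u00eda:Mejoras%20de%20las%20Botas"));
    }
}
```

The first call yields Mejoras%20de%20las%20Botas in the query, the second yields Mejoras%2520de%2520las%2520Botas, which matches the pattern seen above: the escaping step has no way to tell a literal percent sign from one that already starts an escape sequence.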
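Now, if I don't use any spaces in the query, there is no double escaping and I get the desired output. So, if there were no chance of getting non-ASCII characters, I wouldn't need the URI class at all and wouldn't get the double escaping. To avoid the first escaping of the spaces, I tried this: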
URI uri = new URI(url, true);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
But the URI class didn't like it:
org.apache.commons.httpclient.URIException: Invalid query
at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2049)
at org.apache.commons.httpclient.URI.<init>(URI.java:167)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:66)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:121)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:39)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
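Any input on how to avoid this double escaping would be greatly appreciated. I've searched all over with no luck at all.
Thanks!
Edit: The solution that worked best for me was parsifal's. As an addition, though, I'd like to point out that setting the path with method.setPath(url) made HttpMethod reject a cookie I needed to save: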
Aug 26, 2011 4:07:08 PM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "wikicities_session=900beded4191ff880e09944c7c0aaf5a". Illegal path attribute "/". Path of origin: "http://es.metroid.wikia.com/api.php"
However, if I pass the URI to the constructor and leave out setPath(url), the cookie is saved without any problem.
String url = "http://es.metroid.wikia.com/api.php";
NameValuePair[] query = { new NameValuePair("action", "query"), new NameValuePair("list", "categorymembers"),
new NameValuePair("cmtitle", "Categoría:Mejoras de las Botas"), new NameValuePair("cmlimit", "500"),
new NameValuePair("format", "xml") };
HttpMethod method = null;
...
method = new GetMethod(url); // Or PostMethod(url)
method.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY); // It had been like this the whole time
method.setQueryString(query);
client.executeMethod(method);
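Solution:
Looking at the documentation for HttpMethodBase, it appears that all String arguments must be pre-encoded. The simplest solution is to build your URL in stages, using setPath() and the variant of setQueryString() that takes an array of name-value pairs.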
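To make "pre-encoded" concrete, here is a minimal sketch of encoding the query by hand with the JDK only. It is an alternative illustration, not the name-value approach described above: the class QueryEncoding and the method encodeQuery are hypothetical names, and the sketch assumes the server accepts form-style encoding, where a space becomes + and the colon becomes %3A.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Commons HttpClient.
final class QueryEncoding {

    // Joins the parameters as name=value pairs, percent-encoding each side as UTF-8.
    static String encodeQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return sb.toString();
    }

    // Form-style encoding: spaces become '+', other unsafe characters become %XX.
    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("action", "query");
        params.put("list", "categorymembers");
        params.put("cmtitle", "Categor\u00eda:Mejoras de las Botas");
        params.put("cmlimit", "500");
        params.put("format", "xml");
        System.out.println(encodeQuery(params));
    }
}
```

Because each value is encoded exactly once, from its raw form, nothing can be escaped twice. The resulting string is what one would presumably hand to a method that expects an already-encoded query, such as the String variant of setQueryString(); the NameValuePair[] variant avoids this manual step.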
Tags: java, url, uri, httpclient, mediawiki-api
Source: https://codeday.me/bug/20190530/1186147.html