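My standalone Java application gets a URL (which points to a file) from the user, and I need to hit it and download the file. The problem I am facing is that I cannot encode the HTTP URL correctly.
Example: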
URL: http://search.barnesandnoble.com/booksearch/first book.pdf
java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");
returns:
http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf
But what I want is
http://search.barnesandnoble.com/booksearch/first%20book.pdf
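(the space replaced by %20)
I guess URLEncoder is not designed to encode HTTP URLs. The JavaDoc says "Utility class for HTML form encoding". Is there any other way to do this?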
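One JDK-only option, sketched here for the example URL above: the multi-argument java.net.URI constructors percent-encode characters that are illegal in each component, so a space in the path becomes %20, whereas URLEncoder form-encodes the whole string. The class name UrlEncoderVsUri and its two helper methods are hypothetical names chosen for this illustration, and the sketch assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question.
final class UrlEncoderVsUri {

    // Form-encodes the whole string, delimiters included (what the question observed).
    static String formEncode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.ISO_8859_1);
    }

    // Builds an http URL from its parts; the URI constructor quotes
    // characters that are illegal in the path, such as the space.
    static String encodeHttpPath(String host, String path) {
        try {
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf
        System.out.println(formEncode("http://search.barnesandnoble.com/booksearch/first book.pdf"));
        // prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(encodeHttpPath("search.barnesandnoble.com", "/booksearch/first book.pdf"));
    }
}
```

This only helps when the URL can be split into its components first. A string that is already partly percent-encoded would have its % signs encoded again by the constructor.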
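If you have an encoded "/" (%2F) in your URL, this is still a problem.
RFC 3986, Section 2.2 says: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed." (RFC 3986, Section 2.2)
But there is an issue with Tomcat: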
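To make the RFC rule concrete, here is a minimal sketch of a "/" that is data rather than a delimiter. The host example.com, the segment "a/b c", and the helper name encodeSegment are assumptions made up for this illustration. URLEncoder is used only as a convenient way to percent-encode a single path segment, with its "+" mapped back to "%20"; it is a form encoder, so this is an approximation rather than a strict RFC 3986 segment encoder.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating RFC 3986, Section 2.2.
final class PathSegmentEncoding {

    // Percent-encodes one path segment so that a "/" inside it is treated as data.
    // URLEncoder writes a space as "+", so that is rewritten to "%20" afterwards
    // (a literal "+" in the input has already become "%2B" at that point).
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String url = "http://example.com/files/" + encodeSegment("a/b c");
        URI uri = URI.create(url);
        System.out.println(url);              // http://example.com/files/a%2Fb%20c
        System.out.println(uri.getRawPath()); // /files/a%2Fb%20c
        System.out.println(uri.getPath());    // /files/a/b c
    }
}
```

The raw path keeps %2F, so the segment boundary survives. The decoded path no longer distinguishes the data "/" from the delimiter, which is why a server that decodes early has to decide how to treat %2F.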
http://tomcat.apache.org/security-6.html - Fixed in Apache Tomcat 6.0.10
important: Directory traversal CVE-2007-0450
Tomcat permits '\', '%2F' and '%5C' [...] .
The following Java system properties have been added to Tomcat to provide additional control of the handling of path delimiters in URLs (both options default to false):
org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH: true|false
org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH: true|false
Due to the impossibility to guarantee that all URLs are handled by Tomcat as they are in proxy servers, Tomcat should always be secured as if no proxy restricting context access was used.
Affects: 6.0.0-6.0.9
So if you have a URL containing %2F, Tomcat returns: "400 Invalid URI: noSlash"
You can switch off the bug fix in the Tomcat startup script:
set JAVA_OPTS=%JAVA_OPTS% %LOGGING_CONFIG% -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
You can use a function like this. Complete and modify it to suit your needs:
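/**
 * Encode a URL, except for the :, /, ?, &, =, ... characters.
 * Uses org.apache.commons.codec.net.URLCodec (Apache Commons Codec).
 * Note: URLCodec does form encoding, so a space becomes '+', not "%20".
 *
 * @param url the URL to encode
 * @param encodingCharset the charset used for encoding, e.g. "UTF-8"
 * @return the encoded URL
 * @throws UnsupportedEncodingException if the charset is not supported
 */
public static String encodeUrl(String url, String encodingCharset) throws UnsupportedEncodingException {
    return new URLCodec().encode(url, encodingCharset)
            .replace("%3A", ":")
            .replace("%2F", "/")
            .replace("%3F", "?")
            .replace("%3D", "=")
            .replace("%26", "&");
}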
Usage example:
String urlToEncode = "http://www.growup.com/folder/intérieur-à_vendre?o=4";
Utils.encodeUrl(urlToEncode, "UTF-8");
The result is: http://www.growup.com/folder/int%C3%A9rieur-%C3%A0_vendre?o=4
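For readers without Apache Commons Codec on the classpath, the same encode-then-restore idea can be sketched with the JDK alone. This is a variant and not the function above: java.net.URLEncoder stands in for URLCodec, the class name StdlibEncodeUrl is hypothetical, and the extra "+" to "%20" replacement is an addition made here so that spaces come out as the question wanted. It assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only variant of the encode-then-restore approach.
final class StdlibEncodeUrl {

    // Form-encodes everything, then restores the common URL delimiters.
    // Caveat: a delimiter that was meant as data (for example "&" inside a
    // query value) is restored too, so this only suits simple URLs.
    static String encodeUrl(String url, Charset charset) {
        return URLEncoder.encode(url, charset)
                .replace("+", "%20")   // space: form encoding uses '+'
                .replace("%3A", ":")
                .replace("%2F", "/")
                .replace("%3F", "?")
                .replace("%3D", "=")
                .replace("%26", "&");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only: \u00e9 is e-acute, \u00e0 is a-grave.
        String accented = "http://www.growup.com/folder/int\u00e9rieur-\u00e0_vendre?o=4";
        // prints http://www.growup.com/folder/int%C3%A9rieur-%C3%A0_vendre?o=4
        System.out.println(encodeUrl(accented, StandardCharsets.UTF_8));
        // prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(encodeUrl("http://search.barnesandnoble.com/booksearch/first book.pdf", StandardCharsets.UTF_8));
    }
}
```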
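I read the previous answers and wrote my own method, because I could not get the solutions from those answers to work properly. It looks good to me, but if you can find a URL that does not work with it, please let me know.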
public static URL convertToURLEscapingIllegalCharacters(String toEscape) throws MalformedURLException, URISyntaxException {
    URL url = new URL(toEscape);
    URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
    // If toEscape already contains a %, the URI constructor re-encodes it to %25.
    // We only want encoding, not re-encoding, so turn %25 back into %.
    return new URL(uri.toString().replace("%25", "%"));
}
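A small harness for trying this method on a few inputs, as a sketch: the method body is reproduced from the answer above, while the class name EscapeIllegalCharactersDemo, the wrapper escape, and the example.com input are hypothetical additions. The last input is one case where the output still contains a bare %, because the final replace cannot tell a literal percent sign from the start of an existing escape.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical test harness around the method from the answer.
final class EscapeIllegalCharactersDemo {

    // Reproduced from the answer above.
    static URL convertToURLEscapingIllegalCharacters(String toEscape)
            throws MalformedURLException, URISyntaxException {
        URL url = new URL(toEscape);
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(),
                url.getPath(), url.getQuery(), url.getRef());
        // Undo the %25 that the URI constructor produces for every % in the input.
        return new URL(uri.toString().replace("%25", "%"));
    }

    // Convenience wrapper (hypothetical) that returns a String and rethrows unchecked.
    static String escape(String toEscape) {
        try {
            return convertToURLEscapingIllegalCharacters(toEscape).toString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Unencoded space: prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(escape("http://search.barnesandnoble.com/booksearch/first book.pdf"));
        // Already encoded: unchanged, prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(escape("http://search.barnesandnoble.com/booksearch/first%20book.pdf"));
        // Literal percent sign: prints http://example.com/100%%20off (the first % stays bare)
        System.out.println(escape("http://example.com/100% off"));
    }
}
```

Whether the third case matters depends on the input. If users can type a literal % into a file name, it would need to be escaped as %25 before the URL reaches this method.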