我的Java独立应用程序从用户那里获得一个URL(指向一个文件),我需要点击它并下载它。我面临的问题是,我不能正确编码HTTP URL地址…

例子:

URL:  http://search.barnesandnoble.com/booksearch/first book.pdf

java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");

回报我。

http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf

但是,我想要的是

http://search.barnesandnoble.com/booksearch/first%20book.pdf

(空格替换为%20)

我猜URLEncoder不是为编码HTTP url设计的…JavaDoc说“HTML表单编码的实用程序类”…还有别的办法吗?


当前回答

如果任何人不想向他们的项目添加依赖项,这些函数可能会有帮助。

我们将URL的path部分传递到这里。您可能不想将完整的URL作为参数传递进来(查询字符串需要不同的转义,等等)。

/**
 * Percent-encodes a string so it's suitable for use in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentEncode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String encoded = encodeMe.replace("%", "%25");
    encoded = encoded.replace(" ", "%20");
    encoded = encoded.replace("!", "%21");
    encoded = encoded.replace("#", "%23");
    encoded = encoded.replace("$", "%24");
    encoded = encoded.replace("&", "%26");
    encoded = encoded.replace("'", "%27");
    encoded = encoded.replace("(", "%28");
    encoded = encoded.replace(")", "%29");
    encoded = encoded.replace("*", "%2A");
    encoded = encoded.replace("+", "%2B");
    encoded = encoded.replace(",", "%2C");
    encoded = encoded.replace("/", "%2F");
    encoded = encoded.replace(":", "%3A");
    encoded = encoded.replace(";", "%3B");
    encoded = encoded.replace("=", "%3D");
    encoded = encoded.replace("?", "%3F");
    encoded = encoded.replace("@", "%40");
    encoded = encoded.replace("[", "%5B");
    encoded = encoded.replace("]", "%5D");
    return encoded;
}

/**
 * Percent-decodes a string, such as used in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentDecode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String decoded = encodeMe.replace("%21", "!");
    decoded = decoded.replace("%20", " ");
    decoded = decoded.replace("%23", "#");
    decoded = decoded.replace("%24", "$");
    decoded = decoded.replace("%26", "&");
    decoded = decoded.replace("%27", "'");
    decoded = decoded.replace("%28", "(");
    decoded = decoded.replace("%29", ")");
    decoded = decoded.replace("%2A", "*");
    decoded = decoded.replace("%2B", "+");
    decoded = decoded.replace("%2C", ",");
    decoded = decoded.replace("%2F", "/");
    decoded = decoded.replace("%3A", ":");
    decoded = decoded.replace("%3B", ";");
    decoded = decoded.replace("%3D", "=");
    decoded = decoded.replace("%3F", "?");
    decoded = decoded.replace("%40", "@");
    decoded = decoded.replace("%5B", "[");
    decoded = decoded.replace("%5D", "]");
    decoded = decoded.replace("%25", "%");
    return decoded;
}

和测试:

@Test
public void testPercentEncode_Decode() {
    assertEquals("", percentDecode(percentEncode(null)));
    assertEquals("", percentDecode(percentEncode("")));

    assertEquals("!", percentDecode(percentEncode("!")));
    assertEquals("#", percentDecode(percentEncode("#")));
    assertEquals("$", percentDecode(percentEncode("$")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("&", percentDecode(percentEncode("&")));
    assertEquals("'", percentDecode(percentEncode("'")));
    assertEquals("(", percentDecode(percentEncode("(")));
    assertEquals(")", percentDecode(percentEncode(")")));
    assertEquals("*", percentDecode(percentEncode("*")));
    assertEquals("+", percentDecode(percentEncode("+")));
    assertEquals(",", percentDecode(percentEncode(",")));
    assertEquals("/", percentDecode(percentEncode("/")));
    assertEquals(":", percentDecode(percentEncode(":")));
    assertEquals(";", percentDecode(percentEncode(";")));

    assertEquals("=", percentDecode(percentEncode("=")));
    assertEquals("?", percentDecode(percentEncode("?")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("[", percentDecode(percentEncode("[")));
    assertEquals("]", percentDecode(percentEncode("]")));
    assertEquals(" ", percentDecode(percentEncode(" ")));

    // Get a little complex
    assertEquals("[]]", percentDecode(percentEncode("[]]")));
    assertEquals("a=d%*", percentDecode(percentEncode("a=d%*")));
    assertEquals(")  (", percentDecode(percentEncode(")  (")));
    assertEquals("%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25",
                    percentEncode("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %"));
    assertEquals("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %", percentDecode(
                    "%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25"));

    assertEquals("%23456", percentDecode(percentEncode("%23456")));

}

其他回答

URL编码会对那个字符串进行编码这样它就能在URL中正确地传递到最终目的地。例如,您不能使用http://stackoverflow.com?url=http://yyy.com。UrlEncoding参数将修复该参数值。

所以我给你两个选择:

您是否有权访问与域分离的路径?如果是这样,您可以简单地对路径进行UrlEncode。然而,如果情况并非如此,那么选择2可能适合你。 commons - httpclient 3.1。它有一个类URIUtil: System.out.println(URIUtil.encodePath("http://example.com/x y", "ISO-8859-1"));

这将输出您正在寻找的内容,因为它只对URI的路径部分进行编码。

供您参考,这个方法需要common -codec和common -logging才能在运行时工作。

如果你用的是弹簧,你可以试试 org.springframework.web.util.UriUtils # encodePath

在此我将针对Android用户添加一条建议。您可以这样做,从而避免获得任何外部库。此外,上面一些答案中建议的所有搜索/替换字符解决方案都是危险的,应该避免。

试一试:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

您可以看到,在这个特定的URL中,我需要对这些空格进行编码,以便我可以将其用于请求。

这利用了Android类中提供给你的几个功能。首先,URL类可以将URL分解为适当的组件,因此不需要进行任何字符串搜索/替换工作。其次,当您通过组件而不是从单个字符串构造URI时,这种方法利用了正确转义组件的URI类特性。

这种方法的美妙之处在于,您可以使用任何有效的url字符串并让它工作,而不需要您自己对它有任何特殊的了解。

我开发了一个用于此目的的库:galimatias。它解析URL的方式与web浏览器相同。也就是说,如果一个URL在浏览器中工作,它将被galimatias正确解析。

在这种情况下:

// Parse
io.mola.galimatias.URL.parse(
    "http://search.barnesandnoble.com/booksearch/first book.pdf"
).toString()

会给你:http://search.barnesandnoble.com/booksearch/first%20book.pdf。当然,这是最简单的情况,但它可以用于任何东西,远远超出java.net.URI。

你可以在https://github.com/smola/galimatias上查看

如果你的URL中有一个编码的“/”(%2F),这仍然是一个问题。

RFC 3986 -章节2.2说:“如果URI组件的数据与保留字符作为分隔符的目的相冲突,那么冲突的数据必须在URI形成之前进行百分比编码。”(rfc3986 -第2.2节)

但是Tomcat有一个问题:

http://tomcat.apache.org/security-6.html - Fixed in Apache Tomcat 6.0.10 important: Directory traversal CVE-2007-0450 Tomcat permits '\', '%2F' and '%5C' [...] . The following Java system properties have been added to Tomcat to provide additional control of the handling of path delimiters in URLs (both options default to false): org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH: true|false org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH: true|false Due to the impossibility to guarantee that all URLs are handled by Tomcat as they are in proxy servers, Tomcat should always be secured as if no proxy restricting context access was used. Affects: 6.0.0-6.0.9

因此,如果您有一个含有%2F字符的URL, Tomcat将返回:"400 Invalid URI: noSlash"

你可以在Tomcat启动脚本中切换bug修复:

set JAVA_OPTS=%JAVA_OPTS% %LOGGING_CONFIG%   -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true