When is a space in a URL encoded as +, and when is it encoded as %20?


From Wikipedia (emphasis and links added):

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

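So real percent-encoding uses %20, while form data in URLs uses a modified form that uses +. You are therefore most likely to see + only in the query string of a URL, after the ?.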
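As a minimal illustration in Python 3 (the sample string "foo bar" is just an assumption for the demo), the standard library exposes both flavours: quote() does generic percent-encoding, and quote_plus() does the form-style variant.

```python
from urllib.parse import quote, quote_plus

text = "foo bar"

# Generic percent-encoding: a space becomes %20.
percent_encoded = quote(text)

# Form-style (application/x-www-form-urlencoded) encoding: a space becomes +.
form_encoded = quote_plus(text)

print(percent_encoded)  # foo%20bar
print(form_encoded)     # foo+bar
```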


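I recommend %20.

Are you hard-coding them?

This is not consistent across languages, though. If I'm not mistaken, PHP's urlencode() encodes spaces as +, whereas Python's urlencode() encodes them as %20.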

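Edit:

It seems I was mistaken. Python's urlencode() (at least in 2.7.2) uses quote_plus() rather than quote(), and therefore encodes spaces as "+". The W3C recommendation also seems to be "+", as stated here: http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

In fact, you can follow an interesting debate on Python's own issue tracker about what to use to encode spaces: http://bugs.python.org/issue13866.

Edit #2:

I understand that the most common way of encoding " " is as "+", but just a note (it may be just me): I find this a bit confusing: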

from urllib.parse import urlencode  # Python 3; in Python 2 this was urllib.urlencode
print(urlencode({' ': '+ '}))

>>> '+=%2B+'

The reason for this confusion is that URLs are still "broken" to this day.

From a blog post:

Take "http://www.google.com" for instance. This is a URL. A URL is a Uniform Resource Locator and is really a pointer to a web page (in most cases). URLs actually have a very well-defined structure since the first specification in 1994.

We can extract detailed information about the "http://www.google.com" URL:

+---------------+-------------------+
| Part          | Data              |
+---------------+-------------------+
| Scheme        | http              |
| Host          | www.google.com    |
+---------------+-------------------+

If we look at a more complex URL such as:

"https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"

we can extract the following information:

+-------------------+---------------------+
| Part              | Data                |
+-------------------+---------------------+
| Scheme            | https               |
| User              | bob                 |
| Password          | bobby               |
| Host              | www.lunatech.com    |
| Port              | 8080                |
| Path              | /file;p=1           |
| Path parameter    | p=1                 |
| Query             | q=2                 |
| Fragment          | third               |
+-------------------+---------------------+

https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
\___/   \_/ \___/ \______________/ \__/\_______/ \_/ \___/
  |      |    |          |          |      | \_/  |    |
Scheme User Password    Host       Port  Path |   | Fragment
        \_____________________________/       | Query
                       |               Path parameter
                   Authority

The reserved characters are different for each part.

For HTTP URLs, a space in a path fragment part has to be encoded to "%20" (not, absolutely not "+"), while the "+" character in the path fragment part can be left unencoded.

Now in the query part, spaces may be encoded to either "+" (for backwards compatibility: do not try to search for it in the URI standard) or "%20" while the "+" character (as a result of this ambiguity) has to be escaped to "%2B".

This means that the "blue+light blue" string has to be encoded differently in the path and query parts:

"http://example.com/blue+light%20blue?blue%2Blight+blue".

From there you can deduce that encoding a fully constructed URL is impossible without a syntactical awareness of the URL structure.

This boils down to:

You should have %20 before the ? and + after it.
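Here is a rough Python 3 sketch of that rule, reproducing the blog post's "blue+light blue" example. The variable names are mine, and marking "+" as safe in the path is my reading of "can be left unencoded", not something a library does for you by default.

```python
from urllib.parse import quote, quote_plus

segment = "blue+light blue"

# Path: a space must become %20; "+" may stay literal, so mark it as safe.
path_part = quote(segment, safe="+")

# Query (form style): a space becomes "+", so a literal "+" must become %2B.
query_part = quote_plus(segment)

url = "http://example.com/" + path_part + "?" + query_part
print(url)  # http://example.com/blue+light%20blue?blue%2Blight+blue
```

The point is that each part is encoded on its own before the URL is assembled; encoding the finished URL in one pass cannot tell the two "+" characters apart.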


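A space may only be encoded as "+" in the key-value pairs of an "application/x-www-form-urlencoded" query part of a URL. In my opinion, this is a MAY, not a MUST. In the rest of the URL, it is encoded as %20.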

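In my opinion, it is better to always encode spaces as %20 rather than "+", even in the query part of a URL, because it is the HTML specification (RFC 1866), not the URI standard, that says space characters should be encoded as "+" in "application/x-www-form-urlencoded" key-value pairs (see paragraph 8.2.1, subparagraph 1).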

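This way of encoding form data is also given in later HTML specifications. For example, look for the paragraphs about application/x-www-form-urlencoded in the HTML 4.01 Specification, and so on.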

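Here is a sample URL in which the HTML specification allows spaces to be encoded as pluses: "http://example.com/over/there?name=foo+bar". So spaces can be replaced by pluses only after the "?". Everywhere else, spaces should be encoded as %20. But since it is hard to determine the context correctly, the best practice is never to encode spaces as "+".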
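The same context problem shows up when decoding. A small Python 3 sketch (the URL here is a made-up variant of the one above, with a "+" added to the path for contrast):

```python
from urllib.parse import urlsplit, unquote, parse_qs

# Hypothetical URL: "+" appears both in the path and in the query.
url = "http://example.com/a+b%20c?name=foo+bar"
parts = urlsplit(url)

# In the path, "+" is a literal plus; only %20 is a space.
path = unquote(parts.path)

# In a form-style query, "+" is decoded as a space.
query = parse_qs(parts.query)

print(path)   # /a+b c
print(query)  # {'name': ['foo bar']}
```

Running unquote_plus() over the whole URL would turn the plus in the path into a space, which is exactly the ambiguity that always using %20 avoids.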

I would recommend percent-encoding all characters except the "unreserved" ones defined in RFC 3986, section 2.3:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

The implementation depends on the programming language you choose.
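In Python, for instance, one possible sketch is the following. It assumes Python 3.7 or later, where quote() leaves exactly the RFC 3986 unreserved set alone; the helper name encode_unreserved_only is mine, not a library function.

```python
from urllib.parse import quote

def encode_unreserved_only(text: str) -> str:
    """Percent-encode everything except ALPHA / DIGIT / "-" / "." / "_" / "~"."""
    # safe="" removes "/" from the default safe set, so it is encoded too.
    return quote(text, safe="")

print(encode_unreserved_only("a b+c/d~e"))  # a%20b%2Bc%2Fd~e
```

This is meant for a single component (one path segment, one key, one value), not for a whole URL, since it also encodes the delimiters.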

If your URL contains national (non-ASCII) characters, first encode them as UTF-8 and then percent-encode the result.
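A short Python 3 sketch of those two steps, using "café" as an arbitrary sample word:

```python
from urllib.parse import quote

word = "café"

# Step 1: encode the text as UTF-8 bytes ("é" becomes the two bytes C3 A9).
utf8_bytes = word.encode("utf-8")

# Step 2: percent-encode those bytes.
encoded = quote(utf8_bytes, safe="")

print(encoded)  # caf%C3%A9

# quote() applies UTF-8 to str input by default, so this gives the same result.
same = quote(word, safe="")
```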


To summarize the (somewhat conflicting) answers, I think it boils down to:

| standard      | +   | %20 |
|---------------+-----+-----|
| URL           | no  | yes |
| query string  | yes | yes |
| form params   | yes | no  |
| mailto query  | no  | yes |

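So I think what happened historically is:

1. The RFC clearly specified the form of URLs and how they are encoded. In that context, the query is just a "string"; there is no specification of how key/value pairs should be encoded.
2. The HTTP guys put out a standard for how key/value pairs are encoded in form params, borrowing from the URL encoding standard, except that spaces should be encoded as +.
3. The web guys said: cool, we have a way to encode key/value pairs, let's put that into the URL query string.

Result: we end up with two different ways of encoding spaces in a URL, depending on which part you are talking about. But it doesn't even violate the URL standard. From the URL perspective, the "query" is just a black box. If you want to use encodings other than percent-encoding there, knock yourself out.

But as the email example shows, borrowing the form-params implementation for URL query strings can be problematic. So ultimately using %20 is safer, but there might not be out-of-the-box library support for it.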
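For what it's worth, Python's standard library does let you pick, assuming Python 3.5 or later, where urlencode() accepts a quote_via argument. A sketch with made-up parameters and a placeholder address:

```python
from urllib.parse import urlencode, quote

# Hypothetical parameters, chosen to contain both spaces and a plus sign.
params = {"subject": "hello world", "body": "1 + 1"}

# Default: form-style encoding via quote_plus(), spaces become "+".
form_style = urlencode(params)

# quote_via=quote switches to plain percent-encoding, spaces become %20.
percent_style = urlencode(params, quote_via=quote)

print(form_style)     # subject=hello+world&body=1+%2B+1
print(percent_style)  # subject=hello%20world&body=1%20%2B%201

# A mailto query should use the %20 form (placeholder address).
mailto = "mailto:someone@example.com?" + percent_style
```

Other languages and libraries may or may not offer an equivalent switch, so check before relying on it.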