How to encode the filename parameter of the Content-Disposition header in HTTP?

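Web applications that want to force a resource to be downloaded, rather than rendered directly in a Web browser, issue a Content-Disposition header in the HTTP response of the form:

Content-Disposition: attachment; filename=FILENAME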

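The filename parameter can be used to suggest a name for the file into which the browser downloads the resource. RFC 2183 (Content-Disposition), however, states in section 2.3 (The Filename Parameter) that the file name can only use US-ASCII characters:

Current [RFC 2045] grammar restricts parameter values (and hence Content-Disposition filenames) to US-ASCII. We recognize the great desirability of allowing arbitrary character sets in filenames, but it is beyond the scope of this document to define the necessary mechanisms.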

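There is empirical evidence, nevertheless, that most popular Web browsers today seem to permit non-US-ASCII characters yet (for lack of a standard) disagree on the encoding scheme and character set specification of the file name. The question, then, is: what are the various schemes and encodings employed by the popular browsers if the file name "naïvefile" (without quotes, and where the third letter is U+00EF) needs to be encoded into the Content-Disposition header?

For the purpose of this question, the popular browsers are:

Google Chrome, Safari, Internet Explorer or Edge, Firefox, Opera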

Current answer

I ended up with the following code in my "download.php" script (based on this blog post and these test cases).

$il1_filename = utf8_decode($filename);
$to_underscore = "\"\\#*;:|<>/?";
$safe_filename = strtr($il1_filename, $to_underscore, str_repeat("_", strlen($to_underscore)));

header("Content-Disposition: attachment; filename=\"$safe_filename\""
.( $safe_filename === $filename ? "" : "; filename*=UTF-8''".rawurlencode($filename) ));

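This uses the standard filename="..." form as long as only ISO-8859-1 (Latin-1) and "safe" characters are used; otherwise it adds the URL-encoded filename*=UTF-8'' form. According to this specific test case, it should work from MSIE9 up, and on recent Firefox, Chrome and Safari. On lower MSIE versions, it should offer a filename containing the ISO-8859-1 version of the name, with underscores in place of characters that are not in this encoding.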
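For readers outside PHP, the same fallback-plus-filename* idea can be sketched in Python. This is a minimal sketch rather than the answer's code: the function name `content_disposition` is hypothetical, and the byte-level comparison is an assumption made to mirror how the PHP `===` compares the Latin-1 bytes with the original UTF-8 bytes.

```python
from urllib.parse import quote

# Characters the PHP snippet replaces with "_" in the fallback name.
UNSAFE = '"\\#*;:|<>/?'


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value with an ISO-8859-1 fallback.

    Hypothetical helper that mirrors the PHP snippet above.
    """
    # Characters outside ISO-8859-1 become "?", much like utf8_decode().
    latin1 = filename.encode("latin-1", errors="replace").decode("latin-1")
    # "?" is listed in UNSAFE, so unrepresentable characters end up as "_".
    safe = latin1.translate({ord(c): "_" for c in UNSAFE})
    value = f'attachment; filename="{safe}"'
    # Compare bytes, as PHP does: any non-ASCII or replaced character differs.
    if safe.encode("latin-1") != filename.encode("utf-8"):
        # Percent-encode the UTF-8 bytes, like PHP's rawurlencode().
        value += "; filename*=UTF-8''" + quote(filename, safe="")
    return value


print(content_disposition("naïvefile"))
```

Under these assumptions a plain ASCII name such as "report.txt" yields only the quoted `filename` parameter, while "naïvefile" also gets `filename*=UTF-8''na%C3%AFvefile`.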

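Final note: on Apache, the maximum size of each header field is 8190 bytes. UTF-8 can take up to four bytes per character; after rawurlencode, each byte becomes three characters, so one character can cost 4 x 3 = 12 bytes. That is pretty inefficient, but it should still be theoretically possible to have more than 600 "smiles" (%F0%9F%98%81) in the filename.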
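The arithmetic can be checked with a few lines of Python. This is a rough sketch only: it assumes the 8190-byte limit covers the whole header line including the field name, and it ignores the space taken by the fallback filename="..." parameter.

```python
from urllib.parse import quote

APACHE_FIELD_LIMIT = 8190  # bytes per header field, the figure quoted above

smile = "\U0001F601"              # the character behind %F0%9F%98%81
encoded = quote(smile, safe="")   # percent-encode its four UTF-8 bytes
bytes_per_char = len(encoded)     # 4 bytes x 3 characters each

# Assumption: the fixed part of the header counts against the limit too.
prefix = "Content-Disposition: attachment; filename*=UTF-8''"
budget = APACHE_FIELD_LIMIT - len(prefix)
max_smiles = budget // bytes_per_char

print(encoded, bytes_per_char, max_smiles)
```

The result stays above 600 even after subtracting the fixed prefix, which agrees with the estimate in the answer.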

2015-04-05 15:45:29

Other answers

The method mimeHeaderEncode($string) in the library class Unicode does the job.

$file_name= Unicode::mimeHeaderEncode($file_name);

Example in Drupal/PHP:

https://github.com/drupal/core-utility/blob/8.8.x/Unicode.php

  /**
   * Encodes MIME/HTTP headers that contain incorrectly encoded characters.
   *
   * For example, Unicode::mimeHeaderEncode('tést.txt') returns
   * "=?UTF-8?B?dMOpc3QudHh0?=".
   *
   * See http://www.rfc-editor.org/rfc/rfc2047.txt for more information.
   *
   * Notes:
   * - Only encode strings that contain non-ASCII characters.
   * - We progressively cut-off a chunk with self::truncateBytes(). This ensures
   *   each chunk starts and ends on a character boundary.
   * - Using \n as the chunk separator may cause problems on some systems and
   *   may have to be changed to \r\n or \r.
   *
   * @param string $string
   *   The header to encode.
   * @param bool $shorten
   *   If TRUE, only return the first chunk of a multi-chunk encoded string.
   *
   * @return string
   *   The mime-encoded header.
   */
  public static function mimeHeaderEncode($string, $shorten = FALSE) {
    if (preg_match('/[^\x20-\x7E]/', $string)) {
      // floor((75 - strlen("=?UTF-8?B??=")) * 0.75);
      $chunk_size = 47;
      $len = strlen($string);
      $output = '';
      while ($len > 0) {
        $chunk = static::truncateBytes($string, $chunk_size);
        $output .= ' =?UTF-8?B?' . base64_encode($chunk) . "?=\n";
        if ($shorten) {
          break;
        }
        $c = strlen($chunk);
        $string = substr($string, $c);
        $len -= $c;
      }
      return trim($output);
    }
    return $string;
  }
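
The core of that method, stripped of chunking, can be sketched in Python as follows. The helper name `mime_header_encode` is hypothetical, and unlike the Drupal method it always emits a single encoded-word, so it is only suitable for short names.

```python
import base64


def mime_header_encode(text: str) -> str:
    """Encode non-ASCII text as one RFC 2047 "B" encoded-word.

    Hypothetical helper; it does not split long input into chunks.
    """
    # Leave printable ASCII untouched, as the PHP code does.
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        return text
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="


print(mime_header_encode("tést.txt"))  # =?UTF-8?B?dMOpc3QudHh0?=
```

RFC 2047 was written for mail headers, so whether a given browser decodes this form inside a filename parameter is something to test rather than assume.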

2021-12-21 10:49:51

In ASP.NET MVC 2 I use something like this:

return File(
    tempFile
    , "application/octet-stream"
    , HttpUtility.UrlPathEncode(fileName)
    );

I guess if you don't use MVC 2 you could just encode the filename using

HttpUtility.UrlPathEncode(fileName)
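
To see what this kind of encoding produces for the name from the question, here is a small Python sketch. It assumes that UrlPathEncode amounts to percent-encoding the UTF-8 bytes of the name, which `urllib.parse.quote` does; the two functions may differ on other characters.

```python
from urllib.parse import quote

# Percent-encode the UTF-8 bytes of the name; ASCII letters stay as they are.
encoded = quote("naïvefile")
print(encoded)  # na%C3%AFvefile

# Spaces become %20 rather than "+".
spaced = quote("my file.txt")
print(spaced)  # my%20file.txt
```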

2010-07-15 15:08:29

In PHP this did it for me (assuming the filename is UTF-8 encoded):

header('Content-Disposition: attachment;'
    . 'filename="' . addslashes(utf8_decode($filename)) . '";'
    . 'filename*=utf-8\'\'' . rawurlencode($filename));

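Tested against IE8-11, Firefox and Chrome. If the browser can interpret filename*=utf-8 it will use the UTF-8 version of the filename; otherwise it will use the decoded (ISO-8859-1) filename. If your filename contains characters that can't be represented in ISO-8859-1, you might want to consider using iconv instead.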

2016-05-20 12:47:05

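Classic ASP solution

Most modern browsers now support passing the filename as UTF-8. However, a file upload solution I use, based on FreeASPUpload.Net (the site no longer exists; the link points to archive.org), would not work with it: its binary parsing relied on reading single-byte ASCII-encoded strings, which works with UTF-8 encoded data only until you reach characters that ASCII does not support.

However, I was able to find a solution that makes the code read and parse the binary as UTF-8.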

Public Function BytesToString(bytes)    'UTF-8..
  Dim bslen
  Dim i, k , N 
  Dim b , count 
  Dim str

  bslen = LenB(bytes)
  str=""

  i = 0
  Do While i < bslen
    b = AscB(MidB(bytes,i+1,1))

    If (b And &HFC) = &HFC Then
      count = 6
      N = b And &H1
    ElseIf (b And &HF8) = &HF8 Then
      count = 5
      N = b And &H3
    ElseIf (b And &HF0) = &HF0 Then
      count = 4
      N = b And &H7
    ElseIf (b And &HE0) = &HE0 Then
      count = 3
      N = b And &HF
    ElseIf (b And &HC0) = &HC0 Then
      count = 2
      N = b And &H1F
    Else
      count = 1
      str = str & Chr(b)
    End If

    If i + count > bslen Then   ' sequence would run past the end of the data
      str = str&"?"
      Exit Do
    End If

    If count>1 then
      For k = 1 To count - 1
        b = AscB(MidB(bytes,i+k+1,1))
        N = N * &H40 + (b And &H3F)
      Next
      str = str & ChrW(N)
    End If
    i = i + count
  Loop

  BytesToString = str
End Function

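By implementing the BytesToString() function from include_aspuploader.asp in my own code, I was able to get UTF-8 filenames working.

Useful links

Multipart/form-data and UTF-8 in an ASP Classic application
Unicode, UTF, ASCII, ANSI format differences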
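
The decoding logic is easier to follow in a language with byte strings. The following Python sketch follows the structure of BytesToString for sequences of one to four bytes. It is illustrative only: it does not validate continuation bytes, and in real Python code `data.decode("utf-8")` would be the usual choice.

```python
def bytes_to_string(data: bytes) -> str:
    """Decode UTF-8 by hand, following the structure of BytesToString."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # The high bits of the lead byte give the sequence length.
        if (b & 0xF8) == 0xF0:
            count, n = 4, b & 0x07
        elif (b & 0xF0) == 0xE0:
            count, n = 3, b & 0x0F
        elif (b & 0xE0) == 0xC0:
            count, n = 2, b & 0x1F
        else:
            count, n = 1, b  # ASCII, or a stray byte passed through as-is
        # Truncated sequence: emit "?" and stop.
        if i + count > len(data):
            out.append("?")
            break
        # Each continuation byte contributes its low six bits.
        for k in range(1, count):
            n = (n << 6) | (data[i + k] & 0x3F)
        # Guard against values beyond the Unicode range.
        out.append(chr(n) if n <= 0x10FFFF else "?")
        i += count
    return "".join(out)


print(bytes_to_string("naïvefile".encode("utf-8")))  # naïvefile
```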

2016-05-23 12:17:58
