Out-File seems to force the BOM when using UTF-8:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath

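How can I write a file in UTF-8 with no BOM using PowerShell?

Update 2021

PowerShell has changed a bit since I wrote this question 10 years ago. Check the multiple answers below; they have a lot of good information!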


Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

$MyRawString = Get-Content -Raw $MyPath
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)
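To check whether a file written this way really lacks the BOM, you can look at its first three bytes. The following is only a sketch for illustration: the helper name Test-Utf8Bom and the temp-file name are hypothetical and not part of the answer above.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
  param([Parameter(Mandatory)] [string] $FullPath)
  $bytes = [System.IO.File]::ReadAllBytes($FullPath)
  return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Hypothetical demo file in the temp directory (a full path, so .NET resolves it correctly).
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'
# Build a non-ASCII string without relying on the encoding of the script file itself.
$lines = [string[]] @("caf$([char]0xE9)", 'plain ascii')

# Encoding created with $False: no BOM expected.
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($demoPath, $lines, $Utf8NoBomEncoding)
$withoutBom = Test-Utf8Bom $demoPath

# Encoding created with $True: BOM expected, shown for comparison.
$Utf8BomEncoding = New-Object System.Text.UTF8Encoding $True
[System.IO.File]::WriteAllLines($demoPath, $lines, $Utf8BomEncoding)
$withBom = Test-Utf8Bom $demoPath

"No-BOM encoding wrote a BOM: $withoutBom"
"BOM encoding wrote a BOM:    $withBom"
Remove-Item -LiteralPath $demoPath
```

The same three-byte check can be pointed at any file you have just written to see which of the approaches in this thread produced a BOM.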

This script will convert all .txt files in DIRECTORY1 to UTF-8 without BOM and output them to DIRECTORY2:

foreach ($i in ls -name DIRECTORY1\*.txt)
{
    $file_content = Get-Content "DIRECTORY1\$i";
    [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
}

This one works for me (use "Default" instead of "UTF8"):

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "Default" $MyPath

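The result is ASCII without BOM.


You can use the following to get UTF-8 without BOM (this holds only for content made up entirely of ASCII characters, whose bytes are identical in ASCII and BOM-less UTF-8):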

$MyFile | Out-File -Encoding ASCII

The proper way as of now is to use the solution recommended by @Roman Kuzmin in the comments to @M. Dudley's answer:

[IO.File]::WriteAllLines($filename, $content)

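(I've also shortened it a bit by stripping the unnecessary System namespace qualifier; PowerShell resolves [IO.File] to [System.IO.File] automatically.)


Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding, across all cmdlets.

In other words: If you're using PowerShell [Core] version 6 or higher, you get BOM-less UTF-8 files by default (which you can also explicitly request with -Encoding utf8 / -Encoding utf8NoBOM, whereas -Encoding utf8BOM gives you the with-BOM encoding). If you're running Windows 10 and you're willing to switch to BOM-less UTF-8 encoding system-wide (which can have side effects), even Windows PowerShell can be made to use BOM-less UTF-8 consistently; see this answer.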
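If a script has to run under both editions, the difference described above can be handled with a version check. This is a minimal sketch under stated assumptions: the demo file name is hypothetical, and the Windows PowerShell branch simply falls back to the .NET call from the earlier answers.

```powershell
# Hypothetical demo file; replace with your own target path.
$target = Join-Path ([System.IO.Path]::GetTempPath()) 'version-aware-demo.txt'
$text = 'line one', 'line two'

if ($PSVersionTable.PSVersion.Major -ge 6) {
  # PowerShell (Core) 6+: BOM-less UTF-8 is already the default;
  # requesting it explicitly just documents the intent.
  $text | Out-File -Encoding utf8NoBOM -LiteralPath $target
}
else {
  # Windows PowerShell: -Encoding UTF8 would prepend a BOM, so use .NET instead.
  [System.IO.File]::WriteAllLines($target, [string[]] $text, (New-Object System.Text.UTF8Encoding $false))
}

# Inspect the first bytes: a BOM would show up as EF BB BF.
$firstBytes = [System.IO.File]::ReadAllBytes($target) | Select-Object -First 3
($firstBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```

Either branch should leave the file starting with the bytes of the text itself rather than with a BOM.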


To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

A simple (non-streaming), PowerShell-native alternative is to use New-Item, which (curiously) creates BOM-less UTF-8 files by default even in Windows PowerShell:

# Note the use of -Raw to read the file as a whole.
# Unlike with Set-Content / Out-File *no* trailing newline is appended.
$null = New-Item -Force $MyPath -Value (Get-Content -Raw $MyPath)

Note: To save the output from arbitrary commands in the same format as Out-File would, pipe to Out-String first; e.g.:

$null = New-Item -Force Out.txt -Value (Get-ChildItem | Out-String)

For convenience, below is advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

- you can use it just like Out-File in a pipeline.
- input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.
- an additional -UseLF switch allows you to use Unix-format LF-only newlines ("`n") instead of the Windows-format CRLF newlines ("`r`n") you normally get.

Example:

(Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines

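Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before the result is sent through the pipeline. This is necessary in order to be able to write back to the same file (update it in place). Generally, though, this technique is not advisable, for two reasons: (a) the whole file must fit into memory, and (b) if the command is interrupted, data will be lost.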
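One way to sidestep both drawbacks is to stream into a temporary file and replace the original only once writing has finished. The sketch below is an illustration, not part of the function in this answer: the demo file, the ".tmp" suffix, and the upper-casing transformation are all hypothetical, and the final replace step is not atomic.

```powershell
# Hypothetical demo file; in practice $MyPath would be the file you want to update.
$MyPath = Join-Path ([System.IO.Path]::GetTempPath()) 'inplace-demo.txt'
[System.IO.File]::WriteAllLines($MyPath, [string[]] ('alpha', 'beta'))

# Write the new content to a sibling temp file first...
$tempPath = "$MyPath.tmp"
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
$writer = New-Object System.IO.StreamWriter $tempPath, $false, $utf8NoBom
try {
  # Get-Content streams line by line, so the whole file need not fit in memory.
  Get-Content -LiteralPath $MyPath | ForEach-Object { $writer.WriteLine($_.ToUpperInvariant()) }
}
finally {
  # Always close the writer, even if the pipeline above fails.
  $writer.Dispose()
}

# ...and only then replace the original with the finished temp file.
Move-Item -LiteralPath $tempPath -Destination $MyPath -Force

$result = Get-Content -LiteralPath $MyPath
$result
```

If the script is interrupted while writing, the original file is still intact and only the temp file is incomplete.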

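A note on memory use:

M. Dudley's own answer and the New-Item alternative above require that the entire file contents be built up in memory first, which can be problematic with large input sets. The function below does not require this, because it is implemented as a proxy (wrapper) function (for a concise summary of how to define such functions, see this answer).


Source code of function Out-FileUtf8NoBom:

Note: The function is also available as an MIT-licensed Gist, and only the Gist will be maintained going forward.

You can install it directly with the following command (while I can personally assure you that doing so is safe, you should always check the content of a script before directly executing it this way):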

# Download and define the function.
irm https://gist.github.com/mklement0/8689b9b5123a9ba11df7214f82a673be/raw/Out-FileUtf8NoBom.ps1 | iex

function Out-FileUtf8NoBom {

  <#
  .SYNOPSIS
    Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

  .DESCRIPTION

    Mimics the most important aspects of Out-File:
      * Input objects are sent to Out-String first.
      * -Append allows you to append to an existing file, -NoClobber prevents
        overwriting of an existing file.
      * -Width allows you to specify the line width for the text representations
        of input objects that aren't strings.
    However, it is not a complete implementation of all Out-File parameters:
      * Only a literal output path is supported, and only as a parameter.
      * -Force is not supported.
      * Conversely, an extra -UseLF switch is supported for using LF-only newlines.

  .NOTES
    The raison d'être for this advanced function is that Windows PowerShell
    lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8 
    invariably prepends a BOM.

    Copyright (c) 2017, 2022 Michael Klement <mklement0@gmail.com> (http://same2u.net), 
    released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).

  #>

  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, Position = 0)] [string] $LiteralPath,
    [switch] $Append,
    [switch] $NoClobber,
    [AllowNull()] [int] $Width,
    [switch] $UseLF,
    [Parameter(ValueFromPipeline)] $InputObject
  )

  begin {

    # Convert the input path to a full one, since .NET's working dir. usually
    # differs from PowerShell's.
    $dir = Split-Path -LiteralPath $LiteralPath
    if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath }
    $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
    
    # If -NoClobber was specified, throw an exception if the target file already
    # exists.
    if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
      Throw [IO.IOException] "The file '$LiteralPath' already exists."
    }
    
    # Create a StreamWriter object.
    # Note that we take advantage of the fact that the StreamWriter class by default:
    # - uses UTF-8 encoding
    # - without a BOM.
    $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
    
    $htOutStringArgs = @{}
    if ($Width) { $htOutStringArgs += @{ Width = $Width } }

    try { 
      # Create the script block with the command to use in the steppable pipeline.
      $scriptCmd = { 
        & Microsoft.PowerShell.Utility\Out-String -Stream @htOutStringArgs | 
          . { process { if ($UseLF) { $sw.Write(($_ + "`n")) } else { $sw.WriteLine($_) } } }
      }  
      
      $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
      $steppablePipeline.Begin($PSCmdlet)
    }
    catch { throw }

  }

  process
  {
    $steppablePipeline.Process($_)
  }

  end {
    $steppablePipeline.End()
    $sw.Dispose()
  }

}

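One technique I use is to redirect output to an ASCII file using the Out-File cmdlet.

For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output will be in UTF-16, which SQLPlus does not recognize. To work around this: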

sqlplus -s / as sysdba "@create_sql_script.sql" |
Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed via another SQLPlus session without any Unicode worries:

sqlplus / as sysdba "@new_script.sql" |
tee new_script.log

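Update: As others have pointed out, this will drop non-ASCII characters. Since the user asked for a way to "force" conversion, I assume they do not care about that, as perhaps their data does not contain such characters.

If you care about the preservation of non-ASCII characters, this is not the answer for you.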
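To make that caveat concrete, here is a small sketch of what ASCII encoding does to a non-ASCII character, compared with BOM-less UTF-8. It calls the .NET encoders directly rather than Out-File, on the assumption that Out-File -Encoding ASCII applies the same substitution; the sample string is hypothetical.

```powershell
# The word "naive" with U+00EF (i with diaeresis), built without non-ASCII source characters.
$sample = "na$([char]0xEF)ve"

$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($sample)
$utf8Bytes  = (New-Object System.Text.UTF8Encoding $false).GetBytes($sample)

# ASCII cannot represent U+00EF, so the encoder substitutes '?' (0x3F).
$asciiRoundTrip = [System.Text.Encoding]::ASCII.GetString($asciiBytes)

# UTF-8 encodes U+00EF as the two bytes C3 AF; GetBytes never adds a BOM.
$utf8RoundTrip = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

"ASCII bytes: $(($asciiBytes | ForEach-Object { $_.ToString('X2') }) -join ' ')"
"UTF-8 bytes: $(($utf8Bytes  | ForEach-Object { $_.ToString('X2') }) -join ' ')"
"ASCII round trip preserved the text: $($asciiRoundTrip -ceq $sample)"
"UTF-8 round trip preserved the text: $($utf8RoundTrip -ceq $sample)"
```

For pure ASCII input the two byte sequences are identical, which is why the ASCII trick appears to produce "UTF-8 without BOM" in simple tests.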


Change multiple files by extension to UTF-8 without BOM:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach($i in ls -recurse -filter "*.java") {
    $MyFile = Get-Content $i.fullname 
    [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
}

I figured this wouldn't be UTF, but I just found a pretty simple solution that seems to work...

Get-Content path/to/file.ext | out-file -encoding ASCII targetFile.ext

For me this results in a UTF-8 file without a BOM, regardless of the source format.


    [System.IO.FileInfo] $file = Get-Item -Path $FilePath 
    $sequenceBOM = New-Object System.Byte[] 3 
    $reader = $file.OpenRead() 
    $bytesRead = $reader.Read($sequenceBOM, 0, 3) 
    $reader.Dispose() 
    # A UTF-8+BOM file starts with the following three bytes. Hex: 0xEF 0xBB 0xBF, Decimal: 239 187 191
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191) 
    { 
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False) 
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding) 
        Write-Host "Remove UTF-8 BOM successfully" 
    } 
    Else 
    { 
        Write-Warning "Not UTF-8 BOM file" 
    }  

Source: How to remove UTF8 Byte Order Mark (BOM) from a file using PowerShell
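A variation on the same idea is to strip the three BOM bytes at the byte level, so that the rest of the file (including its line endings) is written back unchanged instead of being re-read with Get-Content. This is only a sketch; the demo file name is hypothetical and the file is created here just so the example is self-contained.

```powershell
# Hypothetical demo file that starts with a UTF-8 BOM.
$FilePath = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-strip-demo.txt'
[System.IO.File]::WriteAllText($FilePath, "first`r`nsecond", (New-Object System.Text.UTF8Encoding $true))

$bytes = [System.IO.File]::ReadAllBytes($FilePath)
if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
  # Copy everything after the 3-byte BOM and write it back as-is.
  $rest = New-Object byte[] ($bytes.Length - 3)
  [System.Array]::Copy($bytes, 3, $rest, 0, $rest.Length)
  [System.IO.File]::WriteAllBytes($FilePath, $rest)
  Write-Host "Removed UTF-8 BOM"
}
else {
  Write-Warning "File does not start with a UTF-8 BOM"
}

$after = [System.IO.File]::ReadAllBytes($FilePath)
"Bytes before: $($bytes.Length), bytes after: $($after.Length)"
```

Note that this reads the whole file into memory, so it suits small to medium files.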


If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
[System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)
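The reason for the absolute path is that .NET methods resolve relative paths against the process working directory, which PowerShell does not update when you change location. A small sketch that just prints the three values involved (the relative file name is hypothetical, and no file is written):

```powershell
# Hypothetical relative path, as a user might pass it to a script.
$MyPath = 'relative-demo.txt'

# Resolve it against PowerShell's current location; the file does not need to exist.
$fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath)

"PowerShell location : $($PWD.ProviderPath)"
".NET working dir    : $([Environment]::CurrentDirectory)"
"Resolved full path  : $fullPath"
```

If the first two lines differ (for example after a Set-Location in Windows PowerShell), a relative path passed straight to [System.IO.File] would end up in the second directory, not the first.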

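If you want to use [System.IO.File]::WriteAllText(), sometimes you should pipe the second parameter into | Out-String | to add CRLFs to the end of each line explicitly (especially when you use it with ConvertTo-Csv):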

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
[System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)

Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

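See: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM


When using Set-Content instead of Out-File, you can specify the encoding Byte, which can be used to write a byte array to a file. This, in combination with a custom UTF8 encoding that does not emit the BOM, gives the desired result: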

# This variable can be reused
$utf8 = New-Object System.Text.UTF8Encoding $false

$MyFile = Get-Content $MyPath -Raw
Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath

The difference from using [IO.File]::WriteAllLines() or similar is that it should work fine with any type of item and path, not only actual file paths.
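One caveat worth sketching: -Encoding Byte is a Windows PowerShell feature, while PowerShell 6 and later use the -AsByteStream switch for writing raw bytes. The following version-aware variant is an illustration only; the demo file name and sample text are hypothetical.

```powershell
# This variable can be reused.
$utf8 = New-Object System.Text.UTF8Encoding $false

# Hypothetical demo file and sample text (the word "cafe" with U+00E9, plus CRLF).
$MyPath = Join-Path ([System.IO.Path]::GetTempPath()) 'bytes-demo.txt'
$bytes = $utf8.GetBytes("caf$([char]0xE9)`r`n")

if ($PSVersionTable.PSVersion.Major -ge 6) {
  # PowerShell 6+: raw bytes are written with -AsByteStream.
  Set-Content -Value $bytes -AsByteStream -LiteralPath $MyPath
}
else {
  # Windows PowerShell: raw bytes are written with -Encoding Byte.
  Set-Content -Value $bytes -Encoding Byte -LiteralPath $MyPath
}

$written = [System.IO.File]::ReadAllBytes($MyPath)
($written | ForEach-Object { $_.ToString('X2') }) -join ' '
```

In both branches the file receives exactly the bytes produced by the BOM-less encoder, with nothing prepended or appended.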


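Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.

So in the above example it should simply be like this: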

$MyFile | Out-File -Encoding UTF8NoBOM $MyPath

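For PowerShell 5.1, enable this setting:

Control Panel, Region, Administrative, Change system locale, Use Unicode UTF-8 for worldwide language support

Then enter this into PowerShell: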

$PSDefaultParameterValues['*:Encoding'] = 'Default'

Alternatively, you can upgrade to PowerShell 6 or higher.

https://github.com/PowerShell/PowerShell


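Important!: This only works if an extra space or newline at the start of the file is no problem for your use case (e.g. if it is an SQL file, a Java file, or a human-readable text file).

One could use a combination of creating an empty (non-UTF8 or ASCII (UTF8-compatible)) file and appending to it (replace $str with gc $src if the source is a file):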

" "    |  out-file  -encoding ASCII  -noNewline  $dest
$str  |  out-file  -encoding UTF8   -append     $dest

As a one-liner

Replace $dest and $str according to your use case:

$_ofdst = $dest ; " " | out-file -encoding ASCII -noNewline $_ofdst ; $str | out-file -encoding UTF8 -append $_ofdst

As a simple function

function Out-File-UTF8-noBOM { param( $str, $dest )
  " "    |  out-file  -encoding ASCII  -noNewline  $dest
  $str  |  out-file  -encoding UTF8   -append     $dest
}

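Using it with a source file (arguments are separated by spaces; a comma would pass both values as a single array to $str):

Out-File-UTF8-noBOM  (gc $src)  $dest

Using it with a string:

Out-File-UTF8-noBOM  $str  $dest

Optional: continue appending with Out-File: "more foo bar" | Out-File -encoding UTF8 -append $dest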


Old question, new answer:

While the "old" PowerShell writes a BOM, the new platform-agnostic variant behaves differently: the default is "no BOM", and it can be configured via a switch:

-Encoding

Specifies the type of encoding for the target file. The default value is utf8NoBOM.

The acceptable values for this parameter are as follows:

- ascii: Uses the encoding for the ASCII (7-bit) character set.
- bigendianunicode: Encodes in UTF-16 format using the big-endian byte order.
- oem: Uses the default encoding for MS-DOS and console programs.
- unicode: Encodes in UTF-16 format using the little-endian byte order.
- utf7: Encodes in UTF-7 format.
- utf8: Encodes in UTF-8 format.
- utf8BOM: Encodes in UTF-8 format with Byte Order Mark (BOM)
- utf8NoBOM: Encodes in UTF-8 format without Byte Order Mark (BOM)
- utf32: Encodes in UTF-32 format.

Source: https://learn.microsoft.com/de-de/powershell/module/Microsoft.PowerShell.Utility/Out-File?view=powershell-7
Emphasis mine


I had the same error in PowerShell and used this to isolate and fix it:

$PSDefaultParameterValues['*:Encoding'] = 'utf8'

Using this method to edit a UTF8-NoBOM file generated a file with the correct encoding:

$fileD = "file.xml"
(Get-Content $fileD) | ForEach-Object { $_ -replace 'replace text',"new text" } | out-file "file.xml" -encoding ASCII

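I was skeptical about this method at first, but it surprised me and worked!

Tested with PowerShell version 5.1


I would suggest just using the Set-Content command; nothing else is needed.

The PowerShell version on my system is: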

PS C:\Users\XXXXX> $PSVersionTable.PSVersion | fl


Major         : 5
Minor         : 1
Build         : 19041
Revision      : 1682
MajorRevision : 0
MinorRevision : 1682

PS C:\Users\XXXXX>

So you would need something like the following.

PS C:\Users\XXXXX> Get-Content .\Downloads\finddate.txt
Thursday, June 23, 2022 5:57:59 PM
PS C:\Users\XXXXX> Get-Content .\Downloads\finddate.txt | Set-Content .\Downloads\anotherfile.txt
PS C:\Users\XXXXX> Get-Content .\Downloads\anotherfile.txt
Thursday, June 23, 2022 5:57:59 PM
PS C:\Users\XXXXX>

Now when we check the file (anotherfile.txt), as per the screenshot, it is UTF-8.