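Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Everything goes: one-liners in your favorite scripting language, command-line tools or other utilities for your OS, web sites, etc.

Best solutions so far:

On Linux/UNIX/OS X/cygwin: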

GNU iconv, suggested by Troels Arvin, is best used as a filter. It seems to be universally available. Example:

$ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt

As pointed out by Ben, there is an online converter using iconv.

recode (manual), suggested by Cheekysoft, will convert one or several files in-place. Example:

$ recode UTF8..ISO-8859-15 in.txt

This one uses shorter aliases:

$ recode utf8..l9 in.txt

Recode also supports surfaces, which can be used to convert between different line ending types and encodings.

Convert newlines from LF (Unix) to CR-LF (DOS):

$ recode ../CR-LF in.txt

Base64 encode file:

$ recode ../Base64 in.txt

You can also combine them. Convert a Base64 encoded UTF8 file with Unix line endings to a Base64 encoded Latin 1 file with DOS line endings:

$ recode utf8/Base64..l1/CR-LF/Base64 file.txt
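If neither iconv nor recode is installed, the same UTF-8 to ISO-8859-15 conversion can be sketched with a scripting language. The following is a minimal Python sketch, not something taken from the answers above; the function name convert_file and the file names are made up for illustration. It assumes the whole file fits in memory, and it raises an error instead of silently dropping characters that ISO-8859-15 cannot represent.

```python
def convert_file(src, dst, from_enc="utf-8", to_enc="iso-8859-15"):
    """Re-encode the text file src into dst (hypothetical helper)."""
    # newline="" disables newline translation, so LF / CR-LF line
    # endings are copied through unchanged.
    with open(src, "r", encoding=from_enc, newline="") as fin:
        text = fin.read()
    # The default error handling is strict: a character with no
    # equivalent in to_enc raises UnicodeEncodeError.
    with open(dst, "w", encoding=to_enc, newline="") as fout:
        fout.write(text)


if __name__ == "__main__":
    import sys

    # Usage (assumed file names): python convert.py in.txt out.txt
    convert_file(sys.argv[1], sys.argv[2])
```

Swapping the two encoding arguments gives the reverse direction (ISO-8859-15 to UTF-8).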

On Windows with PowerShell (Jay Bazuzi):

PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

(No ISO-8859-15 support though; it says that the supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

Edit

Do you mean ISO-8859-1 support? Using "String" does this, e.g. for the reverse direction:

gc -en string in.txt | Out-File -en utf8 out.txt

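Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".

CsCvt - Kalytta's Character Set Converter is another great command-line based conversion tool for Windows.

Current answer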

ruby:

ruby -e "File.write('output.txt', File.read('input.txt').encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ''))"

Source: https://robots.thoughtbot.com/fight-back-utf-8-invalid-byte-sequences
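A related variant in Python, as a hedged sketch rather than an exact port: as far as I can tell, the Ruby one-liner treats the input as binary, so it ends up dropping every non-ASCII byte. The function below (strip_invalid_utf8 is a made-up name) instead keeps valid UTF-8 sequences and drops only the bytes that do not decode.

```python
def strip_invalid_utf8(data: bytes) -> str:
    """Decode data as UTF-8, silently dropping undecodable bytes."""
    # errors="ignore" removes invalid byte sequences; use
    # errors="replace" to insert U+FFFD markers instead.
    return data.decode("utf-8", errors="ignore")


def clean_file(src: str, dst: str) -> None:
    """Write a valid UTF-8 copy of src to dst (hypothetical helper)."""
    with open(src, "rb") as fin:
        cleaned = strip_invalid_utf8(fin.read())
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(cleaned)
```

Dropping bytes is lossy either way, so it is worth keeping the original file around.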

2018-06-26 06:25:16

Other answers

Visual Studio Code

Open your file in Visual Studio Code.

Reopen with Encoding: In the bottom status bar, to the right, you should see your current file encoding (e.g. "UTF-8"). Click this and select "Reopen with Encoding". Select the correct encoding of the file (e.g. ISO 8859-2). Confirm that your content is displaying as expected.

Save with Encoding: The bottom status bar should now display your new encoding format (e.g. ISO 8859-2). Click this, choose "Save with Encoding", and select UTF-8 (or whatever new encoding you want).

NOTE: This will overwrite your original file. Make a backup first.

2022-03-31 12:23:29

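My favorite tool for this is jEdit (a Java-based text editor), which has two very convenient features:

One that lets the user reload a text with a different encoding (and, as such, check the result visually)
Another that lets the user explicitly choose the encoding (and the end-of-line characters) before saving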

2018-09-17 11:08:00

To write Java properties files, I normally use this on Linux (Mint and Ubuntu distributions):

$ native2ascii filename.properties

For example:

$ cat test.properties 
first=Execução número um
second=Execução número dois

$ native2ascii test.properties 
first=Execu\u00e7\u00e3o n\u00famero um
second=Execu\u00e7\u00e3o n\u00famero dois

PS: I wrote "first/second execution" in Portuguese to force special characters.
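If native2ascii is not available, the escaping shown above can be approximated in a few lines of Python. This is only a sketch of the output format seen in the example (non-ASCII characters become \uXXXX escapes); the function name is hypothetical, and it does not reproduce the tool's options or its handling of input encodings.

```python
def native2ascii(text: str) -> str:
    """Escape every non-ASCII character as \\uXXXX (hypothetical helper)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Go through UTF-16 so characters outside the BMP become a
        # surrogate pair, which is what Java string escapes expect.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(native2ascii("first=Execução número um"))
# prints: first=Execu\u00e7\u00e3o n\u00famero um
```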

In my case, on the first run I got this message:

$ native2ascii teste.txt 
The program 'native2ascii' can be found in the following packages:
 * gcj-5-jdk
 * openjdk-8-jdk-headless
 * gcj-4.8-jdk
 * gcj-4.9-jdk
Try: sudo apt install <selected package>

The problem was solved when I installed the first option (gcj-5-jdk).

I hope this helps someone.

2016-11-28 19:32:40

iconv (1)

iconv -f FROM-ENCODING -t TO-ENCODING file.txt

Besides, there are also iconv-based tools in many languages.

2008-09-15 17:23:17

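Under Linux you can use the very powerful recode command to convert between different charsets and to sort out any line ending issues. recode -l will show you all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.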
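For comparison, converting the charset and the line endings in one pass can be sketched in Python as well. This is not how recode works internally, just an illustration of the same idea; the function name recode_text and its defaults are assumptions.

```python
def recode_text(data: bytes, from_enc: str, to_enc: str,
                newline: str = "\r\n") -> bytes:
    """Re-encode data and rewrite its line endings (hypothetical helper)."""
    text = data.decode(from_enc)
    # Normalise CR-LF and bare CR to LF first, so mixed input is safe.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if newline != "\n":
        text = text.replace("\n", newline)
    # Strict encoding: unrepresentable characters raise UnicodeEncodeError.
    return text.encode(to_enc)


# UTF-8 with mixed line endings -> ISO-8859-15 with DOS line endings
converted = recode_text("a€\nb\r\n".encode("utf-8"), "utf-8", "iso-8859-15")
```

Passing newline="\n" goes the other way, from DOS to Unix line endings.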

2008-09-15 17:24:18
