What is the difference between UTF-8 and UTF-8 with BOM? Which is better?


Current answer

The Unicode Byte Order Mark (BOM) FAQ provides a concise answer:

Q: How I should deal with BOMs?

A: Here are some guidelines to follow:

1. A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
2. Some protocols allow optional BOMs in the case of untagged text. In those cases,
   - Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.
   - Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.
3. Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.
4. Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.
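To make the "BOM as a signature" idea concrete, here is a minimal Python sketch that checks the first bytes of a stream against the known BOM byte sequences. The function name sniff_bom and the lookup table are illustrative assumptions, not part of the FAQ; the BOM constants come from Python's standard codecs module.

```python
import codecs

# Longer signatures must be tested first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length) if data starts with a BOM.

    Returns (None, 0) when no BOM is present; in that case, as the FAQ
    says, the encoding "could be anything" and must be decided another way.
    """
    for bom, name in _BOM_TABLE:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbfabc"))
    print(sniff_bom(b"abc"))
```

A missing BOM only tells you that the signature is absent; it does not prove the data is not UTF-8.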

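Other answers

If you use UTF-8 in HTML files and mix Serbian Cyrillic, Serbian Latin, German, Hungarian, or some other exotic language on the same page, then UTF-8 with BOM is better.

That is my opinion (after 30 years in the computing and IT industry).

UTF-8 without BOM has no BOM, which does not make it any better than UTF-8 with BOM, except when the consumer of the file needs to know (or would benefit from knowing) whether the file is UTF-8-encoded or not.

The BOM is usually useful for determining the endianness of the encoding, which is not required for most use cases.

Also, the BOM can be unnecessary noise or pain for consumers that do not know or care about it, and can result in user confusion.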
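As a small illustration of that "noise", the following Python sketch decodes the same bytes with the standard utf-8 and utf-8-sig codecs. The sample data (a shell script header) is an assumption chosen for illustration; the point is that a consumer unaware of the BOM sees an extra U+FEFF character in front of the text.

```python
# Bytes of a hypothetical script file that was saved as UTF-8 with BOM.
data = b"\xef\xbb\xbf#!/bin/sh\necho hi\n"

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
plain = data.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM if one is present,
# and behaves like "utf-8" when there is none.
stripped = data.decode("utf-8-sig")

print(repr(plain[:3]))
print(repr(stripped[:3]))
print(plain.startswith("#!"), stripped.startswith("#!"))
```

Reading with utf-8-sig is a tolerant choice when a program has to accept files both with and without the BOM.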

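I saved an AutoHotkey file as UTF-8, and the Chinese characters came out garbled.

With UTF-8 with BOM, it works fine.

AutoHotkey does not automatically recognize a UTF-8 file unless it begins with a byte order mark.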
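If you generate such files programmatically, one way to get the BOM is Python's utf-8-sig codec, which writes the three signature bytes before the text. This is only a sketch: the helper name write_utf8_with_bom, the file name, and the script content are made up for illustration.

```python
import codecs
import os
import tempfile


def write_utf8_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8 preceded by the BOM bytes EF BB BF."""
    # newline="" keeps the line endings exactly as given in `text`.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


# Illustrative script content containing non-ASCII characters.
script = "MsgBox, 你好\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.ahk")
    write_utf8_with_bom(path, script)
    with open(path, "rb") as f:
        raw = f.read()

# The file now starts with the UTF-8 BOM, followed by the encoded text.
print(raw.startswith(codecs.BOM_UTF8))
```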

https://www.autohotkey.com/docs/FAQ.htm#nonascii

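From http://en.wikipedia.org/wiki/Byte-order_mark:

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.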
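The last point follows from the fact that the single code point U+FEFF serializes to a different byte sequence in each encoding form. A short Python sketch (the dictionary name signatures is an assumption for illustration) prints those sequences:

```python
BOM = "\ufeff"  # the byte order mark code point, U+FEFF

# Encode the same code point in each encoding form. The explicit -le/-be
# codec names are used so Python does not prepend a BOM of its own.
signatures = {
    enc: BOM.encode(enc)
    for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
}

for enc, raw in signatures.items():
    print(f"{enc:10} {raw.hex(' ')}")
```

In UTF-8 there is only one possible byte order, so the bytes EF BB BF act purely as an encoding signature rather than as an endianness indicator.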

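Always using a BOM in your file will ensure that it always opens correctly in an editor that supports UTF-8 and BOM.

My real problem with the absence of a BOM is the following. Suppose we have a file that contains: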

abc

Without a BOM, this opens as ANSI in most editors. So another user of the file opens it and appends some native characters, for example:

abg-αβγ

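Oops... Now the file is still in ANSI, and guess what: "αβγ" does not occupy 6 bytes, but 3. This is not UTF-8, and it causes other problems later in the development chain.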
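The byte counts can be checked with a short Python sketch. It assumes that "ANSI" here means the Greek Windows code page (cp1253), which the answer does not state explicitly; the helper is_valid_utf8 is a hypothetical name for illustration.

```python
text = "abg-αβγ"

# Assumption: the editor's "ANSI" code page is Windows-1253 (Greek).
ansi_bytes = text.encode("cp1253")
utf8_bytes = text.encode("utf-8")

# Each Greek letter takes 1 byte in cp1253 but 2 bytes in UTF-8.
print(len("αβγ".encode("cp1253")), len("αβγ".encode("utf-8")))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The ANSI-saved file is no longer valid UTF-8, so a later tool that
# expects UTF-8 will fail or show garbled characters.
print(is_valid_utf8(ansi_bytes), is_valid_utf8(utf8_bytes))
```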