UTF-8和Unicode有什么区别?

根据维基百科UTF-8页面，我从人们那里听到了相互矛盾的观点。

它们是一样的，不是吗?有人能澄清一下吗?

当前回答

你通常从谷歌开始，然后想尝试不同的东西。但是如何打印和转换所有这些字符集呢?

这里我列出了一些有用的一行程序。

Powershell:

# Print character with the Unicode point (U+<hexcode>) using this: 
[char]0x2550

# With Python installed, you can print the unicode character from U+xxxx with:
python -c 'print(u"\u2585")'

如果你有更多的Powershell trix或快捷方式，请评论。

在Bash中，你会喜欢libiconv和util-linux包中的iconv、hexdump和xxd(可能在其他*nix发行版中命名不同)。

# To print the 3-byte hex code for a Unicode character:
printf "\\\x%s" $(printf '═'|xxd -p -c1 -u)
#\xE2\x95\x90

# To print the Unicode character represented by hex string:
printf '\xE2\x96\x85'
#▅

# To convert from UTF-16LE to Unicode
echo -en "════"| iconv -f UTF-16LE -t UNICODEFFFE

# To convert a string into hex: 
echo -en '═�'| xxd -g 1
#00000000: e2 95 90 ef bf bd

# To convert a string into binary:
echo -en '═�\n'| xxd -b
#00000000: 11100010 10010101 10010000 11101111 10111111 10111101  ......
#00000006: 00001010

# To convert a binary string into hex:
printf  '%x\n' "$((2#111000111000000110000010))"
#e38182

2022-01-04 14:50:54

其他回答

Unicode是与ISO/IEC 10646一起定义通用字符集(UCS)的标准，UCS是表示几乎所有已知语言所需的所有现有字符的超集。

Unicode为其存储库中的每个字符分配一个名称和一个数字(字符代码或代码点)。

UTF-8编码，是一种在计算机内存中以数字方式表示这些字符的方法。UTF-8将每个码位映射到一个八字节序列(8位字节)

,例如,

UCS字符= Unicode字符

UCS代码点= U+24B62

UTF-8 encoding = F0 A4 AD A2 (hex) = 11110000 10100100 10101101 10100010 (bin)

2013-02-24 18:36:01

它们不是一回事——UTF-8是编码Unicode的一种特殊方式。

根据您的应用程序和您打算使用的数据，有许多不同的编码可供选择。据我所知，最常见的是UTF-8、UTF-16和UTF-32。

2009-03-13 17:09:23

你通常从谷歌开始，然后想尝试不同的东西。但是如何打印和转换所有这些字符集呢?

这里我列出了一些有用的一行程序。

Powershell:

# Print character with the Unicode point (U+<hexcode>) using this: 
[char]0x2550

# With Python installed, you can print the unicode character from U+xxxx with:
python -c 'print(u"\u2585")'

如果你有更多的Powershell trix或快捷方式，请评论。

在Bash中，你会喜欢libiconv和util-linux包中的iconv、hexdump和xxd(可能在其他*nix发行版中命名不同)。

# To print the 3-byte hex code for a Unicode character:
printf "\\\x%s" $(printf '═'|xxd -p -c1 -u)
#\xE2\x95\x90

# To print the Unicode character represented by hex string:
printf '\xE2\x96\x85'
#▅

# To convert from UTF-16LE to Unicode
echo -en "════"| iconv -f UTF-16LE -t UNICODEFFFE

# To convert a string into hex: 
echo -en '═�'| xxd -g 1
#00000000: e2 95 90 ef bf bd

# To convert a string into binary:
echo -en '═�\n'| xxd -b
#00000000: 11100010 10010101 10010000 11101111 10111111 10111101  ......
#00000006: 00001010

# To convert a binary string into hex:
printf  '%x\n' "$((2#111000111000000110000010))"
#e38182

2022-01-04 14:50:54

UTF-8是一种使用8位序列编码Unicode字符的方法。

Unicode是一种用于表示来自多种语言的各种字符的标准。

2018-01-26 13:35:55

Unicode只是一个标准，它定义了一个字符集(UCS)和编码(UTF)来编码这个字符集。但一般来说，Unicode指的是字符集，而不是标准。

在5分钟内阅读每个软件开发人员绝对必须知道的关于Unicode和字符集(没有借口!)和Unicode的绝对最小值。

2009-03-13 17:37:07

UTF-8和Unicode有什么区别?

推荐文章

最新文章

标签