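How do you echo a 4-digit Unicode character in Bash?

I'd like to add the Unicode skull and crossbones to my shell prompt (specifically 'SKULL AND CROSSBONES' (U+2620)), but I can't figure out the magic incantation to make echo spit it out, or any other 4-digit Unicode character. Two-digit ones are easy, for example echo -e "\x55".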

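In addition to the answers below, note that your terminal obviously needs to support Unicode for the output to be what you expect. gnome-terminal does a good job of this, but it isn't necessarily turned on by default.

In the macOS Terminal app, go to Preferences -> Encodings and choose Unicode (UTF-8).

Just put "☠" in your shell script. With the correct locale and a Unicode-enabled console, it prints just fine: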

$ echo ☠
☠
$

An ugly "workaround" is to output the UTF-8 byte sequence, but that also depends on the encoding in use:

$ echo -e '\xE2\x98\xA0'
☠
$

2009-03-02 16:16:31

In UTF-8 it's actually 6 hex digits (or 3 bytes).

$ printf '\xE2\x98\xA0'
☠
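
The three bytes follow from the UTF-8 bit layout for code points between U+0800 and U+FFFF (1110xxxx 10xxxxxx 10xxxxxx). As a minimal sketch, the arithmetic for U+2620 can be worked out in Bash like this (the variable names cp, b1, b2 and b3 are only illustrative):

```bash
cp=0x2620                              # code point to encode
b1=$(( 0xE0 | (cp >> 12) ))            # lead byte: 1110 + top 4 bits
b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))    # continuation: 10 + middle 6 bits
b3=$(( 0x80 | (cp & 0x3F) ))           # continuation: 10 + low 6 bits
encoded=$(printf '%02X %02X %02X' "$b1" "$b2" "$b3")
echo "$encoded"                        # E2 98 A0
```

This only covers the three-byte range; shorter and longer encodings use different lead-byte prefixes.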

To check how the console encodes it, use hexdump:

$ printf ☠ | hexdump
0000000 98e2 00a0                              
0000003
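
By default hexdump prints 16-bit words in host byte order, which is why the bytes look swapped (98e2 00a0) on a little-endian machine. A small sketch that lists the bytes one at a time, in stream order, using od instead (the variable name bytes is just for illustration):

```bash
# Dump the UTF-8 bytes of U+2620 one byte at a time.
bytes=$(printf '\xE2\x98\xA0' | od -An -tx1 | tr -s ' \n' ' ')
bytes=${bytes# }    # drop the leading space that od emits
bytes=${bytes% }    # drop the trailing space left by tr
echo "$bytes"       # e2 98 a0
```

The raw bytes are fed in with printf so the result does not depend on how the terminal or editor encodes a pasted ☠.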

2009-03-02 16:16:43

A quick one-liner to convert a UTF-8 character into its 3-byte hex form:

var="$(echo -n '☠' | od -An -tx1)"; printf '\\x%s' ${var^^}; echo

echo -n '☠' | od -An -tx1 | sed 's/ /\\x/g'

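The first prints \xE2\x98\xA0 and the second the lowercase equivalent \xe2\x98\xa0, so you can then write it back the other way round: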

echo $'\xe2\x98\xa0'   # ☠

2011-04-22 21:48:29

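So long as your text editor can cope with Unicode (presumably encoded in UTF-8), you can enter the Unicode code point directly.

For instance, in the Vim text editor you would enter insert mode, press Ctrl+V followed by u, and then type the code point as a 4-digit hexadecimal number (pad with zeros if necessary). So you would type Ctrl+V u 2 6 2 0. See: What is the easiest way to insert Unicode characters into a document?

At a terminal running Bash you would press CTRL+SHIFT+U and type the hexadecimal code point of the character you want. During input your cursor should show an underlined u. The first non-digit you type ends the input and renders the character. So you can print U+2620 in Bash with the following:

echo CTRL+SHIFT+U 2620 ENTER ENTER

(The first Enter ends the Unicode input, and the second runs the echo command.)

Source: Ubuntu SE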

2011-05-10 19:45:58

You may need to encode the code point as octal in order for prompt expansion to decode it correctly.

U+2620 encoded as UTF-8 is E2 98 A0.

So in Bash,

export PS1="\342\230\240"

will turn your shell prompt into a skull and crossbones.
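
The octal values come straight from the hex bytes. As a rough sketch, a helper like the following (hex_to_octal_escapes is a made-up name, not a Bash builtin) turns UTF-8 bytes written in hex into the \NNN form used in the prompt string above:

```bash
# Convert UTF-8 bytes given as hex (E2 98 A0) into \NNN octal escapes.
hex_to_octal_escapes() {
    local byte out=''
    for byte in "$@"; do
        # 16#.. makes Bash arithmetic read the value as hexadecimal
        printf -v out '%s\\%03o' "$out" "$((16#$byte))"
    done
    printf '%s\n' "$out"
}

hex_to_octal_escapes E2 98 A0    # prints \342\230\240
```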

2013-04-09 13:45:12

Here is a pure Bash implementation, with no forking, that handles Unicode code points of any size.

fast_chr() {
    local __octal
    local __char
    printf -v __octal '%03o' $1
    printf -v __char \\$__octal
    REPLY=$__char
}

function unichr {
    local c=$1    # Ordinal of char
    local l=0    # Byte ctr
    local o=63    # Ceiling
    local p=128    # Accum. bits
    local s=''    # Output string

    (( c < 0x80 )) && { fast_chr "$c"; echo -n "$REPLY"; return; }

    while (( c > o )); do
        fast_chr $(( t = 0x80 | c & 0x3f ))
        s="$REPLY$s"
        (( c >>= 6, l++, p += o+1, o>>=1 ))
    done

    fast_chr $(( t = p | c ))
    echo -n "$REPLY$s"
}

## test harness
for (( i=0x2500; i<0x2600; i++ )); do
    unichr $i
done

Output was:

─━│┃┄┅┆┇┈┉┊┋┌┍┎┏
┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯
┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏
═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯
╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏
▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯
▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●
◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯
◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿

2013-05-12 16:12:14

I'm using this:

$ echo -e '\u2620'
☠

This is much simpler than looking up the hex representation, and I use it in my shell scripts. It works in gnome-terminal and urxvt, AFAIK.

2013-12-01 08:55:59

Based on the Stack Overflow question "Unix cut, remove first token" and https://stackoverflow.com/a/15903654/781312:

(octal=$(echo -n ☠ | od -t o1 | head -1 | cut -d' ' -f2- | sed -e 's#\([0-9]\+\) *#\\0\1#g')
echo Octal representation is following $octal
echo -e "$octal")

The output is as follows.

Octal representation is following \0342\0230\0240
☠

2014-05-02 20:01:17

Any of these three commands will print the character you want in a console, provided the console accepts UTF-8 characters (most current ones do):

echo -e "SKULL AND CROSSBONES (U+2620) \U02620"
echo $'SKULL AND CROSSBONES (U+2620) \U02620'
printf "%b" "SKULL AND CROSSBONES (U+2620) \U02620\n"

SKULL AND CROSSBONES (U+2620) ☠

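Afterwards, you can copy and paste the actual glyph (image, character) into any UTF-8-enabled text editor.

If you need to see how such a Unicode code point is encoded in UTF-8, use xxd (a much better hex viewer than od):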

echo $'(U+2620) \U02620' | xxd
0000000: 2855 2b32 3632 3029 20e2 98a0 0a         (U+2620) ....

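That means that the UTF-8 encoding is: e2 98 a0

Or, in hex to avoid mistakes: 0xE2 0x98 0xA0. That is, the values between the space (hex 20) and the line feed (hex 0A).

If you want to dig deeper into converting numbers to characters, see the article on ASCII encoding in Bash in Greg's wiki (BashFAQ).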

2015-02-18 04:44:11

The printf builtin (just like the coreutils printf) understands the \u escape sequence, which takes a 4-digit Unicode code point:

   \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)

Tested with Bash 4.2.37(1):

$ printf '\u2620\n'
☠
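
As far as I know, the \u and \U escapes arrived in Bash 4.2, so older shells print them literally. A hedged sketch of a version check with a raw-byte fallback (supports_unicode_escapes is a hypothetical helper name; its optional arguments exist only so the check can be tried against other version numbers):

```bash
# Succeeds when the given (or the running) Bash version is 4.2 or newer.
supports_unicode_escapes() {
    local major=${1:-${BASH_VERSINFO[0]}} minor=${2:-${BASH_VERSINFO[1]}}
    (( major > 4 || (major == 4 && minor >= 2) ))
}

if supports_unicode_escapes; then
    printf '\u2620\n'          # relies on the \u escape
else
    printf '\xE2\x98\xA0\n'    # raw UTF-8 bytes, no \u needed
fi
```

Either branch still needs a UTF-8 locale and terminal for the glyph to show up correctly.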

2015-04-18 23:32:41

If you don't mind a Perl one-liner:

$ perl -CS -E 'say "\x{2620}"'
☠

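-CS enables UTF-8 decoding on input and UTF-8 encoding on output. -E evaluates the next argument as Perl, with modern features such as say enabled. If you don't want a newline at the end, use print instead of say.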

2016-10-17 16:23:55

If the hex value of the Unicode character is known:

H="2620"
printf "%b" "\u$H"

If the decimal value of the Unicode character is known:

declare -i U=2*4096+6*256+2*16
printf -vH "%x" $U              # convert to hex
printf "%b" "\u$H"
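
The sum above is just 0x2620 written out digit by digit in decimal. A small sketch of the conversion in both directions, using printf's numeric formats (the variable names dec and hex are only for illustration):

```bash
dec=$(printf '%d' 0x2620)       # hex literal to decimal: 9760
hex=$(printf '%04X' "$dec")     # decimal back to 4 hex digits: 2620
echo "U+$hex = $dec"            # U+2620 = 9760
```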

2017-07-20 11:26:40

Easy with a Python 2/3 one-liner:

$ python -c 'print u"\u2620"'    # python2
$ python3 -c 'print(u"\u2620")'  # python3

Result:

☠

2017-10-26 19:49:05

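Sorry for reviving this old question. But when using bash there is a very easy way to create Unicode code points from plain ASCII input, and it does not even fork at all: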

unicode() { local -n a="$1"; local c; printf -vc '\\U%08x' "$2"; printf -va "$c"; }
unicodes() { local a c; for a; do printf -vc '\\U%08x' "$a"; printf "$c"; done; };

Use it as follows to define certain code points:

unicode crossbones 0x2620
echo "$crossbones"

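or to dump the first 65536 Unicode code points to stdout (this takes less than 2 seconds on my machine; the extra space keeps certain characters from running into each other in the shell's monospace font):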

for a in {0..65535}; do unicodes "$a"; printf ' '; done

or to tell a very typical parent's story (this needs Unicode 2010):

unicodes 0x1F6BC 32 43 32 0x1F62D 32 32 43 32 0x1F37C 32 61 32 0x263A 32 32 43 32 0x1F4A9 10

Explanation:

printf '\UXXXXXXXX' prints out any Unicode character.

printf '\\U%08x' number prints \UXXXXXXXX with the number converted to hex; this is then fed to another printf to actually print out the Unicode character.

printf recognizes octal (0oct), hex (0xHEX) and decimal (0 or numbers starting with 1 to 9) as numbers, so you can choose whichever representation fits best.

printf -v var .. gathers the output of printf into a variable, without a fork (which tremendously speeds things up).

local variable is there to not pollute the global namespace.

local -n var=other aliases var to other, such that assignment to var alters other. One interesting part here is that var is part of the local namespace, while other is part of the global namespace.

Please note that there is no such thing as a local or global namespace in bash. Variables are kept in the environment, and as such are always global. local just puts away the current value and restores it when the function is left again. Other functions called from within the function with local will still see the "local" value. This is a fundamentally different concept from all the normal scoping rules found in other languages (and what bash does is very powerful, but can lead to errors if you are a programmer who is not aware of it).
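
The scoping behaviour described above can be seen in a few lines. This is only an illustrative sketch with made-up function names (inner, outer, set_to), and the nameref part assumes Bash 4.3 or newer:

```bash
inner() { echo "inner sees: $v"; }
outer() { local v='local to outer'; inner; }

v='global'
outer    # inner sees: local to outer  (the callee sees the caller's local)
inner    # inner sees: global

# local -n makes ref an alias for whatever variable name is passed in,
# so assigning to ref changes the caller's variable.
set_to() { local -n ref=$1; ref=$2; }
set_to result 42
echo "$result"    # 42
```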

2018-03-17 10:04:43

To print a Unicode character in bash, use \x, \u or \U (the first takes 2 hex digits, the second up to 4, the third up to 8):

echo -e '\U1f602'

If you want to assign it to a variable, use the $'...' syntax:

x=$'\U1f602'
echo $x

2018-06-08 15:05:29

In Bash:

UnicodePointToUtf8()
{
    local x="$1"               # ok if '0x2620'
    x=${x/\\u/0x}              # '\u2620' -> '0x2620'
    x=${x/U+/0x}; x=${x/u+/0x} # 'U+2620' -> '0x2620'
    x=$((x)) # from hex to decimal
    local y=$x n=0
    [ $x -ge 0 ] || return 1
    while [ $y -gt 0 ]; do y=$((y>>1)); n=$((n+1)); done
    if [ $n -le 7 ]; then       # 7
        y=$x
    elif [ $n -le 11 ]; then    # 5+6
        y=" $(( ((x>> 6)&0x1F)+0xC0 )) \
            $(( (x&0x3F)+0x80 ))" 
    elif [ $n -le 16 ]; then    # 4+6+6
        y=" $(( ((x>>12)&0x0F)+0xE0 )) \
            $(( ((x>> 6)&0x3F)+0x80 )) \
            $(( (x&0x3F)+0x80 ))"
    else                        # 3+6+6+6
        y=" $(( ((x>>18)&0x07)+0xF0 )) \
            $(( ((x>>12)&0x3F)+0x80 )) \
            $(( ((x>> 6)&0x3F)+0x80 )) \
            $(( (x&0x3F)+0x80 ))"
    fi
    printf -v y '\\x%x' $y
    echo -n -e $y
}

# test
for (( i=0x2500; i<0x2600; i++ )); do
    UnicodePointToUtf8 $i
    [ "$(( i+1 & 0x1f ))" != 0 ] || echo ""
done
x='U+2620'
echo "$x -> $(UnicodePointToUtf8 $x)"

Output:

─━│┃┄┅┆┇┈┉┊┋┌┍┎┏┐┑┒┓└┕┖┗┘┙┚┛├┝┞┟
┠┡┢┣┤┥┦┧┨┩┪┫┬┭┮┯┰┱┲┳┴┵┶┷┸┹┺┻┼┽┾┿
╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟
╠╡╢╣╤╥╦╧╨╩╪╫╬╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿
▀▁▂▃▄▅▆▇█▉▊▋▌▍▎▏▐░▒▓▔▕▖▗▘▙▚▛▜▝▞▟
■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯▰▱▲△▴▵▶▷▸▹►▻▼▽▾▿
◀◁◂◃◄◅◆◇◈◉◊○◌◍◎●◐◑◒◓◔◕◖◗◘◙◚◛◜◝◞◟
◠◡◢◣◤◥◦◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷◸◹◺◻◼◽◾◿
U+2620 -> ☠

2019-11-25 21:01:30
