bash中有没有将字符串转换为小写字符串的方法?
例如,如果我有:
a="Hi all"
我想将其转换为:
"hi all"
bash中有没有将字符串转换为小写字符串的方法?
例如,如果我有:
a="Hi all"
我想将其转换为:
"hi all"
当前回答
基于Dejay Clayton出色的解决方案,我将大写/小写扩展为转置函数(独立有用),在变量中返回结果(更快/更安全),并添加了BASH v4+优化:
pkg::transpose() { # <retvar> <string> <from> <to>
local __r=$2 __m __p
while [[ ${__r} =~ ([$3]) ]]; do
__m="${BASH_REMATCH[1]}"; __p="${3%${__m}*}"
__r="${__r//${__m}/${4:${#__p}:1}}"
done
printf -v "$1" "%s" "${__r}"
}
pkg::lowercase() { # <retvar> <string>
if (( BASH_VERSINFO[0] >= 4 )); then
printf -v "$1" "%s" "${2,,}"
else
pkg::transpose "$1" "$2" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
"abcdefghijklmnopqrstuvwxyz"
fi
}
pkg::uppercase() { # <retvar> <string>
if (( BASH_VERSINFO[0] >= 4 )); then
printf -v "$1" "%s" "${2^^}"
else
pkg::transpose "$1" "$2" "abcdefghijklmnopqrstuvwxyz" \
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
fi
}
为了简单起见,我没有添加任何set-e支持(或任何错误检查)。。。但除此之外,它通常遵循shellguide和pkg::transpose()试图避免printf-v的任何可能的变量名冲突
其他回答
从bash手册页:
${参数^模式}${参数^^模式}${参数,模式}${参数,,模式}案例修改。此扩展修改参数中字母字符的大小写。该模式被扩展以生成模式,就像路径名扩展一样。展开的每个字符根据模式测试参数的值,如果匹配模式,其大小写被转换。该模式不应尝试匹配多个字符。^运算符转换为小写字母匹配模式到大写;运算符转换将大写字母匹配为小写字母。^^和,,展开式转换展开值中的每个匹配字符;这个^并且,展开式仅匹配并转换展开值中的第一个字符。如果省略了模式,则将其视为?, 它匹配每个字符。如果参数为@或*,则大小写修改操作将依次应用于每个位置参数,展开即为结果列表。如果参数是下标为@或*的数组变量,大小写修改操作应用于旋转,展开即为结果列表。
许多答案使用外部程序,而不是真正使用Bash。
如果你知道你将有Bash4可用,你真的应该使用${VAR,,}符号(这很简单,很酷)。对于Bash before 4(例如,我的Mac仍然使用Bash 3.2)。我使用@ghostdog74的答案的修正版本创建了一个更便携的版本。
一个可以称为小写的“我的字符串”并获得小写版本。我阅读了关于将结果设置为var的评论,但这在Bash中不是真正可移植的,因为我们不能返回字符串。打印它是最好的解决方案。使用类似于var=“$(小写$str)”的内容很容易捕获。
这是如何工作的
其工作方式是通过printf获取每个字符的ASCII整数表示,然后如果上到->下加32,或者如果下到->上减32。然后再次使用printf将数字转换回字符。从“A”到->“A”,我们有32个字符的差异。
使用printf解释:
$ printf "%d\n" "'a"
97
$ printf "%d\n" "'A"
65
97 - 65 = 32
这是带有示例的工作版本。请注意代码中的注释,因为它们解释了很多内容:
#!/bin/bash
# lowerupper.sh
# Prints the lowercase version of a char
lowercaseChar(){
case "$1" in
[A-Z])
n=$(printf "%d" "'$1")
n=$((n+32))
printf \\$(printf "%o" "$n")
;;
*)
printf "%s" "$1"
;;
esac
}
# Prints the lowercase version of a sequence of strings
lowercase() {
word="$@"
for((i=0;i<${#word};i++)); do
ch="${word:$i:1}"
lowercaseChar "$ch"
done
}
# Prints the uppercase version of a char
uppercaseChar(){
case "$1" in
[a-z])
n=$(printf "%d" "'$1")
n=$((n-32))
printf \\$(printf "%o" "$n")
;;
*)
printf "%s" "$1"
;;
esac
}
# Prints the uppercase version of a sequence of strings
uppercase() {
word="$@"
for((i=0;i<${#word};i++)); do
ch="${word:$i:1}"
uppercaseChar "$ch"
done
}
# The functions will not add a new line, so use echo or
# append it if you want a new line after printing
# Printing stuff directly
lowercase "I AM the Walrus!"$'\n'
uppercase "I AM the Walrus!"$'\n'
echo "----------"
# Printing a var
str="A StRing WITH mixed sTUFF!"
lowercase "$str"$'\n'
uppercase "$str"$'\n'
echo "----------"
# Not quoting the var should also work,
# since we use "$@" inside the functions
lowercase $str$'\n'
uppercase $str$'\n'
echo "----------"
# Assigning to a var
myLowerVar="$(lowercase $str)"
myUpperVar="$(uppercase $str)"
echo "myLowerVar: $myLowerVar"
echo "myUpperVar: $myUpperVar"
echo "----------"
# You can even do stuff like
if [[ 'option 2' = "$(lowercase 'OPTION 2')" ]]; then
echo "Fine! All the same!"
else
echo "Ops! Not the same!"
fi
exit 0
运行此操作后的结果:
$ ./lowerupper.sh
i am the walrus!
I AM THE WALRUS!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
a string with mixed stuff!
A STRING WITH MIXED STUFF!
----------
myLowerVar: a string with mixed stuff!
myUpperVar: A STRING WITH MIXED STUFF!
----------
Fine! All the same!
不过,这只适用于ASCII字符。
对我来说,这很好,因为我知道我只会向它传递ASCII字符。例如,我在某些不区分大小写的CLI选项中使用此选项。
我知道这是一篇老掉牙的帖子,但我为另一个网站做了这个回答,所以我想我会把它贴在这里:
上部->下部:使用python:
b=`echo "print '$a'.lower()" | python`
或Ruby:
b=`echo "print '$a'.downcase" | ruby`
或Perl:
b=`perl -e "print lc('$a');"`
或PHP:
b=`php -r "print strtolower('$a');"`
或者Awk:
b=`echo "$a" | awk '{ print tolower($1) }'`
或Sed:
b=`echo "$a" | sed 's/./\L&/g'`
或Bash 4:
b=${a,,}
或NodeJS:
b=`node -p "\"$a\".toLowerCase()"`
您也可以使用dd:
b=`echo "$a" | dd conv=lcase 2> /dev/null`
下部->上部:
使用python:
b=`echo "print '$a'.upper()" | python`
或Ruby:
b=`echo "print '$a'.upcase" | ruby`
或Perl:
b=`perl -e "print uc('$a');"`
或PHP:
b=`php -r "print strtoupper('$a');"`
或者Awk:
b=`echo "$a" | awk '{ print toupper($1) }'`
或Sed:
b=`echo "$a" | sed 's/./\U&/g'`
或Bash 4:
b=${a^^}
或NodeJS:
b=`node -p "\"$a\".toUpperCase()"`
您也可以使用dd:
b=`echo "$a" | dd conv=ucase 2> /dev/null`
另外,当你说“shell”时,我假设你是指bash,但如果你能使用zsh,那就很简单了
b=$a:l
对于小写和
b=$a:u
用于大写。
对于Bash命令行,根据语言环境和国际字母,这可能会起作用(根据其他人的答案组合而成):
$ echo "ABCÆØÅ" | python -c "print(open(0).read().lower())"
abcæøå
$ echo "ABCÆØÅ" | sed 's/./\L&/g'
abcæøå
$ export a="ABCÆØÅ" | echo "${a,,}"
abcæøå
尽管这些变化可能不起作用:
$ echo "ABCÆØÅ" | tr "[:upper:]" "[:lower:]"
abcÆØÅ
$ echo "ABCÆØÅ" | awk '{print tolower($1)}'
abcÆØÅ
$ echo "ABCÆØÅ" | perl -ne 'print lc'
abcÆØÅ
$ echo 'ABCÆØÅ' | dd conv=lcase 2> /dev/null
abcÆØÅ
因此,我尝试对每个实用程序使用共识方法执行一些更新的基准测试,但我没有多次重复一个小集合,而是。。。
以UTF-8编码的多字节Unicode字符填充到边缘的1.85 GB.txt文件中,为了均衡I/O方面,同时还强制所有人使用LC_ALL=C,以确保公平竞争
————————————————————————————————————————
准确地说,bsd-sed和gnu-sed都相当平庸。我甚至不知道bsd sed在做什么,因为他们的xxhash不匹配python3是否试图使用Unicode字母大小写?(即使我已经强制设置了区域设置LC_ALL=C)tr是最极端的到目前为止,gnutr是最快的bsd tr非常残暴perl5比我拥有的任何awk变体都快,除非你可以使用mawk2一次加载整个文件,以便稍微超过perl5:2.935秒mawk2对每15秒3.081秒在awk中,gnu-gawk的速度最慢,中间是mawk 1.3.4,最快是mawk1.9.9.6:比gawk节省50%以上的时间.(我没有把时间浪费在无用的macosx nawk上)
.
out9: 1.85GiB 0:00:03 [ 568MiB/s] [ 568MiB/s] [ <=> ]
in0: 1.85GiB 0:00:03 [ 568MiB/s] [ 568MiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C mawk2 '{ print tolower($_) }' FS='^$'; )
mawk 1.9.9.6 (mawk2-beta)
3.07s user 0.66s system 111% cpu 3.348 total
85759a34df874966d096c6529dbfb9d5 stdin
out9: 1.85GiB 0:00:06 [ 297MiB/s] [ 297MiB/s] [ <=> ]
in0: 1.85GiB 0:00:06 [ 297MiB/s] [ 297MiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C mawk '{ print tolower($_) }' FS='^$'; )
mawk 1.3.4
6.01s user 0.83s system 107% cpu 6.368 total
85759a34df874966d096c6529dbfb9d5 stdin
out9: 23.8MiB 0:00:00 [ 238MiB/s] [ 238MiB/s] [ <=> ]
in0: 1.85GiB 0:00:07 [ 244MiB/s] [ 244MiB/s] [============>] 100%
out9: 1.85GiB 0:00:07 [ 244MiB/s] [ 244MiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C gawk -be '{ print tolower($_) }' FS='^$';
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
7.49s user 0.78s system 106% cpu 7.763 total
85759a34df874966d096c6529dbfb9d5 stdin
out9: 1.85GiB 0:00:03 [ 616MiB/s] [ 616MiB/s] [ <=> ]
in0: 1.85GiB 0:00:03 [ 617MiB/s] [ 617MiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C perl -ne 'print lc'; )
perl5 (revision 5 version 34 subversion 0)
2.70s user 0.85s system 115% cpu 3.081 total
85759a34df874966d096c6529dbfb9d5 stdin
out9: 1.85GiB 0:00:32 [57.4MiB/s] [57.4MiB/s] [ <=> ]
in0: 1.85GiB 0:00:32 [57.4MiB/s] [57.4MiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C gsed 's/.*/\L&/'; ) # GNU-sed
gsed (GNU sed) 4.8
32.57s user 0.97s system 101% cpu 32.982 total
85759a34df874966d096c6529dbfb9d5 stdin
out9: 1.86GiB 0:00:38 [49.7MiB/s] [49.7MiB/s] [ <=> ]
in0: 1.85GiB 0:00:38 [49.4MiB/s] [49.4MiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C sed 's/.*/\L&/'; ) # BSD-sed
37.94s user 0.86s system 101% cpu 38.318 total
d5e2d8487df1136db7c2334a238755c0 stdin
in0: 313MiB 0:00:00 [3.06GiB/s] [3.06GiB/s] [=====>] 16% ETA 0:00:00
out9: 1.85GiB 0:00:11 [ 166MiB/s] [ 166MiB/s] [ <=>]
in0: 1.85GiB 0:00:00 [3.31GiB/s] [3.31GiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C python3 -c "print(open(0).read().lower()))
Python 3.9.12
9.04s user 2.18s system 98% cpu 11.403 total
7ddc0b5cbcfbbfac3c2b6da6731bd262 stdin
out9: 2.51MiB 0:00:00 [25.1MiB/s] [25.1MiB/s] [ <=> ]
in0: 1.85GiB 0:00:11 [ 171MiB/s] [ 171MiB/s] [============>] 100%
out9: 1.85GiB 0:00:11 [ 171MiB/s] [ 171MiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C ruby -pe '$_.downcase!'; )
ruby 2.6.8p205 (2021-07-07 revision 67951) [universal.arm64e-darwin21]
10.46s user 1.23s system 105% cpu 11.073 total
85759a34df874966d096c6529dbfb9d5 stdin
in0: 1.85GiB 0:00:01 [1.01GiB/s] [1.01GiB/s] [============>] 100%
out9: 1.85GiB 0:00:01 [1.01GiB/s] [1.01GiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C gtr '[A-Z]' '[a-z]'; ) # GNU-tr
gtr (GNU coreutils) 9.1
1.11s user 1.21s system 124% cpu 1.855 total
85759a34df874966d096c6529dbfb9d5 stdin
out9: 1.85GiB 0:01:19 [23.7MiB/s] [23.7MiB/s] [ <=> ]
in0: 1.85GiB 0:01:19 [23.7MiB/s] [23.7MiB/s] [============>] 100%
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C tr '[A-Z]' '[a-z]'; ) # BSD-tr
78.94s user 1.50s system 100% cpu 1:19.67 total
85759a34df874966d096c6529dbfb9d5 stdin
( time ( pvE0 < "${m3t}" | LC_ALL=C gdd conv=lcase ) | pvE9 ) | xxh128sum | lgp3; sleep 3;
out9: 0.00 B 0:00:01 [0.00 B/s] [0.00 B/s] [<=> ]
in0: 1.85GiB 0:00:06 [ 295MiB/s] [ 295MiB/s] [============>] 100%
out9: 1.81GiB 0:00:06 [ 392MiB/s] [ 294MiB/s] [ <=> ]
3874110+1 records in
3874110+1 records out
out9: 1.85GiB 0:00:06 [ 295MiB/s] [ 295MiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C gdd conv=lcase; ) # GNU-dd
gdd (coreutils) 9.1
1.93s user 4.35s system 97% cpu 6.413 total
85759a34df874966d096c6529dbfb9d5 stdin
% ( time ( pvE0 < "${m3t}" | LC_ALL=C dd conv=lcase ) | pvE9 ) | xxh128sum | lgp3; sleep 3;
out9: 36.9MiB 0:00:00 [ 368MiB/s] [ 368MiB/s] [ <=> ]
in0: 1.85GiB 0:00:04 [ 393MiB/s] [ 393MiB/s] [============>] 100%
out9: 1.85GiB 0:00:04 [ 393MiB/s] [ 393MiB/s] [ <=> ]
3874110+1 records in
3874110+1 records out
out9: 1.85GiB 0:00:04 [ 393MiB/s] [ 393MiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C dd conv=lcase; ) # BSD-dd
1.92s user 4.24s system 127% cpu 4.817 total
85759a34df874966d096c6529dbfb9d5 stdin
————————————————————————————————————————
通过一次加载所有文件,并在单个函数调用中对所有1.85 GB执行tolower(),可以人为地使mawk2比perl5更快::
( time ( pvE0 < "${m3t}" |
LC_ALL=C mawk2 '
BEGIN { FS = RS = "^$" }
END { print tolower($(ORS = "")) }'
) | pvE9 ) | xxh128sum| lgp3
in0: 1.85GiB 0:00:00 [3.35GiB/s] [3.35GiB/s] [============>] 100%
out9: 1.85GiB 0:00:02 [ 647MiB/s] [ 647MiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C mawk2 ; )
1.39s user 1.31s system 91% cpu 2.935 total
85759a34df874966d096c6529dbfb9d5 stdin