Extract substring in Bash

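Given a filename of the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

To emphasize the point: I have a filename with x number of characters, then a five-digit sequence with a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put it into a variable.

I am very interested in the number of different ways this can be accomplished.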

Current answer

Try using cut -c startIndx-stopIndx
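
A minimal sketch of this suggestion, assuming the digits always sit at the same character positions (13 through 17 in the sample name). The variable names are illustrative. Since cut -c only helps when the prefix length is fixed, the second command shows cut's field mode, split on underscores, as an alternative that does not depend on the prefix length.

```shell
filename="someletters_12345_moreleters.ext"

# Characters 13-17 are the digits only because the prefix here is
# exactly 11 letters followed by one underscore.
by_position=$(printf '%s\n' "$filename" | cut -c 13-17)

# Field mode: split on "_" and take the second field.
by_field=$(printf '%s\n' "$filename" | cut -d _ -f 2)

echo "$by_position $by_field"
```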

2010-09-22 17:54:15

Other answers

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.
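
The snippet above leaves input undefined, so here is a sketch of the same two expansions with a sample value filled in (the value is an assumption taken from the question) and the intermediate result noted in comments.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables the +(...) pattern used below

# Sample value; the answer itself does not define input.
input="someletters_12345_moreleters.ext"

# Remove the longest suffix of the form "_<letters>.<anything>".
front=${input%%_+([a-zA-Z]).*}    # front is now someletters_12345

# Remove the longest prefix of the form "<letters>_".
digits=${front##+([a-zA-Z])_}     # digits is now 12345

echo "$digits"
```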

2009-01-09 16:13:38

Here's how I'd do it:

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Explanation:

Bash-specific:

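[[ ]] indicates a conditional expression
=~ indicates that the condition is a regular expression
&& chains the commands if the prior command was successful

Regular expression (RE): _([[:digit:]]{5})_

_ is a literal that demarcates/anchors the matching boundaries for the string being matched
() creates a capture group
[[:digit:]] is a character class; I think it speaks for itself
{5} means exactly five of the preceding character, class (as in this example), or group must match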

In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character-by-character processing continues after the _. For example, if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before it found a match.
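
To make the walk-through concrete, here is a small sketch that wraps the same test in a function and runs it on the sample name, on the false-start string, and on a name that should not match. The function name extract_num is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Print the first run of exactly five digits enclosed in underscores, if any.
# Returns non-zero when the regular expression does not match.
extract_num() {
    local re='_([[:digit:]]{5})_'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_num "someletters_12345_moreleters.ext"   # prints 12345
extract_num "_1 _12 _123 _1234 _12345_"          # prints 12345 after the false starts
# Only four digits between the underscores, so the match fails.
extract_num "someletters_1234_moreleters.ext" || echo "no match"
```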

2009-01-12 19:43:20

Building on jor's answer (which doesn't work for me):

substring=$(expr "$filename" : '.*_\([^_]*\)_.*')
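
For anyone who wants to try that line as is, here is a sketch that feeds it two sample names. The second name is made up for illustration. Note that expr is an external command, so this approach starts a sub-process for each call.

```shell
for filename in "someletters_12345_moreleters.ext" "ab_67890_cd.txt"; do
    # expr's ":" operator anchors the pattern at the start of the string
    # and prints the text captured by \( \): here, the text between two underscores.
    substring=$(expr "$filename" : '.*_\([^_]*\)_.*')
    echo "$filename -> $substring"
done
```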

2009-01-09 15:41:11

I like sed's ability to deal with regex groups:

> var="someletters_12345_moreletters.ext"
> digits=$( echo "$var" | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

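A more general option is not to assume that an underscore _ marks the start of your digit sequence, and instead, for instance, to strip off all the non-digits that come before the sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p.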
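
The \+ used in these expressions is a GNU extension to basic regular expressions rather than part of POSIX. Here is a sketch of the more general substitution written with [0-9][0-9]* instead, which I would expect to behave the same way on other sed implementations. The sample value is the one used above.

```shell
var="someletters_12345_moreletters.ext"

# Skip any leading non-digits, capture the first run of digits, drop the rest.
# [0-9][0-9]* stands in for the GNU-specific [0-9]\+.
digits=$(printf '%s\n' "$var" | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

echo "$digits"
```

With -n and the p flag, a line that contains no digits at all produces no output, so digits ends up empty in that case.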

> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

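More on this, in case you're not too confident with regexps:

s is for _s_ubstitute
[0-9]+ matches 1+ digits
\1 refers to group no. 1 of the regex output (group 0 is the whole match, group 1 is the match within the parentheses in this case)
the p flag is for _p_rinting

All the \ escapes are there to make sed's regexp processing work.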

2016-10-21 08:12:04

I'm surprised this pure bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
# Split $a on "_" into the positional parameters; "--" guards against values starting with "-"
set -- $a
echo $2
# prints 12345

You probably want to reset IFS to its previous value, or unset IFS afterwards!
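
One way to sidestep the reset entirely is to scope the IFS change to a single command. This is a sketch of a variation, not the approach shown above: it uses read -a to fill an array, and the array name parts is illustrative.

```shell
#!/usr/bin/env bash
a="someletters_12345_moreleters.ext"

# Prefixing the assignment to read limits the IFS change to that one command,
# so the shell's IFS is left as it was.
IFS=_ read -r -a parts <<< "$a"

echo "${parts[1]}"
```

Array indices start at 0, so the digits land in parts[1] rather than in $2.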

2013-06-03 17:34:40
