给定someletters_12345_moreleters形式的文件名。ext,我想提取5位数字,并将它们放入一个变量。

为了强调这一点,我有一个x个字符的文件名,然后是一个5位数字序列,两边都有一个下划线,然后是另一组x个字符。我想把这个5位数代入一个变量。

我对实现这一目标的多种不同方式非常感兴趣。


当前回答

遵循要求

我有一个文件名,x个字符,然后是5位数字 序列两侧分别用一个下划线包围 x个字符的集合。我想取一个5位数 把它代入一个变量。

我发现了一些可能有用的grep方法:

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345

或更好的

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345

然后使用-Po语法:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345

或者如果你想让它正好适合5个字符:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345

最后,要将它存储在一个变量中,只需要使用var=$(命令)语法。

其他回答

减少使用:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

更通用的:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo $INPUT| cut -d'_' -f 2)
echo $SUBSTRING

外壳切割-从字符串中打印特定范围的字符或给定部分

#method1)使用bash

 str=2020-08-08T07:40:00.000Z
 echo ${str:11:8}

#方法2)使用cut

 str=2020-08-08T07:40:00.000Z
 cut -c12-19 <<< $str

#method3)当使用awk时

 str=2020-08-08T07:40:00.000Z
 awk '{time=gensub(/.{11}(.{8}).*/,"\\1","g",$1); print time}' <<< $str

您可以使用参数展开来做到这一点。

如果a为常数,则下面的参数展开执行子字符串提取:

b=${a:12:5}

12是偏移量(从零开始),5是长度

如果数字周围的下划线是输入中唯一的下划线,您可以分两步分别去掉前缀和后缀:

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

如果有其他下划线,那么无论如何都可能是可行的,尽管比较棘手。如果有人知道如何在一个表达式中执行两个展开,我也想知道。

提出的两个解决方案都是纯bash,不涉及进程生成,因此非常快。

如果有人想要更严格的信息,你也可以像这样在man bash中搜索

$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

结果:

${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.

我喜欢sed处理正则表达式组的能力:

> var="someletters_12345_moreletters.ext"
> digits=$( echo "$var" | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

一个更一般的选择是不要假设你用下划线_标记你的数字序列的开始,因此例如剥离你在你的序列之前得到的所有非数字:s/[^0-9]\+\([0-9]\+\).*/\1/p。


> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

更多关于这一点,以防你对regexp不太自信:

S代表_s_substitute [0-9]+匹配1+数字 \1链接到正则表达式输出的组n.1(组0是整个匹配,组1是括号内的匹配) P标志为_p_printing

所有转义\都是为了使sed的regexp处理工作。