Extract substring in Bash

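Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

To emphasize the point: I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put it into a variable.

I am very interested in the number of different ways that this can be accomplished.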

Using cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"

2009-01-09 13:56:14

Generic solution where the number can be anywhere in the filename, using the first of such sequences:

number=$(echo "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)

Another solution, to extract exactly one part of a variable:

number=${filename:offset:length}

If your filename always has the format stuff_digits_… you can use awk:

number=$(echo "$filename" | awk -F _ '{ print $2 }')

Yet another solution, removing everything except digits:

number=$(echo "$filename" | tr -cd '[:digit:]')

2009-01-09 14:00:08

There's also the 'expr' command (an external utility rather than a bash builtin):

INPUT="someletters_12345_moreleters.ext"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*' `
echo $SUBSTRING

2009-01-09 15:01:02

Building on jor's answer (which doesn't work for me):

substring=$(expr "$filename" : '.*_\([^_]*\)_.*')

2009-01-09 15:41:11

You can use parameter expansion to do this.

If a is constant, the following parameter expansion performs substring extraction:

b=${a:12:5}

where 12 is the offset (zero-based) and 5 is the length.
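As a minimal sketch of the substring expansion above: the sample value of a is an assumption, and the second variant derives the offset from the first underscore instead of hard-coding 12 (the variable names prefix and derived are illustrative only).

```bash
# Assumed sample value; the digits start at offset 12 only because
# "someletters_" happens to be 12 characters long.
a='someletters_12345_moreleters.ext'

fixed=${a:12:5}            # offset 12, length 5

# Derive the offset instead of hard-coding it: take the text before the
# first underscore, add the underscore back, and use its length.
prefix=${a%%_*}_           # "someletters_"
derived=${a:${#prefix}:5}

echo "$fixed"              # 12345
echo "$derived"            # 12345
```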

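If the underscores around the digits are the only ones in the input, you can strip off the prefix and the suffix (respectively) in two steps: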

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

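If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.

Both solutions presented are pure bash, with no process spawning involved, hence very fast.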
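For the case with additional underscores, one possible sketch is to keep the two steps but remove the longest suffix instead of the shortest. It assumes the digits are always the second underscore-separated field; the sample name is hypothetical. As far as I know, Bash cannot nest one expansion inside another (zsh can), so the temporary variable stays.

```bash
# Hypothetical name with an extra underscore after the digits.
a='someletters_12345_more_leters.ext'

tmp=${a#*_}      # remove the shortest prefix ending in "_"  -> 12345_more_leters.ext
b=${tmp%%_*}     # remove the longest suffix starting at "_" -> 12345

echo "$b"        # 12345
```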

2009-01-09 15:52:35

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.

2009-01-09 16:13:38

Here's how I'd do it:

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Explanation:

Bash-specific:

[[ ]] indicates a conditional expression
=~ indicates the condition is a regular expression
&& chains the commands if the prior command was successful

Regular Expression (RE): _([[:digit:]]{5})_

_ is a literal that demarcates/anchors the matching boundaries of the string being matched
() creates a capture group
[[:digit:]] is a character class; I think it speaks for itself
{5} means exactly five of the preceding character, class (as in this example), or group must match

In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character-by-character processing continues after the _. E.g. if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before it found a match.
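The matching behaviour described above can be tried with a small wrapper. This is only a sketch: extract_num is a hypothetical helper name, and the test strings other than the original filename are made up for illustration.

```bash
extract_num() {
    # Hypothetical helper: prints the first group of exactly five digits
    # that has an underscore on both sides; returns 1 if there is none.
    local fn=$1
    if [[ $fn =~ _([[:digit:]]{5})_ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

extract_num 'someletters_12345_moreleters.ext'       # 12345
extract_num '_1 _12 _123 _1234 _12345_'              # 12345 (after four false starts)
extract_num 'no_digits_here.ext' || echo 'no match'  # no match
```

Testing the result of [[ ... ]] before reading BASH_REMATCH avoids picking up a stale value from an earlier match.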

2009-01-12 19:43:20

Try using cut -c startIndx-stopIndx

2010-09-22 17:54:15

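Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores: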

str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

2011-05-06 12:50:13

In case someone wants more rigorous information, you can also search for it in man bash like this:

$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.
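A few of the cases from the manual excerpt, as a hedged sketch. The sample string is an assumption, and to my knowledge the negative length form needs Bash 4.2 or newer.

```bash
s='someletters_12345_moreleters.ext'   # assumed sample value, 32 characters

echo "${s: -3}"       # ext    (negative offset: note the space after the colon)
echo "${s:12:-15}"    # 12345  (negative length: stop 15 characters before the end)
echo "${s:-3}"        # the whole string: without the space this is the
                      # "use default value" expansion, not a substring

set -- a b c d        # positional parameters are indexed from 1
echo "${@:2:2}"       # b c
```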

2013-05-31 15:00:54

I'm surprised this pure bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345

You probably want to reset IFS to its previous value, or unset IFS afterwards!
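One way to avoid having to restore IFS by hand is to confine the change to a function. A minimal sketch, in which split_second_field is a hypothetical helper name; it also disables globbing while the unquoted expansion is split.

```bash
a='someletters_12345_moreleters.ext'

split_second_field() {
    # Hypothetical helper. "local IFS" limits the change to this function;
    # "set -f" keeps the unquoted $1 from being glob-expanded, and "--"
    # protects against values that begin with a dash.
    local IFS=_
    set -f
    set -- $1
    set +f
    printf '%s\n' "$2"
}

# Called in a command substitution, so the set -f / set +f toggling
# cannot affect the calling shell either.
num=$(split_second_field "$a")
echo "$num"    # 12345
```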

2013-06-03 17:34:40

Similar to substr('abcdefg', 2-1, 3) in php:

echo 'abcdefg'|tail -c +2|head -c 3
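For comparison, a small sketch of the same extraction done with substring expansion and with the tail/head pipeline. Both count bytes here because the sample string is plain ASCII; that is an assumption about the input.

```bash
s='abcdefg'

# Substring expansion: offset 1 (zero-based), length 3.
echo "${s:1:3}"                                      # bcd

# Pipeline: tail -c +2 starts at byte 2 (one-based), head -c 3 keeps 3 bytes.
piped=$(printf '%s' "$s" | tail -c +2 | head -c 3)
echo "$piped"                                        # bcd
```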

2013-06-26 11:34:08

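Following the requirements

I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put it into a variable.

I found some grep ways that may be useful: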

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345

or better

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345

And then with the -Po syntax:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345

Or if you want it to match exactly 5 characters:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345

Finally, to store it in a variable, you just need to use the var=$(command) syntax.
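Putting the pieces together, a minimal sketch of the var=$(command) form. The variable names filename and number are assumptions, and head -n1 is added in case the name holds more than one 5-digit run.

```bash
filename='someletters_12345_moreleters.ext'

# -o prints only the matched text, -E enables extended regular expressions;
# head -n1 keeps just the first match.
number=$(printf '%s\n' "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)

if [ -n "$number" ]; then
    echo "number=$number"              # number=12345
else
    echo "no 5-digit run found" >&2
fi
```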

2013-06-26 12:13:49

A little late, but I just ran across this problem and found the following:

host:/tmp$ asd=someletters_12345_moreleters.ext 
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$

I used it to get millisecond resolution on an embedded system that does not have %N for date:

set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction

2013-08-01 08:12:33

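If we focus on the concept of "a run of (one or several) digits":

We could use several external tools to extract the numbers. We could quite easily erase all other characters, with either sed or tr: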

name='someletters_12345_moreleters.ext'

echo $name | sed 's/[^0-9]*//g'    # 12345
echo $name | tr -c -d 0-9          # 12345

But if $name contains several runs of digits, the above will fail:

If "name=someletters_12345_moreleters_323_end.ext", then:

echo $name | sed 's/[^0-9]*//g'    # 12345323
echo $name | tr -c -d 0-9          # 12345323

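We need to use regular expressions (regex). To select only the first run (12345, not 323) in sed and perl:

echo "$name" | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my ($num) = $ARGV[0] =~ /(\d+)/; print "$num\n";' "$name"

But we could just as well do it directly in bash(1):

regex='[^0-9]*([0-9]{1,}).*$'
[[ $name =~ $regex ]] && echo "${BASH_REMATCH[1]}"

This allows us to extract the FIRST run of digits of any length surrounded by any other text/characters.

Note: regex='[^0-9]*([0-9]{5,5}).*$' will capture exactly 5 digits (for a longer run, its first five). :-)

(1): faster than calling an external tool for each short text. Not faster than doing all the processing inside sed or awk for large files.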
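A sketch of the pure-bash variant wrapped in a function and run over several names. first_digits is a hypothetical helper, and it swaps in the shorter regex ([0-9]+), which captures the same first run because the match is leftmost; the third name is a made-up example without digits.

```bash
first_digits() {
    # Hypothetical helper: prints the first run of digits in "$1";
    # the exit status is non-zero when there is no digit at all.
    local regex='([0-9]+)'
    [[ $1 =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

for name in someletters_12345_moreleters.ext \
            someletters_12345_moreleters_323_end.ext \
            nodigits.ext; do
    if num=$(first_digits "$name"); then
        echo "$name -> $num"
    else
        echo "$name -> (none)"
    fi
done
```

Keeping the regex in a variable, as the answer does, avoids quoting surprises inside [[ ... ]].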

2014-08-05 08:11:19

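Here's a pure parameter substitution with an empty string (it needs the extglob shell option). The caveat is that I have defined someletters and moreletters as letters only. If they are alphanumeric, this will not work as it is.

shopt -s extglob
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo "$substring"
12345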

2015-10-26 12:22:56

A bash solution:

IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'

This will clobber a variable called x. The variable x could be changed to the variable _.

input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"
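If more than one field is needed, a related sketch reads the fields into an array instead of throwaway variables. The array name parts is an assumption.

```bash
input='someletters_12345_moreleters.ext'

# -a stores the fields in an indexed array; the IFS assignment only
# applies to this one read command.
IFS=_ read -r -a parts <<<"$input"

echo "${parts[1]}"      # 12345
echo "${#parts[@]}"     # 3
```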

2016-01-22 05:45:24

My answer gives you more control over what you want out of your string. Here is the code to extract 12345 from your string:

str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

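This is more useful if you want to extract something that contains letters like abc or special characters like _ or -. For example, if your string looks like this and you want everything after someletters_ and before _moreleters.ext: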

str="someletters_123-45-24a&13b-1_moreleters.ext"

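With my code you can specify exactly what you want. Explanation:

#* removes the preceding string, including the matching key. Here the key we mentioned is _
% removes the following string, including the matching key. Here the key we mentioned is _more*

Do some experiments yourself and you will find this interesting.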
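As a starting point for such experiments, a short sketch contrasting the four prefix/suffix operators on the example string. The variable mid is an illustrative name.

```bash
str='someletters_123-45-24a&13b-1_moreleters.ext'

echo "${str#*_}"     # 123-45-24a&13b-1_moreleters.ext  (shortest prefix matching *_ removed)
echo "${str##*_}"    # moreleters.ext                   (longest prefix matching *_ removed)
echo "${str%_*}"     # someletters_123-45-24a&13b-1     (shortest suffix matching _* removed)
echo "${str%%_*}"    # someletters                      (longest suffix matching _* removed)

# The two steps from the answer, kept in a separate variable.
mid=${str#*_}
mid=${mid%_more*}
echo "$mid"          # 123-45-24a&13b-1
```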

2016-07-29 07:41:26

Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST"
while read -r; do
> x=$REPLY
> done < test1.txt
echo $x
ST

2016-08-14 19:44:45

I love sed's capability to deal with regex groups:

> var="someletters_12345_moreletters.ext"
> digits=$( echo "$var" | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

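A slightly more general option would be not to assume that an underscore _ marks the start of your digit sequence, and instead, for instance, to strip off all the non-digits that come before the sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p.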

> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

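More on this, in case you're not too confident with regexps:

s is for _s_ubstitute
[0-9]+ matches 1+ digits
\1 links to group n.1 of the regex output (group 0 is the whole match, group 1 is the match within parentheses in this case)
p flag is for _p_rinting

All the \ escapes are there to make sed's regexp processing work.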
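The same idea can be written with fewer escapes using extended regular expressions. A sketch assuming a sed that accepts -E (both GNU and BSD sed do, as far as I know), with the pattern anchored so that any leading non-digits are stripped:

```bash
var='someletters_12345_moreletters.ext'

# -n suppresses automatic printing; the p flag prints only when the
# substitution succeeded, so a name without digits yields no output.
digits=$(printf '%s\n' "$var" | sed -nE 's/^[^0-9]*([0-9]+).*$/\1/p')
echo "$digits"    # 12345
```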

2016-10-21 08:12:04

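Substring by start and end index, in the style of the JS and Java implementations, but with an inclusive end. Remove the +1 if you do not want that (JS and Java treat the end index as exclusive).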

function substring() {
    local str="$1" start="${2}" end="${3}"
    
    if [[ "$start" == "" ]]; then start="0"; fi
    if [[ "$end"   == "" ]]; then end="${#str}"; fi
    
    local length="((${end}-${start}+1))"
    
    echo "${str:${start}:${length}}"
}

Examples:

    substring 01234 0
    01234
    substring 012345 0
    012345
    substring 012345 0 0
    0
    substring 012345 1 1
    1
    substring 012345 1 2
    12
    substring 012345 0 1
    01
    substring 012345 0 2
    012
    substring 012345 0 3
    0123
    substring 012345 0 4
    01234
    substring 012345 0 5
    012345

More example calls:

    substring 012345 0
    012345
    substring 012345 1
    12345
    substring 012345 2
    2345
    substring 012345 3
    345
    substring 012345 4
    45
    substring 012345 5
    5
    substring 012345 6
    
    substring 012345 3 5
    345
    substring 012345 3 4
    34
    substring 012345 2 4
    234
    substring 012345 1 3
    123
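For comparison, a sketch of a variant with an exclusive end index. substring_excl is a hypothetical name, and the function assumes plain decimal indices with end not smaller than start.

```bash
substring_excl() {
    # Hypothetical variant: the end index is exclusive.
    # start defaults to 0, end defaults to the length of the string.
    local str=$1 start=${2:-0} end=${3:-${#1}}
    printf '%s\n' "${str:start:end-start}"
}

substring_excl 012345 1 3    # 12
substring_excl 012345 2      # 2345
substring_excl 012345 0 6    # 012345
```

Offset and length in a substring expansion are arithmetic expressions, so the subtraction can be written inline.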

2019-12-01 13:53:21

shell cut - print a specific range of characters or a given part from a string

#method1) using bash

 str=2020-08-08T07:40:00.000Z
 echo ${str:11:8}

#method2) using cut

 str=2020-08-08T07:40:00.000Z
 cut -c12-19 <<< $str

#method3) using awk (gensub requires GNU awk)

 str=2020-08-08T07:40:00.000Z
 awk '{time=gensub(/.{11}(.{8}).*/,"\\1","g",$1); print time}' <<< $str

2020-08-08 09:08:11

Maybe this could help you get the desired output

Code:

your_number=$(echo "someletters_12345_moreleters.ext" | grep -E -o '[0-9]{5}')
echo $your_number

Output:

12345

2021-10-10 16:04:32

Here is a substring.sh file

Usage

`substring.sh $TEXT 2 3` # characters 2-3

`substring.sh $TEXT 2` # characters 2 and after

substring.sh follows this line

#echo "starting substring"
chars=$1
start=$(($2))
end=$3

i=0
o=""
if [[ -z $end ]]; then
  end=`echo "$chars " | wc -c`
else
  end=$((end))
fi
#echo "length is " $e
a=`echo $chars | sed  's/\(.\)/\1 /g'`
#echo "a is " $a
for c in $a
do
  #echo "substring" $i $e $c
  if [[ i -lt $start ]]; then
    : # DO Nothing
  elif [[ i -gt $end ]]; then
    break;
  else
    o="$o$c"
  fi
  i=$(($i+1))
done
#echo substring returning $o
echo $o

2021-11-18 20:15:17

A simple way using sed replace:

result=$(echo "someletters_12345_moreleters.ext" | sed 's/.*_\(.*\)_.*/\1/g')
echo $result

2022-03-24 12:17:40

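There are a lot of dated solutions here that require pipes and subshells. Since version 3 (released 2004), bash has had a built-in regex comparison operator, =~.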

input="someletters_12345_moreleters.ext"
# match: underscore followed by 1 or more digits followed by underscore
[[ $input =~ _([0-9]+)_ ]]
echo ${BASH_REMATCH[1]}

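Output:

12345

Note: if you're not very proficient in writing RegExp, I recommend reading Mastering Regular Expressions.

If you just need to figure out how to get the RegExp working and it's not doing what you think, try the online GUI at RegEx101.com and set the "Flavor" to "PCRE" in order to get POSIX-style character classes like [[:digit:]] that bash uses.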

2023-01-19 07:02:08
