在Bash脚本中,我希望将一行分割成多个片段,并将它们存储在一个数组中。

例如,给定一行:

Paris, France, Europe

我想让结果数组看起来像这样:

array[0] = Paris
array[1] = France
array[2] = Europe

最好是一个简单的实现;速度并不重要。我该怎么做呢?


当前回答

输入代码here多字符分隔符解决方案。

正如其他人在这篇文章中指出的,OP的问题给出了一个用逗号分隔的字符串被解析成数组的例子,但没有指出他/她是否只对逗号分隔符、单字符分隔符或多字符分隔符感兴趣。

由于谷歌倾向于将这个答案排在搜索结果的顶部或附近,所以我想为读者提供一个关于多个字符分隔符问题的有力答案,因为至少有一个回答也提到了这个问题。

如果您正在寻找多字符分隔符问题的解决方案,我建议您查看Mallikarjun M的帖子,特别是来自gniourf_gniourf的回复 谁提供了这个优雅的纯BASH解决方案使用参数展开:

#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array

链接到引用的评论/引用的帖子

链接到引用的问题:如何在bash中拆分多字符分隔符上的字符串?


2022年8月3日

Xebeche在下面的评论中提出了一个很好的观点。在审查了他们建议的编辑之后,我修改了gniourf_gniourf提供的脚本,并添加了注释,以便于理解脚本正在做什么。我还将双括号[[]]改为单括号,以提高兼容性,因为许多SHell变体不支持双括号表法。在本例中,对于BaSH,逻辑在单括号或双括号内工作。

#!/bin/bash
  
str="LearnABCtoABCSplitABCABCaABCStringABC"
delimiter="ABC"
array=()

while [ "$str" ]; do

    # parse next sub-string, left of next delimiter
    substring="${str%%"$delimiter"*}" 

    # when substring = delimiter, truncate leading delimiter
    # (i.e. pattern is "$delimiter$delimiter")
    [ -z "$substring" ] && str="${str#"$delimiter"}" && continue

    # create next array element with parsed substring
    array+=( "$substring" )

    # remaining string to the right of delimiter becomes next string to be evaluated
    str="${str:${#substring}}"

    # prevent infinite loop when last substring = delimiter
    [ "$str" == "$delimiter" ] && break

done

declare -p array

不加评论:

#!/bin/bash
str="LearnABCtoABCSplitABCABCaABCStringABC"
delimiter="ABC"
array=()
while [ "$str" ]; do
    substring="${str%%"$delimiter"*}" 
    [ -z "$substring" ] && str="${str#"$delimiter"}" && continue
    array+=( "$substring" )
    str="${str:${#substring}}"
    [ "$str" == "$delimiter" ] && break
done
declare -p array

其他回答

下面是一种不设置IFS的方法:

string="1:2:3:4:5"
set -f                      # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
    echo "$i=>${array[i]}"
done

这个想法是使用字符串替换:

${string//substring/replacement}

将所有匹配的$substring替换为空白,然后使用替换后的字符串初始化数组:

(element1 element2 ... elementN)

注意:这个答案使用了split+glob操作符。因此,为了防止某些字符(如*)的扩展,暂停该脚本的通配符是个好主意。

将字符串分割为数组的关键是多字符分隔符“,”。任何使用IFS进行多字符分隔符的解决方案本质上都是错误的,因为IFS是这些字符的集合,而不是字符串。

如果指定IFS=", ","则字符串将在"," OR " "或它们的任何组合上中断,这不是","这两个字符分隔符的准确表示。

你可以使用awk或sed来拆分字符串,使用进程替换:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

在Bash中直接使用正则表达式更有效:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

使用第二种形式,没有子外壳,它将固有地更快。


编辑by bgoldst:下面是一些比较我的readarray解决方案和dawg的regex解决方案的基准测试,我还包括了read解决方案(注:我稍微修改了regex解决方案,以使其与我的解决方案更加和谐)(也可以参阅我的帖子下面的评论):

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $[$1-2] 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##

对于多行元素,为什么不像

$ array=($(echo -e $'a a\nb b' | tr ' ' '§')) && array=("${array[@]//§/ }") && echo "${array[@]/%/ INTERELEMENT}"

a a INTERELEMENT b b INTERELEMENT
#!/bin/bash

string="a | b c"
pattern=' | '

# replaces pattern with newlines
splitted="$(sed "s/$pattern/\n/g" <<< "$string")"

# Reads lines and put them in array
readarray -t array2 <<< "$splitted"

# Prints number of elements
echo ${#array2[@]}
# Prints all elements
for a in "${array2[@]}"; do
        echo "> '$a'"
done

此解决方案适用于较大的分隔符(多个字符)。 如果在原始字符串中已经有换行符,则不工作

这是我的破解方法!

使用bash拆分字符串是一件非常无聊的事情。实际情况是,我们有有限的方法,只能在少数情况下工作(被“;”,“/”,“.”等等分开),或者我们在输出中有各种副作用。

下面的方法需要一些操作,但我相信它可以满足我们的大部分需求!

#!/bin/bash

# --------------------------------------
# SPLIT FUNCTION
# ----------------

F_SPLIT_R=()
f_split() {
    : 'It does a "split" into a given string and returns an array.

    Args:
        TARGET_P (str): Target string to "split".
        DELIMITER_P (Optional[str]): Delimiter used to "split". If not 
    informed the split will be done by spaces.

    Returns:
        F_SPLIT_R (array): Array with the provided string separated by the 
    informed delimiter.
    '

    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi

    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi

    # NOTE: This was the only parameter that has been a problem so far! 
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi

    if [ ${REMOVE_N} -eq 1 ] ; then

        # NOTE: Due to bash limitations we have some problems getting the 
        # output of a split by awk inside an array and so we need to use 
        # "line break" (\n) to succeed. Seen this, we remove the line breaks 
        # momentarily afterwards we reintegrate them. The problem is that if 
        # there is a line break in the "string" informed, this line break will 
        # be lost, that is, it is erroneously removed in the output! 
        # By Questor
        TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")

    fi

    # NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results 
    # in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the 
    # amount of "\n" that there was originally in the string (one more 
    # occurrence at the end of the string)! We can not explain the reason for 
    # this side effect. The line below corrects this problem! By Questor
    TARGET_P=${TARGET_P%????????????????????????????????}

    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")

    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then

            # NOTE: We use "'" to prevent blank lines with no other characters 
            # in the sequence being erroneously removed! We do not know the 
            # reason for this side effect! By Questor
            LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")

            # NOTE: We use the commands below to revert the intervention made 
            # immediately above! By Questor
            LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
            LN_NOW_WITH_N=${LN_NOW_WITH_N#?}

            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}

# --------------------------------------
# HOW TO USE
# ----------------

STRING_TO_SPLIT="
 * How do I list all databases and tables using psql?

\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"

\"
\list or \l: list all databases
\dt: list all tables in the current database
\"

[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]


"

f_split "$STRING_TO_SPLIT" "bin/psql -c"

# --------------------------------------
# OUTPUT AND TEST
# ----------------

ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done

if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi