In a Bash script, I would like to split a line into pieces and store them in an array.

For example, given the line:

Paris, France, Europe

I would like the resulting array to look like this:

array[0] = Paris
array[1] = France
array[2] = Europe

Preferably a simple implementation; speed does not matter. How can I do it?


Current answer

Another way to do it without modifying IFS:

read -r -a myarray <<< "${string//, /$IFS}"

Rather than changing IFS to match our desired delimiter, we can replace all occurrences of our desired delimiter ", " with the contents of $IFS via "${string//, /$IFS}".

Maybe this will be slow for very large strings, though?

This is based on Dennis Williamson's answer.
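
One caveat is worth checking before relying on this. The sketch below assumes $IFS still holds its default value of <space><tab><newline>. In that case the substitution injects a newline into the here-string, and since read consumes only one line per call, only the first field should be captured. Substituting just the first character of $IFS (a space by default) avoids that, although fields that themselves contain spaces would still be split. The variable names whole_ifs and first_char are illustrative only.

```shell
#!/bin/bash
# Assumes IFS has its default value (space, tab, newline).
string='Paris, France, Europe'

# Replacing ", " with the whole of $IFS puts a newline after each field,
# so read stops after the first one.
read -r -a whole_ifs <<< "${string//, /$IFS}"

# Replacing ", " with only the first IFS character keeps everything on one line.
read -r -a first_char <<< "${string//, /${IFS:0:1}}"

declare -p whole_ifs first_char
## declare -a whole_ifs=([0]="Paris")
## declare -a first_char=([0]="Paris" [1]="France" [2]="Europe")
```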

Other answers

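The key to splitting your string into an array is the multi-character delimiter ", ". Any solution that uses IFS for multi-character delimiters is inherently wrong, since IFS is a set of those characters, not a string.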

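If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination of them, which is not an accurate representation of the two-character delimiter ", ".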

You can use awk or sed to split the string, with process substitution:

#!/bin/bash

str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do   # use a NUL terminated field separator 
    array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output

It is more efficient to use a regex directly in Bash:

#!/bin/bash

str="Paris, France, Europe"

array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
    array+=("${BASH_REMATCH[1]}")   # capture the field
    i=${#BASH_REMATCH}              # length of field + delimiter
    str=${str:i}                    # advance the string by that length
done                                # the loop deletes $str, so make a copy if needed

declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...

With the second form there is no subshell, so it will be inherently faster.


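Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also included the read solution (note: I slightly modified the regex solution for greater harmony with my solution) (also see my comments below the post):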

## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\  ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };

## helper functions
function rep {
    local -i i=-1;
    for ((i = 0; i<$1; ++i)); do
        printf %s "$2";
    done;
}; ## end rep()

function testAll {
    local funcs=();
    local args=();
    local func='';
    local -i rc=-1;
    while [[ "$1" != ':' ]]; do
        func="$1";
        if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
            echo "bad function name: $func" >&2;
            return 2;
        fi;
        funcs+=("$func");
        shift;
    done;
    shift;
    args=("$@");
    for func in "${funcs[@]}"; do
        echo -n "$func ";
        { time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
        rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
    done| column -ts/;
}; ## end testAll()

function makeStringToSplit {
    local -i n=$1; ## number of fields
    if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
    if [[ $n -eq 0 ]]; then
        echo;
    elif [[ $n -eq 1 ]]; then
        echo 'first field';
    elif [[ "$n" -eq 2 ]]; then
        echo 'first field, last field';
    else
        echo "first field, $(rep $(($1-2)) 'mid field, ')last field";
    fi;
}; ## end makeStringToSplit()

function testAll_splitIntoArray {
    local -i n=$1; ## number of fields in input string
    local s='';
    echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
    s="$(makeStringToSplit "$n")";
    testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()

## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.000s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray   real  0m0.067s   user 0m0.000s   sys  0m0.000s
## c_read        real  0m0.064s   user 0m0.000s   sys  0m0.000s
## c_regex       real  0m0.001s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray   real  0m0.069s   user 0m0.000s   sys  0m0.062s
## c_read        real  0m0.065s   user 0m0.000s   sys  0m0.046s
## c_regex       real  0m0.005s   user 0m0.000s   sys  0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray   real  0m0.084s   user 0m0.031s   sys  0m0.077s
## c_read        real  0m0.092s   user 0m0.031s   sys  0m0.046s
## c_regex       real  0m0.125s   user 0m0.125s   sys  0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray   real  0m0.209s   user 0m0.093s   sys  0m0.108s
## c_read        real  0m0.333s   user 0m0.234s   sys  0m0.109s
## c_regex       real  0m9.095s   user 0m9.078s   sys  0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray   real  0m1.460s   user 0m0.326s   sys  0m1.124s
## c_read        real  0m2.780s   user 0m1.686s   sys  0m1.092s
## c_regex       real  17m38.208s   user 15m16.359s   sys  2m19.375s
##

All of the answers to this question are wrong in one way or another.


Wrong answer #1

IFS=', ' read -r -a array <<< "$string"

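1: This is a misuse of $IFS. The value of the $IFS variable is not taken as a single variable-length string separator; rather, it is taken as a set of single-character string separators, where each field that read splits off from the input line can be terminated by any character in the set (comma or space, in this example).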

Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the bash manual:

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters <space>, <tab>, and <newline> are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

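Basically, for non-default non-null values of $IFS, fields can be separated with either (1) a sequence of one or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space>, <tab>, and <newline> ("newline" meaning line feed (LF)) are present anywhere in $IFS), or (2) any non-"IFS whitespace character" that is present in $IFS, along with whatever "IFS whitespace characters" surround it in the input line.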

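For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example, what if his input string was 'Los Angeles, United States, North America'?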

IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")

2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following space or other baggage), if the value of the $string variable happens to contain any LFs, then read will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This is true even if you are piping or redirecting input only to the read statement, as we are doing in this example with the here-string mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge of the data flow within its containing command structure.

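You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible. It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and we should avoid it.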
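
A minimal sketch of this hazard, using a made-up two-line input and a single-character comma separator: everything after the first LF is silently lost.

```shell
#!/bin/bash
# Hypothetical input containing an embedded LF.
string=$'a,b\nc,d'

# read processes only the first line; "c,d" is never seen.
IFS=, read -ra a <<<"$string"

declare -p a
## declare -a a=([0]="a" [1]="b")
```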

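3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty, although it preserves empty fields otherwise. Here's a demo: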

string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")

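Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality of the solution.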

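This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read, as I will demonstrate later.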
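
As a quick illustration of that workaround, here is a sketch with a single-character comma separator and a made-up input whose last field is empty. The array names plain and padded are illustrative only.

```shell
#!/bin/bash
# Hypothetical input: two real fields followed by an empty trailing field.
string='a,b,'

# Without a dummy terminator, the empty trailing field is dropped.
IFS=, read -ra plain <<<"$string"

# With one extra comma appended, the empty field survives.
IFS=, read -ra padded <<<"$string,"

declare -p plain padded
## declare -a plain=([0]="a" [1]="b")
## declare -a padded=([0]="a" [1]="b" [2]="")
```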


Wrong answer #2

string="1:2:3:4:5"
set -f                     # avoid globbing (expansion of *).
array=(${string//:/ })

Similar idea:

t="one,two,three"
a=($(echo $t | tr ',' "\n"))

(Note: I added the missing parentheses around the command substitution which the answerer seems to have omitted.)

Similar idea:

string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)

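These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like read, general word splitting also uses the $IFS special variable, although in this case it is implied that it is set to its default value of <space><tab><newline>, and therefore any sequence of one or more IFS characters (which are all whitespace characters now) is considered to be a field delimiter.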

This solves the problem of two levels of splitting committed by read, since word splitting by itself constitutes only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America' (or 'Los Angeles:United States:North America').

Also, word splitting is normally followed by filename expansion (aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing the characters *, ?, or [ followed by ] (and, if extglob is set, parenthesized fragments preceded by ?, *, +, @, or !) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.
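
The effect of filename expansion can be seen in a small sketch. It assumes default $IFS and default globbing options, and it creates a scratch directory with two hypothetical files (file1 and file2) so that the glob has something predictable to match. The array names glob_on and glob_off are illustrative.

```shell
#!/bin/bash
# Work in a scratch directory so the glob has something predictable to match.
tmpdir=$(mktemp -d)
touch "$tmpdir/file1" "$tmpdir/file2"
startdir=$PWD
cd "$tmpdir" || exit 1

string='a:*:b'

glob_on=(${string//:/ })     # the * is expanded against the directory contents
set -f
glob_off=(${string//:/ })    # with globbing disabled, the * stays a literal field
set +f

cd "$startdir" || exit 1
rm -rf "$tmpdir"

declare -p glob_on glob_off
## declare -a glob_on=([0]="a" [1]="file1" [2]="file2" [3]="b")
## declare -a glob_off=([0]="a" [1]="*" [2]="b")
```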

Another problem with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.
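
A short sketch of the lost-empty-field behavior, assuming default $IFS and a made-up input with an empty middle field:

```shell
#!/bin/bash
# Hypothetical input: three fields, the middle one empty.
string='1::3'

# After substitution the value is "1  3"; the run of two spaces counts as
# one delimiter, so the empty field disappears.
array=(${string//:/ })

declare -p array
## declare -a array=([0]="1" [1]="3")
```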

Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form of parameter expansion, rather than going to the trouble of invoking a command substitution (which forks the shell), starting up a pipeline, and running an external executable (tr or sed), since parameter expansion is purely a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess with the field values. Also, the $(...) form of command substitution is preferable to the old `...` form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)


Wrong answer #3

str="a, b, c, d"  # assuming there is a space after ',' as in Q
arr=(${str//,/})  # delete all occurrences of ','

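This answer is almost the same as #2. The difference is that the answerer has made the assumption that the fields are delimited by two characters, one of which is represented in the default $IFS, and the other not. He has solved this rather specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting to split the fields on the surviving IFS-represented delimiter character.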

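This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider my counterexample: 'Los Angeles, United States, North America'.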

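Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing for the assignment with set -f and then set +f.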

Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.


Wrong answer #4

string='first line
second line
third line'

oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"

This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw with previous wrong answers, and there is only one level of splitting, as required.

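One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved by wrapping the critical statement in set -f and set +f.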

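Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields will be lost, just as in #2 and #3. This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace character", and depending on the application it may not matter anyway, but it does undermine the generality of the solution.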

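So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't care about empty fields, and you wrap the critical statement in set -f and set +f, then this solution works, but otherwise not.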

(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax, e.g. IFS=$'\n';.)
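
Putting those remarks together, a hedged sketch of this approach might look like the following. It assumes a single-character LF delimiter, that empty lines do not matter, and that $IFS was set (not unset) beforehand so that saving and restoring it is meaningful.

```shell
#!/bin/bash
string=$'first line\nsecond line\nthird line'

oldIFS=$IFS        # assumes IFS was set; an unset IFS would need separate handling
IFS=$'\n'          # split on LF only
set -f             # disable globbing for the critical statement
lines=($string)
set +f             # re-enable globbing
IFS=$oldIFS

declare -p lines
## declare -a lines=([0]="first line" [1]="second line" [2]="third line")
```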


Wrong answer #5

countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"

Similar idea:

IFS=', ' eval 'array=($string)'

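This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the above wrong answers, sort of like the worst of all worlds.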

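Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using eval in this way. Normally, when you run a simple command which consists of a variable assignment only, meaning without an actual command word following it, the assignment takes effect in the shell environment: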

IFS=', '; ## changes $IFS in the shell environment

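This is true even if the simple command involves multiple variable assignments; again, as long as there's no command word, all variable assignments affect the shell environment: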

IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment

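But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not affect the shell environment, and instead only affects the environment of the executed command, regardless of whether it is a builtin or external: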

IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it

Relevant quote from the bash manual:

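If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are added to the environment of the executed command and do not affect the current shell environment.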

It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant. But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add a no-op command word to the statement like the : builtin to make the $IFS assignment temporary? This does not work because it would then make the $array assignment temporary as well:

IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command

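So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell environment, as if it were normal, static source code, and therefore we can run the $array assignment inside the eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is being used in the second variant of this solution: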

IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does

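So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to the effect of the assignments) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of eval; just be careful to single-quote the argument string to guard against security threats.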
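
The scoping behavior can be checked with a small sketch. It assumes bash in its default (non-POSIX) mode; in POSIX mode, assignments that precede a special builtin such as eval persist afterward. The variable ifs_before is illustrative only.

```shell
#!/bin/bash
# Assumes bash is NOT running in POSIX mode.
string='Paris, France, Europe'
ifs_before=$IFS

# The IFS assignment applies only for the duration of eval;
# the array assignment inside eval takes effect in the shell environment.
IFS=', ' eval 'array=($string)'

if [[ $IFS == "$ifs_before" ]]; then
    echo 'IFS unchanged after eval'
fi
declare -p array
## declare -a array=([0]="Paris" [1]="France" [2]="Europe")
```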

But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.


Wrong answer #6

IFS=', '; array=(Paris, France, Europe)

IFS=' ';declare -a array=(Paris France Europe)

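Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents of the input string pasted into an array literal. I guess that's one way to do it.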

It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which is not true. From the bash manual:

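IFS: The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.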

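So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after expansion (meaning not when parsing bash source code) and (2) splitting input lines into words by the read builtin.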

Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed after expansion, I would say that word splitting is performed during expansion, or, perhaps even more precisely, word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the words "split" and "words" a lot. Here's a relevant excerpt from the linux.die.net version of the bash manual:

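Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion. The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.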

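You could argue the GNU version of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion section: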

Expansion is performed on the command line after it has been split into tokens.

The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule; for example, see the various compatxx shell settings, which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text that was parsed right off the source bytestream.
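
To make the point concrete, here is a small sketch showing that changing $IFS has no effect on how a literal array assignment is parsed: the commas simply stay attached to the words.

```shell
#!/bin/bash
oldIFS=$IFS
IFS=', '

# These words come straight from the source text, not from an expansion,
# so no word splitting happens and IFS is irrelevant.
array=(Paris, France, Europe)

IFS=$oldIFS
declare -p array
## declare -a array=([0]="Paris," [1]="France," [2]="Europe")
```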


Wrong answer #7

string='first line
        second line
        third line'

while read -r line; do lines+=("$line"); done <<<"$string"

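This is one of the best solutions. Notice that we're back to using read. Didn't I say earlier that read is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.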

But there are problems. First: When you provide at least one NAME argument to read, it automatically ignores leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case, and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments. In this case, read will store the entire input line that it gets from the input stream in a variable named $REPLY, and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference in behavior:

string=$'  a  b  \n  c  d  \n  e  f  '; ## input string

a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a  b" [1]="c  d" [2]="e  f") ## read trimmed surrounding whitespace

a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="  a  b  " [1]="  c  d  " [2]="  e  f  ") ## no trimming

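The second issue with this solution is that it does not actually address the case of a custom field separator, such as the OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution. We could try to at least split on comma by specifying the separator to the -d option, but look what happens: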

string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")

Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error: Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file (in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the while-loop to break prematurely and we lose the final field.

Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input. Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter by concatenating it against the input string ourselves when instantiating it in the here-string:

a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

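There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and (2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file. Demo: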

a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF problem) in one go.

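So, overall, this is quite a powerful solution. Its only remaining weakness is a lack of support for multicharacter delimiters, which I will address later.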
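
As mentioned above, the surrounding whitespace could be trimmed directly in the while-loop. The following is only a sketch of one way to do that with parameter expansion, combined with the manual dummy-terminator trick; the input string and the variable name field are made up for illustration.

```shell
#!/bin/bash
# Hypothetical input with uneven whitespace around the commas.
string='Los Angeles,  United States , North America'

a=()
while read -rd,; do
    field=$REPLY
    field=${field#"${field%%[![:space:]]*}"}   # strip leading whitespace
    field=${field%"${field##*[![:space:]]}"}   # strip trailing whitespace
    a+=("$field")
done <<<"$string,"                             # dummy terminator appended

declare -p a
## declare -a a=([0]="Los Angeles" [1]="United States" [2]="North America")
```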


Wrong answer #8

string='first line
        second line
        third line'

readarray -t lines <<<"$string"

(This is actually from the same post as #7; the answerer provided two solutions in the same post.)

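The readarray builtin, which is a synonym for mapfile, is ideal. It's a builtin command which parses a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".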

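First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing, readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could be for some use-cases. I'll come back to this in a moment.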

Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.

Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll expand on this momentarily as well.

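For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to be the right answer.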


Right answer

Here's a naïve attempt to make #8 work by just specifying the -d option:

string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')

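We see the result is identical to the result we got from the double-conditional approach of the looping read solution discussed in #7. We can almost solve this with the manual dummy-terminator trick: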

readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')

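The problem here is that readarray preserved the trailing field, since the <<< redirection operator appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would have been dropped). We can take care of this by explicitly unsetting the final array element after the fact: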

readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")

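The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed, and (2) the lack of support for multicharacter delimiters.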

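The whitespace could of course be trimmed afterward (for example, see How to trim whitespace from a Bash variable?). But if we could hack a multicharacter delimiter, then that would solve both problems in one shot.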

Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed not to collide with the contents of the input string. The only character that has this guarantee is the NUL byte. This is because, in bash (though not in zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution. Here's how to do it using awk:

readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; }' <<<"$string, "); unset 'a[-1]';
declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

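There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.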
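
For comparison, and not as a replacement for the readarray approach above, here is a sketch of a different technique: splitting on a multicharacter delimiter purely with parameter expansion, which avoids awk and the process substitution. The function name split_on_delim and the result array fields are hypothetical. Because each iteration copies the remaining string, this is likely to be slow on very long inputs, much like the regex loop benchmarked earlier.

```shell
#!/bin/bash
# split_on_delim STRING DELIM
# Hypothetical helper: fills the global array "fields" with the pieces of STRING.
split_on_delim() {
    local s=$1 delim=$2
    [[ -n $delim ]] || return 2           # an empty delimiter would loop forever
    fields=()
    while [[ $s == *"$delim"* ]]; do
        fields+=("${s%%"$delim"*}")       # text before the first delimiter
        s=${s#*"$delim"}                  # drop that field and the delimiter
    done
    fields+=("$s")                        # whatever remains is the last field
}

split_on_delim 'Los Angeles, United States, North America' ', '
declare -p fields
## declare -a fields=([0]="Los Angeles" [1]="United States" [2]="North America")
```

Empty fields, including a trailing one, are kept by this sketch, and no word splitting or filename expansion is involved because every expansion is quoted.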


Trimming solution

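Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option of readarray. Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit, so I won't be able to explain it. I'll leave that as an exercise for the reader.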

function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")

Update: Don't do this, due to problems with eval.

With slightly less ceremony:

IFS=', ' eval 'array=($string)'

e.g.

string="foo, bar,baz"
IFS=', ' eval 'array=($string)'
echo ${array[1]} # -> bar

I came across this post when looking to parse an input like: word1,word2,…

None of the above helped me. I solved it by using awk. In case it helps someone:

STRING="value1,value2,value3"
array=`echo $STRING | awk -F ',' '{ s = $1; for (i = 2; i <= NF; i++) s = s "\n"$i; print s; }'`
for word in ${array}
do
        echo "This is the word $word"
done

Another way would be:

string="Paris, France, Europe"
IFS=', ' arr=(${string})

Now your elements are stored in the "arr" array. To iterate through the elements:

for i in "${arr[@]}"; do echo "$i"; done