x=$(find . -name "*.txt")
echo $x

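If I run the snippet above in a Bash shell, what I get is a single string containing several filenames separated by whitespace, not a list.

Of course, I could split that string on whitespace to get a list, but I'm sure there is a better way to do it.

So, what is the best way to loop through the results of a find command?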


Current answer

Filenames can include spaces and even control characters. Spaces are (by default) delimiters for shell expansion in bash, so x=$(find . -name "*.txt") from the question is not recommended at all. If find returns a filename with spaces, e.g. "the file.txt", you get 2 separate strings when you process x in a loop. You can improve this by changing the delimiter (the bash IFS variable), e.g. to \r\n, but filenames can include control characters, so this is not a (completely) safe method.
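The splitting problem can be reproduced in a throwaway directory. The following is only a sketch: the directory and file names are invented for the demo, and it assumes the path returned by mktemp contains no whitespace. It compares the unquoted expansion from the question with a NUL-delimited read loop.

```bash
#!/usr/bin/env bash
# Hypothetical demo files in a scratch directory (names are made up).
demo_dir=$(mktemp -d)
touch "$demo_dir/the file.txt" "$demo_dir/plain.txt"

# Unsafe: the unquoted command substitution is split on spaces, tabs and newlines.
unsafe_count=0
for f in $(find "$demo_dir" -name "*.txt"); do
    unsafe_count=$((unsafe_count + 1))
done

# Safe: find emits NUL-terminated names and read consumes one name per iteration.
safe_count=0
while IFS= read -r -d '' f; do
    safe_count=$((safe_count + 1))
done < <(find "$demo_dir" -name "*.txt" -print0)

echo "iterations with word splitting: $unsafe_count"
echo "iterations with NUL delimiters: $safe_count"
rm -rf "$demo_dir"
```

With two files present, the first loop runs more often than there are files because "the file.txt" is split into two words.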

From my point of view, there are 2 recommended (and safe) patterns for processing files:

1. for loop and filename expansion:

for file in ./*.txt; do
    [[ ! -e $file ]] && continue  # continue, if file does not exist
    # single filename is in $file
    echo "$file"
    # your code here
done

2. find-read-while and process substitution:

while IFS= read -r -d '' file; do
    # single filename is in $file
    echo "$file"
    # your code here
done < <(find . -name "*.txt" -print0)

Remarks

Pattern 1:

- bash returns the search pattern ("*.txt") if no matching file is found, so the extra line "continue, if file does not exist" is needed. See Bash Manual, Filename Expansion.
- The shell option nullglob can be used to avoid this extra line.
- "If the failglob shell option is set, and no matches are found, an error message is printed and the command is not executed." (from the Bash Manual above)
- Shell option globstar: "If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match." See Bash Manual, Shopt Builtin.
- Other options for filename expansion: extglob, nocaseglob, dotglob and the shell variable GLOBIGNORE.
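As a sketch of how the nullglob and globstar options mentioned for pattern 1 behave, the following script builds a small scratch tree (all file and directory names are invented for the demo) and loops over it with both options enabled. It assumes bash 4.0 or later, since globstar is not available before that.

```bash
#!/usr/bin/env bash
# Hypothetical scratch tree; the names are only for illustration.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub/deeper"
touch "$demo_dir/a.txt" "$demo_dir/sub/b c.txt" "$demo_dir/sub/deeper/d.txt"

shopt -s nullglob globstar   # globstar needs bash 4.0 or later

# nullglob: a pattern without matches expands to nothing,
# so the loop body is never entered and no existence check is needed.
no_match=0
for file in "$demo_dir"/*.md; do
    no_match=$((no_match + 1))
done

# globstar: ** also descends into subdirectories.
matches=()
for file in "$demo_dir"/**/*.txt; do
    matches+=("$file")
done

echo "iterations for *.md: $no_match"
printf 'found: %s\n' "${matches[@]}"
rm -rf "$demo_dir"
```

Because both options change how every later glob in the script expands, it may be worth enabling them only around the loop that needs them.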

Pattern 2:

- Filenames can contain blanks, tabs, spaces, newlines, ... To process filenames safely, find is used with -print0: each filename is printed with all its control characters and terminated with NUL. See also Gnu Findutils Manpage, Unsafe File Name Handling, Safe File Name Handling, unusual characters in filenames. See David A. Wheeler below for a detailed discussion of this topic.
- There are several possible patterns for processing find results in a while loop. Others (kevin, David W.) have shown how to do this using pipes:

files_found=1
find . -name "*.txt" -print0 |
    while IFS= read -r -d '' file; do
        # single filename in $file
        echo "$file"
        files_found=0   # not working example
        # your code here
    done
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"

When you try this piece of code, you will see that it does not work: files_found is never updated in the main shell, so the code always echoes "no files found". The reason is that each command of a pipeline is executed in a separate subshell, so a variable changed inside the loop (in its own subshell) does not change the variable in the main shell script. This is why I recommend process substitution as the "better", more useful, more general pattern. See "I set variables in a loop that's in a pipeline. Why do they disappear..." (from Greg's Bash FAQ) for a detailed discussion of this topic.
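For comparison, here is the same files_found check written with process substitution. It is a sketch that runs in a scratch directory with invented file names; the counter variable count is an addition for the demo and not part of the answer's pattern.

```bash
#!/usr/bin/env bash
# Hypothetical demo files in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/one.txt" "$demo_dir/two words.txt"

files_found=1   # 1 = nothing found yet (shell-style "false")
count=0         # demo-only counter
while IFS= read -r -d '' file; do
    # The loop runs in the main shell, so these assignments persist.
    files_found=0
    count=$((count + 1))
done < <(find "$demo_dir" -name "*.txt" -print0)

if [[ $files_found -eq 0 ]]; then
    echo "files found: $count"
else
    echo "no files found"
fi
rm -rf "$demo_dir"
```

Only find runs in a separate process here; the while loop stays in the current shell, which is why the variables keep their values after done.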

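Other references and sources:

- Gnu Bash Manual, Pattern Matching
- Filenames and Pathnames in Shell: How to do it Correctly, David A. Wheeler
- Greg's Wiki, Why you don't read lines with "for"
- Greg's Wiki, Why you shouldn't parse the output of ls(1)
- Gnu Bash Manual, Process Substitution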

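Other answers

Another option is to not use bash at all and call Python to do the heavy lifting. I keep coming back to this approach because the bash solution from my other answer was too slow.

With this solution, we build a bash array of files from an inline Python script: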

#!/bin/bash
set -eu -o pipefail

dsep=":"  # directory_separator
base_directory=/tmp

all_files=()
all_files_string="$(python3 -c '#!/usr/bin/env python3
import os
import sys

dsep="'"$dsep"'"
base_directory="'"$base_directory"'"

def log(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

def check_invalid_characther(file_path):
    for thing in ("\\", "\n"):
        if thing in file_path:
            raise RuntimeError(f"It is not allowed {thing} on \"{file_path}\"!")
def absolute_path_to_relative(base_directory, file_path):
    relative_path = os.path.commonprefix( [ base_directory, file_path ] )
    relative_path = os.path.normpath( file_path.replace( relative_path, "" ) )

    # if you use Windows Python, it accepts / instead of \\
    # if you have \ on your files names, rename them or comment this
    relative_path = relative_path.replace("\\", "/")
    if relative_path.startswith( "/" ):
        relative_path = relative_path[1:]
    return relative_path

for directory, directories, files in os.walk(base_directory):
    for file in files:
        local_file_path = os.path.join(directory, file)
        local_file_name = absolute_path_to_relative(base_directory, local_file_path)

        log(f"local_file_name {local_file_name}.")
        check_invalid_characther(local_file_name)
        print(f"{base_directory}{dsep}{local_file_name}")
' | dos2unix)";
if [[ -n "$all_files_string" ]];
then
    readarray -t temp <<< "$all_files_string";
    all_files+=("${temp[@]}");
fi;

for item in "${all_files[@]}";
do
    OLD_IFS="$IFS"; IFS="$dsep";
    read -r base_directory local_file_name <<< "$item"; IFS="$OLD_IFS";

    printf 'item "%s", base_directory "%s", local_file_name "%s".\n' \
            "$item" \
            "$base_directory" \
            "$local_file_name";
done;

Related:

- os.walk without hidden folders
- How to do a recursive sub-folder search and return files in a list?
- How to split a string into an array in Bash?

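You can store the find output in an array if you want to use it later (note that this form splits on whitespace, so it only works reliably when no filename contains spaces, tabs, newlines or glob characters):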

array=($(find . -name "*.txt"))

Now, to print each element on a new line, you can either iterate over all elements of the array with a for loop or use a printf statement.

for i in "${array[@]}"; do echo "$i"; done

or

printf '%s\n' "${array[@]}"

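You can also use:

for file in "`find . -name "*.txt"`"; do echo "$file"; done

This prints each filename on its own line. Note that because the command substitution is quoted, the loop body runs only once, with the whole find output in $file.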

To only print the find output as a list, you can use either of the following:

find . -name "*.txt" -print 2>/dev/null

or

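find . -name "*.txt" -print 2>&1 | grep -v 'Permission denied'

This removes the error messages and prints only the filenames, one per line. (find writes its error messages to stderr, so they have to be redirected into the pipe with 2>&1 for grep to see them.)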

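Storing the names in an array is good if you want to do something with the filenames; otherwise there is no need to consume that space and you can print the output directly from find.

As Kevin already posted in the answer above, the best solution is a for loop with a bash glob, but since bash globs are not recursive by default, this can be fixed with a recursive bash function: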

#!/bin/bash
set -x
set -eu -o pipefail

all_files=();

function get_all_the_files()
{
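    local directory="$1";
    for item in "$directory"/* "$directory"/.[^.]*;
    do
        # Without nullglob, an unmatched pattern stays literal; skip it.
        [[ -e "$item" || -L "$item" ]] || continue;
        if [[ -d "$item" ]];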
        then
            get_all_the_files "$item";
        else
            all_files+=("$item");
        fi;
    done;
}

get_all_the_files "/tmp";

for file_path in "${all_files[@]}"
do
    printf 'My file is "%s"\n' "$file_path";
done;

Related questions:

- Bash loop through directory including hidden file
- Recursively list files from a given directory in Bash
- ls command: how can I get a recursive full-path listing, one line per file?
- List files recursively in Linux CLI with path relative to the current directory
- Recursively List all directories and files
- bash script, create array of all files in a directory
- How can I creates array that contains the names of all the files in a folder?
- How to get the list of files in a directory in a shell script?

If you can assume that the filenames don't contain newlines, you can read the output of find into a Bash array with the following command:

readarray -t x < <(find . -name '*.txt')

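Notes:

- -t causes readarray to strip the newlines.
- It won't work if readarray is in a pipe, hence the process substitution.
- readarray is available since Bash 4.

Bash 4.4 and later also supports the -d parameter for specifying the delimiter. Using the null character instead of a newline to delimit the filenames also works in the rare case that filenames contain newlines: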

readarray -d '' x < <(find . -name '*.txt' -print0)

readarray can also be invoked as mapfile with the same options.

Reference: https://mywiki.wooledge.org/BashFAQ/005#Loading_lines_from_a_file_or_stream
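Once the names are in an array, they can be looped over with a quoted "${files[@]}" expansion. The sketch below assumes bash 4.4 or later (for readarray -d) and uses invented file names in a scratch directory, one of which contains a newline. The array name files and the added -t option are choices made for this demo.

```bash
#!/usr/bin/env bash
# Hypothetical demo files, including one with a newline in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b c.txt" "$demo_dir/"$'line\nbreak.txt'

# -d '' splits the input on NUL (bash 4.4+); -t drops the delimiter.
readarray -d '' -t files < <(find "$demo_dir" -name '*.txt' -print0)

echo "number of files: ${#files[@]}"
for f in "${files[@]}"; do
    # %q prints the name in a shell-quoted form, which makes the newline visible.
    printf 'file: %q\n' "$f"
done
rm -rf "$demo_dir"
```

Quoting the array expansion is what keeps each name intact; an unquoted ${files[@]} would be split on whitespace again.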

Whatever you do, don't use a for loop:

# Don't do this
for file in $(find . -name "*.txt")
do
    …code using "$file"
done

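Three reasons:

- For the for loop to even start, find must run to completion.
- If a filename contains any whitespace (space, tab or newline), it will be treated as two separate names.
- Although unlikely now, you can overflow your command-line buffer. Imagine your command-line buffer holds 32KB and your for loop returns 40KB of text. The last 8KB will be dropped from the for loop and you will never know it.


Always use a while read construct:

find . -name "*.txt" -print0 | while IFS= read -r -d $'\0' file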
do
    …code using "$file"
done

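The loop executes while the find command is still running. It also works when a returned filename contains whitespace, and it does not overflow the command-line buffer.

-print0 uses the NUL character as the file separator instead of a newline, and -d $'\0' makes read use NUL as its delimiter.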
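A small sketch of why read is usually also given -r (and an empty IFS) in such loops: without -r, read treats backslashes as escape characters and removes them from the name. The file name below is invented for the demo, and -d '' is used as the delimiter, which in bash has the same effect as -d $'\0'.

```bash
#!/usr/bin/env bash
# Hypothetical demo file whose name contains a backslash.
demo_dir=$(mktemp -d)
touch "$demo_dir/back\\slash.txt"

# With -r the backslash is kept as part of the name.
with_r=$(find "$demo_dir" -name "*.txt" -print0 |
    while IFS= read -r -d '' file; do
        printf '%s\n' "${file##*/}"
    done)

# Without -r, read interprets the backslash as an escape and drops it.
without_r=$(find "$demo_dir" -name "*.txt" -print0 |
    while IFS= read -d '' file; do
        printf '%s\n' "${file##*/}"
    done)

echo "with -r:    $with_r"
echo "without -r: $without_r"
rm -rf "$demo_dir"
```

The loops here run inside command substitutions only so that their output can be captured for comparison; in a real script the loop body would do the work directly.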