Python recursive folder read

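I have a C++/Obj-C background and I am just discovering Python (I have been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.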

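My problem is that the code I have written only works one folder deep. I can see why in the code (see #hardcoded path); I just don't know how to move forward with Python, since my experience with it is brand new.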

Python code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()

Current answer

I think the problem is that you're not processing the output of os.walk correctly.

First, change:

filePath = rootdir + '/' + file

to:

filePath = root + '/' + file

rootdir is your fixed starting directory; root is the directory that os.walk is currently returning.

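Second, you don't need to indent your file-processing loop under the subfolder loop, because it makes no sense to run it once per subdirectory. os.walk sets root to each subdirectory in turn. You don't need to process the subdirectories by hand unless you want to do something with the directories themselves.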
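Putting both changes together, the script could look roughly like the sketch below. It assumes Python 3, and the function name write_folder_summaries and the out_name parameter are made up for illustration; the question only fixes the output file name py-outfile.txt.

```python
import os


def write_folder_summaries(rootdir, out_name="py-outfile.txt"):
    """Concatenate the files of each folder under rootdir into out_name.

    Hypothetical helper: one output file is written per folder visited.
    """
    written = []
    for root, sub_folders, files in os.walk(rootdir):
        # root is the folder currently being visited, not the start folder.
        out_path = os.path.join(root, out_name)
        with open(out_path, "w") as folder_out:
            for name in sorted(files):
                if name == out_name:
                    continue  # skip an output file left over from an earlier run
                file_path = os.path.join(root, name)
                with open(file_path, "r") as f:
                    folder_out.write(f.read())
        written.append(out_path)
    return written
```

Calling write_folder_summaries(sys.argv[1]) would mirror the original script. Using os.path.join instead of concatenating '/' keeps the paths portable.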

2010-02-06 09:34:45

Other answers

If you are using Python 3.5 or later, you can do this in one line.

import glob

# root_dir needs a trailing slash (i.e. /root/dir/)
for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
    print(filename)

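As mentioned in the documentation:

If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.

If you want every file, you can use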

import glob

for filename in glob.iglob(root_dir + '**/**', recursive=True):
    print(filename)
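If you would rather not depend on root_dir ending in a slash, one option is to build the pattern with os.path.join. This is only a sketch; find_text_files is a hypothetical helper name, not something from the glob module.

```python
import glob
import os


def find_text_files(root_dir):
    """Return a sorted list of every .txt file below root_dir."""
    # os.path.join inserts the separator, so no trailing slash is needed.
    pattern = os.path.join(root_dir, "**", "*.txt")
    return sorted(glob.iglob(pattern, recursive=True))
```

Note that the glob module skips names starting with a dot by default, so hidden files and folders are not included.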

2017-07-18 16:26:06

import glob
import os

root_dir = <root_dir_here>

for filename in glob.iglob(root_dir + '**/**', recursive=True):
    if os.path.isfile(filename):
        with open(filename,'r') as file:
            print(file.read())

**/** is used to get all entries recursively, directories included.

if os.path.isfile(filename) checks whether filename refers to a file or a directory; if it is a file, it can be read. Here I am just printing the file's contents.
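The same idea can be wrapped in a function that returns the contents instead of printing them. This is a rough sketch: read_all_files is a hypothetical name, and decoding as UTF-8 with errors="replace" is my assumption so that a non-text file does not abort the loop.

```python
import glob
import os


def read_all_files(root_dir):
    """Return a dict mapping each file path under root_dir to its text."""
    contents = {}
    pattern = os.path.join(root_dir, "**", "*")
    for filename in glob.iglob(pattern, recursive=True):
        # The pattern also matches directories, so keep regular files only.
        if os.path.isfile(filename):
            # Assumption: treat everything as UTF-8 and replace bad bytes.
            with open(filename, "r", encoding="utf-8", errors="replace") as file:
                contents[filename] = file.read()
    return contents
```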

2019-03-16 05:42:37

In my opinion os.walk() is a bit too complicated and verbose. You can do what the accepted answer does more cleanly:

import pathlib

# dir_path is the folder to scan; outfile is the combined output file.
all_files = [str(f) for f in pathlib.Path(dir_path).glob("**/*") if f.is_file()]

with open(outfile, 'wb') as fout:
    for f in all_files:
        with open(f, 'rb') as fin:
            fout.write(fin.read())
            fout.write(b'\n')
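A slightly more defensive variant of the same approach might look like this. It is a sketch under a few assumptions of mine: the name concatenate_files is hypothetical, the files are sorted so the order is predictable, the output file is excluded in case it lives inside dir_path, and shutil.copyfileobj is used so large files are copied in chunks rather than read fully into memory.

```python
import pathlib
import shutil


def concatenate_files(dir_path, outfile):
    """Append every file under dir_path to outfile, newline-separated."""
    out_path = pathlib.Path(outfile).resolve()
    all_files = sorted(
        f for f in pathlib.Path(dir_path).glob("**/*")
        # Skip the output file itself if it already exists under dir_path.
        if f.is_file() and f.resolve() != out_path
    )
    with open(out_path, "wb") as fout:
        for f in all_files:
            with open(f, "rb") as fin:
                shutil.copyfileobj(fin, fout)  # chunked copy
            fout.write(b"\n")
    return all_files
```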

2021-09-05 09:13:53

TL;DR: This is the equivalent of find -type f, going over all files in all folders below and including the current one:

import os

for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))

As other answers have already mentioned, os.walk() is the answer, but it could be explained better. It's quite simple! Let's walk through this tree:

docs/
└── doc1.odt
pics/
todo.txt

With this code:

for currentpath, folders, files in os.walk('.'):
    print(currentpath)

currentpath is the folder it is currently looking at. This will output:

.
./docs
./pics

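So it loops three times, because there are three folders: the current one, docs, and pics. In every iteration it fills the variables folders and files with the folders and files found in currentpath. Let's show them: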

for currentpath, folders, files in os.walk('.'):
    print(currentpath, folders, files)

This shows us:

# currentpath  folders           files
.              ['pics', 'docs']  ['todo.txt']
./pics         []                []
./docs         []                ['doc1.odt']

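In the first line, we see that we are in the folder ., that it contains two folders, namely pics and docs, and that there is one file, namely todo.txt. You don't have to do anything to recurse into those folders because, as you can see, it recurses automatically and simply gives you the files in any subfolders, and in any subfolders of those (though there are none in this example).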
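Since the recursion is automatic, the one thing you may want to control is where it should not go. With the default top-down walk, removing names from folders in place tells os.walk not to descend into them. A small sketch, where list_files_skipping and skip_names are hypothetical names of my own:

```python
import os


def list_files_skipping(top, skip_names):
    """List all files below top, without descending into skip_names."""
    found = []
    for currentpath, folders, files in os.walk(top):
        # Slice assignment edits the list os.walk itself uses, so the
        # removed folders are never visited.
        folders[:] = [d for d in folders if d not in skip_names]
        for file in files:
            found.append(os.path.join(currentpath, file))
    return sorted(found)
```

For the tree above, list_files_skipping('.', {'pics'}) would visit . and ./docs only.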

If you just want to loop through all files, the equivalent of find -type f, you can do this:

for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))

This outputs:

./todo.txt
./docs/doc1.odt

2019-07-26 15:39:10
