Python recursive folder read

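I have a C++/Obj-C background and I just discovered Python (I've been writing it for about an hour). I'm writing a script to recursively read the contents of text files in a folder structure.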

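The problem I have is that the code I've written only works one folder deep. I can see why in the code (see #hardcoded path); I just don't know how to move forward with Python, since my experience with it is brand new.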

Python code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()

Answer

TL;DR: This is the equivalent of find -type f, going over all files in all folders below and including the current one:

import os

for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))

As already mentioned in other answers, os.walk() is the answer, but it could be explained better. It's quite simple! Let's walk through this tree:

docs/
└── doc1.odt
pics/
todo.txt

The following code:

for currentpath, folders, files in os.walk('.'):
    print(currentpath)

currentpath is the folder it is currently looking at. This will output:

.
./docs
./pics

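It loops three times because there are three folders: the current one, docs and pics. In every iteration, it fills the variables folders and files with all the folders and files found in that folder. Let's show them: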

for currentpath, folders, files in os.walk('.'):
    print(currentpath, folders, files)

This shows us:

# currentpath  folders           files
.              ['pics', 'docs']  ['todo.txt']
./pics         []                []
./docs         []                ['doc1.odt']

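In the first line, we see that we are in the folder ".", that it contains two folders, namely pics and docs, and that there is one file, namely todo.txt. You don't have to do anything to recurse into those folders because, as you can see, os.walk recurses automatically and gives you the files in every subfolder, and in any subfolders of those (though the example has none). The order in which folders are visited and names are listed follows whatever order the file system returns them in, which is why pics and docs appear in a different order here than in the previous output.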
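To reproduce the table above without creating files by hand, here is a minimal sketch that builds the same example tree in a temporary directory (tempfile and the helper name walk_table are choices made for this illustration) and records what os.walk yields for each folder. The folder and file lists are sorted so the result does not depend on the file system's listing order.

```python
import os
import tempfile

def walk_table(top):
    """Return (relative path, sorted folders, sorted files) for every folder os.walk visits."""
    rows = []
    for currentpath, folders, files in os.walk(top):
        rel = os.path.relpath(currentpath, top)  # "." for the top folder itself
        rows.append((rel, sorted(folders), sorted(files)))
    return sorted(rows)

with tempfile.TemporaryDirectory() as top:
    # Recreate the example tree: docs/doc1.odt, an empty pics/ folder, and todo.txt
    os.makedirs(os.path.join(top, "docs"))
    os.makedirs(os.path.join(top, "pics"))
    open(os.path.join(top, "docs", "doc1.odt"), "w").close()
    open(os.path.join(top, "todo.txt"), "w").close()
    table = walk_table(top)

for row in table:
    print(row)
```

Each row corresponds to one line of the table: the top folder, then docs, then pics.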

If you just want to loop over all files, the equivalent of find -type f, you can do this:

for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))

This outputs:

./todo.txt
./docs/doc1.odt
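The example tree has only one level of subfolders. A minimal sketch with one more nesting level (a made-up docs/old/ folder) shows that the same loop keeps descending without any extra code; find_files is a hypothetical helper name, and it is only a rough equivalent of find -type f (for example, os.walk also lists symlinks to files under files).

```python
import os
import tempfile

def find_files(top):
    """Rough equivalent of `find top -type f`: every file path below top, at any depth."""
    paths = []
    for currentpath, folders, files in os.walk(top):
        for name in files:
            paths.append(os.path.join(currentpath, name))
    return paths

with tempfile.TemporaryDirectory() as top:
    # Example tree with an extra level (docs/old/) to show deeper recursion
    os.makedirs(os.path.join(top, "docs", "old"))
    for rel in ("todo.txt",
                os.path.join("docs", "doc1.odt"),
                os.path.join("docs", "old", "doc0.odt")):
        open(os.path.join(top, rel), "w").close()
    found = sorted(os.path.relpath(p, top) for p in find_files(top))

print(found)
```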

2019-07-26 15:39:10

Other answers

The pathlib library is really great for working with files. You can do a recursive glob on a Path object like so:

from pathlib import Path

for elem in Path('/path/to/my/files').rglob('*.*'):
    print(elem)
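One caveat that follows from how glob patterns work: '*.*' only matches names containing a dot, so a file such as Makefile is skipped, while a folder whose name contains a dot is yielded. A minimal sketch that returns only regular files, whatever their names, combines rglob('*') with is_file(); the tree and the helper name all_files are made up for illustration.

```python
import tempfile
from pathlib import Path

def all_files(top):
    """Every regular file below top, at any depth (directories are filtered out)."""
    return [p for p in Path(top).rglob('*') if p.is_file()]

with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / "v1.2").mkdir()               # a folder whose name contains a dot
    (top / "v1.2" / "Makefile").touch()  # a file without an extension
    (top / "notes.txt").touch()

    # What the '*.*' pattern matches vs. what all_files returns
    dotted = sorted(p.relative_to(top).as_posix() for p in top.rglob('*.*'))
    files = sorted(p.relative_to(top).as_posix() for p in all_files(top))

print(dotted)
print(files)
```

Here dotted contains the folder v1.2 but not Makefile, while files contains both real files.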

2019-12-27 01:06:18

Make sure you understand the three return values of os.walk:

for root, subdirs, files in os.walk(rootdir):

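They have the following meaning:

- root: the current path being "walked through"
- subdirs: the entries in root that are directories
- files: the entries in root (not in subdirs) that are not directories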

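Please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join the folder currently being "walked", not the topmost folder. So that must be filePath = os.path.join(root, file). BTW, "file" is a builtin (in Python 2), so you don't normally use it as a variable name.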
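A small sketch that makes the bug visible on a made-up two-level tree: joining each name with the top folder (the question's approach) produces a path that does not exist once os.walk has descended into a subfolder, while joining with root always points at the real file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as rootdir:
    os.makedirs(os.path.join(rootdir, "sub"))
    open(os.path.join(rootdir, "top.txt"), "w").close()
    open(os.path.join(rootdir, "sub", "inner.txt"), "w").close()

    wrong, right = [], []
    for root, subdirs, files in os.walk(rootdir):  # top-down: rootdir first, then sub
        for name in files:
            # The question's approach: always prefixes the top folder
            wrong.append(os.path.exists(rootdir + '/' + name))
            # Correct: prefixes the folder currently being walked
            right.append(os.path.exists(os.path.join(root, name)))

print(wrong)
print(right)
```

The second entry of wrong is False because inner.txt lives in sub, not in the top folder.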

Another problem is your loop structure, which should look something like this:

import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')

In case you didn't know, the with statement for files is shorthand for:

with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
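The finally clause is the important part: the file is closed even when the body raises. A minimal check of that, using a throwaway temporary file and a simulated exception:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)  # only the path is needed

try:
    with open(path, 'rb') as f:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

print(f.closed)  # True: the with statement closed the file despite the exception
os.remove(path)
```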

2010-02-06 09:48:17

os.walk does a recursive walk by default. For each directory, starting from the root, it yields a 3-tuple (dirpath, dirnames, filenames):

from os import walk
from os.path import splitext, join

def select_files(root, files):
    """
    simple logic here to filter out interesting files
    .py files in this example
    """

    selected_files = []

    for file in files:
        #do concatenation here to get full path 
        full_path = join(root, file)
        ext = splitext(file)[1]

        if ext == ".py":
            selected_files.append(full_path)

    return selected_files

def build_recursive_dir_tree(path):
    """
    path    -    where to begin folder scan
    """
    selected_files = []

    for root, dirs, files in walk(path):
        selected_files += select_files(root, files)

    return selected_files
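The two functions above can also be collapsed into a single comprehension. A sketch with the same .py filter (find_py_files is a made-up name, and the tree is only for illustration):

```python
import os
import tempfile

def find_py_files(path):
    """Full paths of all .py files under path, using the same filter as select_files."""
    return [
        os.path.join(root, name)
        for root, dirs, files in os.walk(path)
        for name in files
        if os.path.splitext(name)[1] == ".py"
    ]

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "pkg"))
    for rel in ("setup.py", "README.txt", os.path.join("pkg", "mod.py")):
        open(os.path.join(top, rel), "w").close()
    found = sorted(os.path.relpath(p, top) for p in find_py_files(top))

print(found)
```

Splitting the logic into two functions, as the answer does, makes the filter easier to swap out; the comprehension is simply more compact.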

2011-08-23 13:24:12

If you want a flat list of all paths under a given directory (like find . in the shell):

import os

files = [
    os.path.join(parent, name)
    for (parent, subdirs, files) in os.walk(YOUR_DIRECTORY)
    for name in files + subdirs
]

To include only the full paths of files under the base directory, leave out + subdirs.
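A sketch of both variants side by side; the include_dirs parameter and the flat_list name are hypothetical, added only to compare the two. Note that, unlike find ., the list does not contain the starting folder itself.

```python
import os
import tempfile

def flat_list(top, include_dirs=True):
    """Every path under top; only file paths if include_dirs is False."""
    return [
        os.path.join(parent, name)
        for parent, subdirs, files in os.walk(top)
        for name in (files + subdirs if include_dirs else files)
    ]

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a"))
    open(os.path.join(top, "a", "x.txt"), "w").close()
    open(os.path.join(top, "y.txt"), "w").close()
    everything = sorted(os.path.relpath(p, top) for p in flat_list(top))
    only_files = sorted(os.path.relpath(p, top) for p in flat_list(top, include_dirs=False))

print(everything)
print(only_files)
```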

2019-02-05 00:31:14

Try this:

import os
import sys

path = sys.argv[1]  # top-level folder to walk

for root, subdirs, files in os.walk(path):

    for file in os.listdir(root):

        filePath = os.path.join(root, file)

        if os.path.isdir(filePath):
            pass  # subfolders are visited by os.walk itself

        else:
            with open(filePath, 'r') as f:
                # Do Stuff
                pass
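Inside the os.walk loop, os.listdir(root) returns the same names that os.walk has already split into subdirs and files, so the isdir check can be dropped by iterating over files directly. A sketch of that; the read_all name and collecting contents into a dict are assumptions for illustration.

```python
import os
import tempfile

def read_all(path):
    """Map each file path under path to its text content."""
    contents = {}
    for root, subdirs, files in os.walk(path):
        for name in files:  # os.walk has already separated files from subfolders
            file_path = os.path.join(root, name)
            with open(file_path, 'r') as f:
                contents[file_path] = f.read()
    return contents

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    with open(os.path.join(top, "a.txt"), "w") as f:
        f.write("alpha")
    with open(os.path.join(top, "sub", "b.txt"), "w") as f:
        f.write("beta")
    result = {os.path.relpath(p, top): text for p, text in read_all(top).items()}

print(result)
```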

2017-07-13 16:46:36
