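I have a C++/Obj-C background and I am just discovering Python (I have been writing it for about an hour).
I am writing a script to recursively read the contents of text files in a folder structure.
My problem is that the code I have written only works one folder deep. I can see why in the code (see # hardcoded path); I just don't know how to move forward in Python, since I am brand new to it.
Python code: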
import os
import sys
rootdir = sys.argv[1]
for root, subFolders, files in os.walk(rootdir):
    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName
        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()
        folderOut.close()
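Make sure you understand the three return values of os.walk:
for root, subdirs, files in os.walk(rootdir):
They have the following meaning:
root: the current path being "walked"
subdirs: the entries in root that are directories
files: the entries in root (not in subdirs) that are not directories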
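To make the three values concrete, here is a minimal sketch that builds a small throwaway tree and records what os.walk yields at each step. The helper names describe_walk and make_demo_tree, and the file layout, are made up for illustration.

```python
import os
import tempfile

def make_demo_tree(top):
    # Hypothetical layout: top/a.txt, top/sub/b.txt, top/sub/deeper/c.txt
    os.makedirs(os.path.join(top, 'sub', 'deeper'))
    for parts in (('a.txt',), ('sub', 'b.txt'), ('sub', 'deeper', 'c.txt')):
        with open(os.path.join(top, *parts), 'w') as f:
            f.write('x')

def describe_walk(top):
    """Return one (root relative to top, subdirs, files) tuple per os.walk step."""
    steps = []
    for root, subdirs, files in os.walk(top):
        rel_root = os.path.relpath(root, top)  # '.' for the top folder itself
        steps.append((rel_root, sorted(subdirs), sorted(files)))
    return sorted(steps)

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        make_demo_tree(top)
        for rel_root, subdirs, files in describe_walk(top):
            print(rel_root, subdirs, files)
```

Each folder shows up exactly once as root, and the file names in files are bare names that belong to that root only.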
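And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join the file name to the folder currently being "walked", not to the topmost folder. So it must be filePath = os.path.join(root, file). By the way, "file" is a builtin (in Python 2), so you normally don't use it as a variable name.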
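The difference can be checked with a small sketch. The two helper functions below are hypothetical; the first mimics the concatenation from the question, the second joins against the folder being walked.

```python
import os
import tempfile

def paths_by_concatenation(rootdir):
    """Mimic the question's approach: always prefix the top-level folder."""
    return [rootdir + '/' + name
            for root, subdirs, files in os.walk(rootdir)
            for name in files]

def paths_by_join(rootdir):
    """Join each file name to the folder currently being walked."""
    return [os.path.join(root, name)
            for root, subdirs, files in os.walk(rootdir)
            for name in files]

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        # A single file one level down: top/sub/nested.txt
        os.makedirs(os.path.join(top, 'sub'))
        with open(os.path.join(top, 'sub', 'nested.txt'), 'w') as f:
            f.write('hello')
        wrong = paths_by_concatenation(top)[0]
        right = paths_by_join(top)[0]
        print(os.path.exists(wrong))  # False: nested.txt is not directly under top
        print(os.path.exists(right))  # True: the path includes the 'sub' folder
```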
Another problem is your loops, which should look like this, for example:
import os
import sys
walk_dir = sys.argv[1]
print('walk_dir = ' + walk_dir)
# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))
for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')
If you didn't know, the with statement for files is a shorthand:
with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
os.walk does a recursive walk by default. For each directory, starting from the root, it yields a 3-tuple (dirpath, dirnames, filenames).
from os import walk
from os.path import splitext, join
def select_files(root, files):
    """
    simple logic here to filter out interesting files
    .py files in this example
    """
    selected_files = []
    for file in files:
        #do concatenation here to get full path
        full_path = join(root, file)
        ext = splitext(file)[1]
        if ext == ".py":
            selected_files.append(full_path)
    return selected_files

def build_recursive_dir_tree(path):
    """
    path - where to begin folder scan
    """
    selected_files = []
    for root, dirs, files in walk(path):
        selected_files += select_files(root, files)
    return selected_files
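The same idea can be written more compactly as a generator. This is only a sketch; find_by_extension is a hypothetical helper name, and the demo tree is made up.

```python
import os
import tempfile

def find_by_extension(top, ext):
    """Yield full paths of files under top whose extension equals ext (e.g. '.py')."""
    for root, dirs, files in os.walk(top):
        for name in files:
            if os.path.splitext(name)[1] == ext:
                # Join against root, the folder currently being walked
                yield os.path.join(root, name)

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, 'pkg'))
        for parts in (('main.py',), ('notes.txt',), ('pkg', 'util.py')):
            with open(os.path.join(top, *parts), 'w') as f:
                f.write('')
        for path in sorted(find_by_extension(top, '.py')):
            print(os.path.relpath(path, top))
```

A generator avoids building the whole list up front, which may matter for very large trees; wrap the call in list() if a list is needed.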
If you are using Python 3.5 or above, you can get this done in one line.
import glob

# root_dir needs a trailing slash (i.e. /root/dir/)
for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
    print(filename)
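As mentioned in the documentation:
If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.
If you want every file, you can use the following (note that the results include directories as well as files):
import glob

for filename in glob.iglob(root_dir + '**/**', recursive=True):
    print(filename)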
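As a sketch of how '**' behaves, the following builds the pattern with os.path.join, so no trailing slash is needed on root_dir, and filters with os.path.isfile to drop directories. The helper names find_txt_files and find_all_files and the demo layout are assumptions for illustration.

```python
import glob
import os
import tempfile

def find_txt_files(root_dir):
    """Return .txt files at any depth under root_dir, including root_dir itself."""
    pattern = os.path.join(root_dir, '**', '*.txt')
    return sorted(glob.glob(pattern, recursive=True))

def find_all_files(root_dir):
    """Return every regular file under root_dir, skipping the directories '**' also matches."""
    pattern = os.path.join(root_dir, '**')
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, 'sub'))
        for parts in (('a.txt',), ('sub', 'b.txt'), ('sub', 'c.log')):
            with open(os.path.join(top, *parts), 'w') as f:
                f.write('')
        print([os.path.relpath(p, top) for p in find_txt_files(top)])
        print([os.path.relpath(p, top) for p in find_all_files(top)])
```

Because '**' also matches zero directories, a.txt directly under the top folder is found alongside the nested sub/b.txt.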