如何在Python中列出目录中的所有文件并将其添加到列表中?
当前回答
从目录及其所有子目录获取完整文件路径
import os
def get_filepaths(directory):
"""
This function will generate the file names in a directory
tree by walking the tree either top-down or bottom-up. For each
directory in the tree rooted at directory top (including top itself),
it yields a 3-tuple (dirpath, dirnames, filenames).
"""
file_paths = [] # List which will store all of the full filepaths.
# Walk the tree.
for root, directories, files in os.walk(directory):
for filename in files:
# Join the two strings in order to form the full filepath.
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths # Self-explanatory.
# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")
我在上述函数中提供的路径包含3个文件,其中两个位于根目录中,另一个位于名为“subfolder”的子文件夹中打印将打印列表的完整文件路径:['/Users/johnny/Desktop/TEST/file1.txt','/Users/johnny/Desctop/TEST-file2.txt','/Users/johnny/Desktop/STEST/SUBFOLDER/file3.dat']
如果愿意,您可以打开并阅读内容,或者只关注扩展名为“.dat”的文件,如下面的代码所示:
for f in full_file_paths:
if f.endswith(".dat"):
print f
/用户/johnny/Desktop/TEST/SUBFOLDER/file3.dat
其他回答
我将提供一个示例单行,其中可以提供源路径和文件类型作为输入。代码返回带有csv扩展名的文件名列表。使用以防需要返回所有文件。这还将递归扫描子目录。
[y代表os.walk中的x(sourcePath)代表glob中的y(os.path.join(x[0],'*.csv'))]
根据需要修改文件扩展名和源路径。
从目录及其所有子目录获取完整文件路径
import os
def get_filepaths(directory):
"""
This function will generate the file names in a directory
tree by walking the tree either top-down or bottom-up. For each
directory in the tree rooted at directory top (including top itself),
it yields a 3-tuple (dirpath, dirnames, filenames).
"""
file_paths = [] # List which will store all of the full filepaths.
# Walk the tree.
for root, directories, files in os.walk(directory):
for filename in files:
# Join the two strings in order to form the full filepath.
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths # Self-explanatory.
# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")
我在上述函数中提供的路径包含3个文件,其中两个位于根目录中,另一个位于名为“subfolder”的子文件夹中打印将打印列表的完整文件路径:['/Users/johnny/Desktop/TEST/file1.txt','/Users/johnny/Desctop/TEST-file2.txt','/Users/johnny/Desktop/STEST/SUBFOLDER/file3.dat']
如果愿意,您可以打开并阅读内容,或者只关注扩展名为“.dat”的文件,如下面的代码所示:
for f in full_file_paths:
if f.endswith(".dat"):
print f
/用户/johnny/Desktop/TEST/SUBFOLDER/file3.dat
dircache是“自2.6版以来已弃用:Python 3.0中已删除dircache模块。”
import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
if len(list[i]) != check:
temp.append(list[i-1])
check = len(list[i])
else:
i = i + 1
count = count - 1
print temp
从3.4版开始,有内置的迭代器,比os.listdir()高效得多:
pathlib:3.4版新增。
>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]
根据PEP428,pathlib库的目的是提供一个简单的类层次结构来处理文件系统路径和用户对它们进行的常见操作。
os.scandir():3.5版新增。
>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]
注意,os.walk()使用os.scandir()而不是3.5版的os.listdir(),根据PEP471,它的速度提高了2-20倍。
让我也推荐阅读ShadowRanger在下面的评论。
对于Python 2:
pip install rglob
那就做吧
import rglob
file_list = rglob.rglob("/home/base/dir/", "*")
print file_list