这是我所拥有的:
glob(os.path.join('src','*.c'))
但是我想搜索src的子文件夹。这样做是可行的:
glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))
但这显然是有限和笨拙的。
这是我所拥有的:
glob(os.path.join('src','*.c'))
但是我想搜索src的子文件夹。这样做是可行的:
glob(os.path.join('src','*.c'))
glob(os.path.join('src','*','*.c'))
glob(os.path.join('src','*','*','*.c'))
glob(os.path.join('src','*','*','*','*.c'))
但这显然是有限和笨拙的。
当前回答
它使用fnmatch或正则表达式:
import fnmatch, os
def filepaths(directory, pattern):
for root, dirs, files in os.walk(directory):
for basename in files:
try:
matched = pattern.match(basename)
except AttributeError:
matched = fnmatch.fnmatch(basename, pattern)
if matched:
yield os.path.join(root, basename)
# usage
if __name__ == '__main__':
from pprint import pprint as pp
import re
path = r'/Users/hipertracker/app/myapp'
pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
pp([x for x in filepaths(path, '*.py')])
其他回答
另一种方法是只使用glob模块。只需给rglob方法添加一个起始基本目录和一个要匹配的模式,它就会返回一个匹配文件名的列表。
import glob
import os
def _getDirs(base):
return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]
def rglob(base, pattern):
list = []
list.extend(glob.glob(os.path.join(base,pattern)))
dirs = _getDirs(base)
if len(dirs):
for d in dirs:
list.extend(rglob(os.path.join(base,d), pattern))
return list
下面是一个带有嵌套列表推导式的解决方案,os。Walk和简单的后缀匹配代替glob:
import os
cfiles = [os.path.join(root, filename)
for root, dirnames, filenames in os.walk('src')
for filename in filenames if filename.endswith('.c')]
它可以被压缩成一行代码:
import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]
或概括为函数:
import os
def recursive_glob(rootdir='.', suffix=''):
return [os.path.join(looproot, filename)
for looproot, _, filenames in os.walk(rootdir)
for filename in filenames if filename.endswith(suffix)]
cfiles = recursive_glob('src', '.c')
如果您确实需要完整的glob样式模式,您可以遵循Alex的和 Bruno的例子,使用fnmatch:
import fnmatch
import os
def recursive_glob(rootdir='.', pattern='*'):
return [os.path.join(looproot, filename)
for looproot, _, filenames in os.walk(rootdir)
for filename in filenames
if fnmatch.fnmatch(filename, pattern)]
cfiles = recursive_glob('src', '*.c')
下面是我的解决方案,使用列表理解在一个目录和所有子目录中递归地搜索多个文件扩展名:
import os, glob
def _globrec(path, *exts):
""" Glob recursively a directory and all subdirectories for multiple file extensions
Note: Glob is case-insensitive, i. e. for '\*.jpg' you will get files ending
with .jpg and .JPG
Parameters
----------
path : str
A directory name
exts : tuple
File extensions to glob for
Returns
-------
files : list
list of files matching extensions in exts in path and subfolders
"""
dirs = [a[0] for a in os.walk(path)]
f_filter = [d+e for d in dirs for e in exts]
return [f for files in [glob.iglob(files) for files in f_filter] for f in files]
my_pictures = _globrec(r'C:\Temp', '\*.jpg','\*.bmp','\*.png','\*.gif')
for f in my_pictures:
print f
从Python 3.4开始,可以在新的pathlib模块中使用Path类之一的glob()方法,它支持**通配符。例如:
from pathlib import Path
for file_path in Path('src').glob('**/*.c'):
print(file_path) # do whatever you need with these files
更新: 从Python 3.5开始,glob.glob()也支持相同的语法。
根据其他答案,这是我目前的工作实现,检索根目录中的嵌套XML文件:
files = []
for root, dirnames, filenames in os.walk(myDir):
files.extend(glob.glob(root + "/*.xml"))
我真的很喜欢python:)