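I have a list of 20 file names, like ['file1.txt', 'file2.txt', …]. I want to write a Python script that concatenates these files into a new file. I could open each file with f = open(…), read it line by line by calling f.readline(), and write each line into the new file. That doesn't seem very "elegant" to me, especially the part where I have to read and write line by line.
Is there a more "elegant" way to do this in Python?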
Current answer
If the files are not gigantic:
with open('newfile.txt', 'wb') as newf:
    for filename in list_of_files:
        with open(filename, 'rb') as hf:
            newf.write(hf.read())
            # newf.write(b'\n\n\n') if you want to introduce
            # some blank lines between the contents of the copied files
If the files are too big to be read entirely and held in RAM, the algorithm has to be a little different: copy each file in a loop, reading it in fixed-length chunks, for example with read(10000).
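The chunked variant described above could look roughly like the following. This is a minimal sketch, not the answerer's own code: the function name concat_in_chunks and its parameters are made up for illustration, and 10000 is simply the chunk size suggested above.

```python
def concat_in_chunks(filenames, out_path, chunk_size=10000):
    """Append each input file to out_path, reading chunk_size bytes at a time."""
    with open(out_path, 'wb') as newf:
        for filename in filenames:
            with open(filename, 'rb') as hf:
                while True:
                    chunk = hf.read(chunk_size)
                    if not chunk:  # an empty bytes object means end of file
                        break
                    newf.write(chunk)
```

Called as concat_in_chunks(list_of_files, 'newfile.txt'), it never holds more than one chunk of any input file in memory at a time.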
Other answers
This should do it.
For large files:
filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            # iterating over the file object reads one line at a time
            for line in infile:
                outfile.write(line)
For small files:
filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            # reads the whole file into memory before writing it out
            outfile.write(infile.read())
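The two variants above differ only in how much of each file is held in memory at once. As a possible alternative that this answer does not mention, the standard library's shutil.copyfileobj copies from one open file object to another in buffered chunks, so a single loop may cover both cases. The sketch below makes that assumption; the function name concat_with_copyfileobj is hypothetical.

```python
import shutil


def concat_with_copyfileobj(filenames, out_path):
    """Concatenate files by streaming each one into out_path."""
    # Binary mode copies the bytes unchanged, with no newline translation.
    with open(out_path, 'wb') as outfile:
        for fname in filenames:
            with open(fname, 'rb') as infile:
                # copyfileobj reads and writes in chunks internally
                shutil.copyfileobj(infile, outfile)
```

Because the copy happens in chunks, memory use should stay small regardless of file size.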
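And another interesting one that I thought of:
import itertools
filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    # map() is lazy in Python 3; in Python 2 use itertools.imap instead
    for line in itertools.chain.from_iterable(map(open, filenames)):
        outfile.write(line)
Sadly, this last method leaves a few open file descriptors, which the GC should take care of anyway. I just thought it was interesting.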
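If the open file descriptors are a concern, one way to keep the single stream of lines while still closing each file is a small generator. This is only a sketch of that idea; iter_lines and concat_lines are illustrative names, not part of the original answer.

```python
def iter_lines(filenames):
    """Yield every line of every file, closing each file once it is exhausted."""
    for fname in filenames:
        with open(fname) as infile:
            yield from infile


def concat_lines(filenames, out_path):
    """Write the lines of all input files, in order, to out_path."""
    with open(out_path, 'w') as outfile:
        outfile.writelines(iter_lines(filenames))
```

Each input file is closed when its with block exits, which happens as soon as the generator moves on to the next file.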
import os
def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    # list the files that are about to be concatenated
    for idx, infile in enumerate(files):
        print("File #" + str(idx) + " " + infile)
    concat = ''.join([open(path + f).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        # write only the file contents, not the directory name
        fo.write(concat)
if __name__ == "__main__":
    concatFiles()
Check out the .read() method of the File object:
http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
You could do something like:
concat = ""
for file in files:
    concat += open(file).read()
or a more "elegant" Python way:
concat = ''.join([open(f).read() for f in files])
which, according to this article, http://www.skymind.com/~ocrow/python_string/, would also be the fastest.
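If you have a lot of files in the directory, then glob2 might be a better option for generating the list of file names than writing them by hand.
import glob2
filenames = glob2.glob('*.txt')  # list of all .txt files in the directory
with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            # add a newline after the contents of each file
            f.write(infile.read() + '\n')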
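glob2 is a third-party package. For a simple pattern like '*.txt', the standard library's glob module can probably do the same job. The sketch below assumes that; concat_matching is a hypothetical helper name, and it also sorts the matches and skips the output file in case it matches the pattern itself.

```python
import glob
import os


def concat_matching(pattern, out_path):
    """Concatenate all files matching pattern into out_path, one after another."""
    out_abs = os.path.abspath(out_path)
    # glob.glob makes no promise about ordering, so sort for a predictable result,
    # and leave out the output file if it happens to match the pattern.
    filenames = sorted(
        f for f in glob.glob(pattern) if os.path.abspath(f) != out_abs
    )
    with open(out_path, 'w') as outfile:
        for fname in filenames:
            with open(fname) as infile:
                outfile.write(infile.read() + '\n')
    return filenames
```

For example, concat_matching('*.txt', 'outfile.txt') would combine every .txt file in the current directory except outfile.txt.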