How can I get the line count of a large file in the most memory- and time-efficient way?
def file_len(filename):
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1
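As a quick sanity check of the function above, here is a minimal, self-contained sketch that writes a throwaway temporary file and counts its lines; the temp-file setup is purely illustrative and not part of the question.

import os
import tempfile

# Create a small temporary file with three lines (illustrative only).
with tempfile.NamedTemporaryFile('w', delete=False, suffix='.txt') as tmp:
    tmp.write('first\nsecond\nthird\n')
    path = tmp.name

print(file_len(path))  # 3
os.remove(path)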
Current Answer
If the file fits into memory, then:

with open(fname, 'rb') as f:
    count = len(f.read().split(b'\n')) - 1
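A side note on why the "- 1" is needed: splitting on b'\n' yields a trailing empty element when the file ends with a newline. A small illustration, assuming the file does end with a newline:

data = b'first\nsecond\n'
print(data.split(b'\n'))           # [b'first', b'second', b'']
print(len(data.split(b'\n')) - 1)  # 2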
Other Answers
print(open('file.txt', 'r').read().count("\n") + 1)
What about this?
import itertools

def file_len(fname):
    counts = itertools.count()
    with open(fname) as f:
        for _ in f:
            next(counts)
    return next(counts)
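Why the final next(counts) returns the line count: itertools.count() is an infinite iterator that yields 0, 1, 2, ..., so after being advanced once per line, the next call returns exactly the number of lines seen. A tiny illustration:

import itertools

c = itertools.count()
next(c)         # 0 -- first line
next(c)         # 1 -- second line
print(next(c))  # 2 -- number of times the counter was advanced before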
def line_count(path):
    count = 0
    with open(path) as lines:
        for count, l in enumerate(lines, start=1):
            pass
    return count
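The count = 0 default is what keeps this version safe on empty files: the loop body never runs, so the function simply returns 0. A minimal check, using an illustrative temporary file (not part of the original answer):

import os
import tempfile

# An empty temporary file (illustrative setup only).
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    path = tmp.name

print(line_count(path))  # 0
os.remove(path)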
I modified the buffered approach as follows:
def CountLines(filename):
    f = open(filename)
    try:
        lines = 1
        buf_size = 1024 * 1024
        read_f = f.read  # loop optimization
        buf = read_f(buf_size)

        # Empty file
        if not buf:
            return 0

        while buf:
            lines += buf.count('\n')
            buf = read_f(buf_size)

        return lines
    finally:
        f.close()
Now empty files and a final line without a trailing \n are also counted.
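A quick check of the two edge cases mentioned above; the helper and the temporary files here are illustrative, not part of the original answer:

import os
import tempfile

def _check(content):
    # Write `content` to a temporary file, count its lines, then clean up.
    with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
        tmp.write(content)
        path = tmp.name
    result = CountLines(path)
    os.remove(path)
    return result

print(_check(''))          # 0 -- empty file
print(_check('one\ntwo'))  # 2 -- last line has no trailing \n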
A one-line bash solution similar to this answer, using the modern subprocess.check_output function:
import subprocess

def line_count(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])
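Note that this approach shells out to the external wc utility, so it only works where wc is available on PATH (Linux, macOS, and other Unix-like systems), not on a stock Windows install.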