How can I get the line count of a large file in the most memory- and time-efficient way?

def file_len(filename):
    count = 0  # stays 0 for an empty file, where the loop body never runs
    with open(filename) as f:
        for count, _ in enumerate(f, start=1):
            pass
    return count

Current answer

The simplest and shortest way I use is:

with open("my_file.txt", "r") as f:
    num_lines = len(f.readlines())  # builds a list of every line in memory
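A minimal sketch of the same `readlines()` idea wrapped in a reusable function (the name `count_lines_readlines` is hypothetical). Because `readlines()` returns a list holding every line, memory use grows with the file size; that is the price of the brevity.

```python
def count_lines_readlines(path):
    """Count lines by materializing them all: short, but O(file size) memory."""
    with open(path, "r") as f:  # `with` guarantees the file is closed
        return len(f.readlines())
```

Unlike `count("\n")`, this also counts a final line that has no trailing newline.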

Other answers

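I got a small (4-8%) improvement with this version, which reuses a single fixed-size buffer, so it should avoid per-line memory allocation and GC overhead: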

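lines = 0
buffer = bytearray(2048)
with open(filename, 'rb') as f:  # readinto() requires a binary-mode file
    while (n := f.readinto(buffer)) > 0:
        # count only the n bytes just read; the rest of the buffer may hold stale data
        lines += buffer.count(b'\n', 0, n)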

You can tune the buffer size and may see some further improvement.
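A minimal sketch that exposes the buffer size as a parameter so it can be tuned. The function name `count_newlines` and the 1 MiB default are assumptions, not values from the answer; larger buffers mean fewer `readinto()` calls at the cost of more memory held at once.

```python
def count_newlines(path, bufsize=1 << 20):
    """Count newline bytes, reading into one reusable buffer.

    bufsize (1 MiB by default) is an assumed, tunable value.
    """
    buf = bytearray(bufsize)
    count = 0
    with open(path, "rb") as f:  # readinto() needs a binary file
        while True:
            n = f.readinto(buf)
            if not n:  # 0 means end of file
                break
            count += buf.count(b"\n", 0, n)  # ignore stale bytes past n
    return count
```

This counts newline characters, so a last line without a trailing newline is not included.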

A similar approach:

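lines = 0
with open(path) as f:
    for line in f:
        lines += 1

# Counts "\n" characters; the + 1 covers a last line without a trailing
# newline, but overcounts by one if the file ends with "\n" (or is empty)
with open('file.txt', 'r') as f:
    print(f.read().count("\n") + 1)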
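The `+ 1` in the last snippet is only right when the file does not end with a newline. A minimal sketch (the name `count_lines` and the 64 KiB chunk size are assumptions; requires Python 3.8+ for `:=`) that reads fixed-size binary chunks and adds one only when the file is non-empty and its last byte is not `\n`. For `\n`-terminated files this matches what `for line in f` counts.

```python
def count_lines(path, bufsize=1 << 16):
    """Count lines: a final unterminated line counts, an empty file has 0."""
    count = 0
    last = b""  # last byte seen so far
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            count += chunk.count(b"\n")
            last = chunk[-1:]
    # one extra line if the file is non-empty and doesn't end with a newline
    if last and last != b"\n":
        count += 1
    return count
```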

Kyle's answer,

num_lines = sum(1 for line in open('my_file.txt'))

is probably the best; an alternative is:

num_lines = len(open('my_file.txt').read().splitlines())
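The two one-liners are not quite equivalent. A minimal sketch of both as functions that close their files (the function names are hypothetical): the generator streams the file, while `read().splitlines()` holds the whole file in memory, and `str.splitlines()` also treats characters such as form feed (`\f`) and `\u2028` as line boundaries, which file iteration does not.

```python
def count_with_generator(path):
    # Streams the file one line at a time, so memory use stays roughly constant
    with open(path) as f:
        return sum(1 for _ in f)


def count_with_splitlines(path):
    # Reads the whole file into one string first: memory grows with file size
    with open(path) as f:
        return len(f.read().splitlines())
```

For ordinary `\n`-terminated text both return the same number, but a file containing `"a\fb\n"` gives 1 from the generator and 2 from `splitlines()`.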

Here is a performance comparison of the two:

In [20]: timeit sum(1 for line in open('Charts.ipynb'))
100000 loops, best of 3: 9.79 µs per loop

In [21]: timeit len(open('Charts.ipynb').read().splitlines())
100000 loops, best of 3: 12 µs per loop
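Timings of about 10 µs per call suggest the measured file was small, so these numbers may not reflect behaviour on a large file. A minimal sketch for re-running the comparison with the standard `timeit` module on a synthetic file of your chosen size; the file contents and the default `n_lines`, `number`, and `repeat` values are arbitrary assumptions.

```python
import os
import tempfile
import timeit

# The two one-liners from above, rewritten with `with` so each run closes its file
STATEMENTS = {
    "generator": "with open(path) as f: sum(1 for _ in f)",
    "splitlines": "with open(path) as f: len(f.read().splitlines())",
}


def benchmark(n_lines=100_000, number=5, repeat=3):
    """Return the best per-call time in seconds for each statement."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("a line of sample text\n" * n_lines)
        path = tmp.name
    try:
        return {
            # best of `repeat` runs, each executing the statement `number` times
            name: min(timeit.repeat(stmt, number=number, repeat=repeat,
                                    globals={"path": path})) / number
            for name, stmt in STATEMENTS.items()
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```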

One line, probably fast:

num_lines = sum(1 for line in open('myfile.txt'))
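A different technique that is also common for large files counts newline bytes in big binary chunks, which avoids creating a Python string object per line. A minimal sketch; the function name `count_newlines_chunked` and the 1 MiB chunk size are assumptions, and like `count("\n")` it does not count a last line that lacks a trailing newline.

```python
from functools import partial


def count_newlines_chunked(path, chunk_size=1 << 20):
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size) until it returns b""
        return sum(chunk.count(b"\n") for chunk in iter(partial(f.read, chunk_size), b""))
```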