如何以最有效的内存和时间方式获取大文件的行数?

def file_len(filename):
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

当前回答

另一种可能性:

import subprocess

def num_lines_in_file(fpath):
    return int(subprocess.check_output('wc -l %s' % fpath, shell=True).strip().split()[0])

其他回答

def count_text_file_lines(path):
    with open(path, 'rt') as file:
        line_count = sum(1 for _line in file)
    return line_count

凯尔的回答

num_lines = sum(1 for line in open('my_file.txt'))

最好的替代方案是什么

num_lines =  len(open('my_file.txt').read().splitlines())

这里是两者的性能比较

In [20]: timeit sum(1 for line in open('Charts.ipynb'))
100000 loops, best of 3: 9.79 µs per loop

In [21]: timeit len(open('Charts.ipynb').read().splitlines())
100000 loops, best of 3: 12 µs per loop

大文件的另一种选择是使用xreadlines():

count = 0
for line in open(thefilepath).xreadlines(  ): count += 1

对于Python 3,请参阅:在Python 3中什么替代xreadlines() ?

你可以使用操作系统。路径模块如下所示:

import os
import subprocess
Number_lines = int( (subprocess.Popen( 'wc -l {0}'.format( Filename ), shell=True, stdout=subprocess.PIPE).stdout).readlines()[0].split()[0] )

,其中Filename是文件的绝对路径。

print open('file.txt', 'r').read().count("\n") + 1