How can I get the line count of a large file in the most memory- and time-efficient way?

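def file_len(filename):
    i = -1  # so an empty file returns 0 instead of raising UnboundLocalError
    with open(filename) as f:
        # enumerate() numbers lines from 0; only the current line is held in memory
        for i, _ in enumerate(f):
            pass
    return i + 1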
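
A minimal variant of the same streaming idea, assuming binary mode is acceptable for your file: opening with 'rb' skips text decoding, so undecodable bytes cannot raise an error, and sum() over a generator returns 0 for an empty file without any special case. The name line_count_binary is hypothetical.

```python
def line_count_binary(path):
    # Binary mode skips text decoding; iterating the file still yields
    # one line (split on b'\n') at a time.
    with open(path, 'rb') as f:
        # sum() over a generator keeps only the current line in memory
        # and returns 0 for an empty file.
        return sum(1 for _ in f)
```

Like the enumerate version, this counts a final line even if it has no trailing newline.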

Current answer

def line_count(path):
    count = 0
    with open(path) as lines:
        for count, l in enumerate(lines, start=1):
            pass
    return count

Other answers

You can spawn a subprocess and run wc -l filename:

import subprocess

def file_len(fname):
    p = subprocess.Popen(['wc', '-l', fname], stdout=subprocess.PIPE, 
                                              stderr=subprocess.PIPE)
    result, err = p.communicate()
    if p.returncode != 0:
        raise IOError(err)
    return int(result.strip().split()[0])

What about this?

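import itertools

def file_len(fname):
    counts = itertools.count()
    with open(fname) as f:
        for _ in f:
            next(counts)  # advance the counter once per line
    return next(counts)  # the next value equals the number of lines read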
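
The same itertools.count() idea can be pushed into C code with a well-known idiom: zip the file with the counter and drain the result into a zero-length collections.deque. This is a sketch; the name file_len_fast is hypothetical. It relies on zip() evaluating its iterables left to right, so when the file is exhausted zip stops without advancing the counter.

```python
import collections
import itertools

def file_len_fast(fname):
    counter = itertools.count()
    with open(fname, 'rb') as f:
        # zip() pulls from f first; once f is exhausted it stops without
        # touching counter, so counter advances exactly once per line.
        # A deque with maxlen=0 consumes the iterator and stores nothing.
        collections.deque(zip(f, counter), maxlen=0)
    # The next value equals the number of lines consumed.
    return next(counter)
```

The Python-level for loop disappears, which is where most of the per-line overhead in the version above comes from; memory use stays at one line at a time.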

This is shorter and clearer, and may be the best approach, although read() loads the entire file into memory:

with open('yourfile.ext') as f:
    num_lines = f.read().count('\n')  # counts newline characters
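
To keep the speed of counting newlines without holding the whole file in memory, a sketch under stated assumptions is to read fixed-size binary chunks and count b'\n' in each. The function name count_newlines and the 1 MiB default chunk_size are hypothetical choices. Like read().count('\n') and wc -l, this counts newline characters, so a last line without a trailing newline is not counted. It assumes \n or \r\n line endings: in binary mode there is no newline translation, so old-style \r-only endings are not counted.

```python
def count_newlines(path, chunk_size=1 << 20):
    count = 0
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            count += chunk.count(b'\n')
    return count
```

Peak memory is bounded by chunk_size rather than by the file size.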

A one-line solution similar to the wc answer above, using the more modern subprocess.check_output function:

import subprocess

def line_count(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])
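
A possible variation, assuming a POSIX wc is on PATH and Python 3.5+ for subprocess.run: feed the file on stdin so wc prints only the number, with no filename to split off. The name wc_line_count is hypothetical. As with every wc -l call, a final line without a trailing newline is not counted.

```python
import subprocess

def wc_line_count(filename):
    # With input on stdin, wc prints only the count (no filename).
    with open(filename, 'rb') as f:
        result = subprocess.run(['wc', '-l'], stdin=f,
                                stdout=subprocess.PIPE, check=True)
    # int() accepts bytes and ignores surrounding whitespace, such as
    # the padding some wc implementations add.
    return int(result.stdout)
```

check=True turns a non-zero exit status into subprocess.CalledProcessError instead of silently returning bad output.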

Why wouldn't the following work?

import sys

# input comes from STDIN
data = sys.stdin.readlines()

# get total number of lines in file
lines = len(data)

print(lines)

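Here readlines() returns a list with one element per input line, so len() of that list is the line count. It works, but it keeps every line in memory at once.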
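
A streaming version of the same stdin approach, as a sketch: iterating the stream directly yields one line at a time, so memory use no longer grows with the input. The name count_lines is hypothetical; call it as count_lines(sys.stdin), e.g. with python script.py < yourfile.ext.

```python
import sys

def count_lines(stream):
    # Iterating a file object yields one line at a time, so only the
    # current line is held in memory (unlike readlines()).
    return sum(1 for _ in stream)
```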