How can I read a large text file line by line without loading it into memory?

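I want to read a large file (>5 GB) line by line without loading its entire contents into memory. I cannot use readlines(), because it builds a very large list in memory.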

Current answer

All you need to do is use the file object as an iterator.

for line in open("log.txt"):
    do_something_with(line)

In recent Python versions it is better to use a context manager.

with open("log.txt") as fileobject:
    for line in fileobject:
        do_something_with(line)

This also closes the file automatically.
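
As a rough illustration of the pattern above, the sketch below counts the lines that contain a given substring while holding only one line in memory at a time. The function name count_matching_lines, the search string, and the small temporary file standing in for a large log are all assumptions made for the example; they are not part of the original answer.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing `needle`, keeping one line in memory at a time."""
    count = 0
    with open(path, encoding="utf-8") as fileobject:
        for line in fileobject:  # the file object yields one line per iteration
            if needle in line:
                count += 1
    return count


# Hypothetical demo data: a tiny temporary file stands in for a multi-GB log.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO done\nERROR timeout\n")
    demo_path = tmp.name

error_count = count_matching_lines(demo_path, "ERROR")
os.remove(demo_path)
print(error_count)
```

Because the loop never stores the lines it has already seen, memory use should stay roughly constant regardless of file size, as long as no single line is itself enormous.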

2011-06-25 02:07:39

Other answers

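How about this? Split the file into chunks and read those, because when you read a file the operating system caches the data that follows. If you read the file strictly line by line, you do not make good use of that cached data.

Instead, split the file into chunks that end on line boundaries, load each whole chunk into memory, and then process it.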

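import os

def chunks(fh, size=1024):
    """Yield (start, length) for blocks of roughly `size` bytes that end on a line boundary."""
    while True:
        startat = fh.tell()          # current position, in bytes from the start of the file
        fh.seek(size, 1)             # skip `size` bytes forward from the current position
        data = fh.readline()         # then move on to the end of the current line
        yield startat, fh.tell() - startat  # only offsets are kept, not the data itself
        if not data:                 # nothing left to read: end of file reached
            break

fname = "log.txt"  # placeholder; the original snippet assumes fname is already defined
if os.path.isfile(fname):
    try:
        fh = open(fname, 'rb')
    except IOError as e:  # e.g. permission denied
        print("I/O error({0}): {1}".format(e.errno, e.strerror))
    except Exception as e1:  # any other unexpected error
        print("Unexpected error: {0}".format(e1))
    else:
        with fh:
            for startat, length in chunks(fh):
                fh.seek(startat)        # go back to the start of the block
                data = fh.read(length)  # read the whole block in one call
                print(data)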
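
A different way to get the same effect, shown here only as a sketch and not as this answer's method, is to read fixed-size blocks and stitch the lines back together yourself. The helper name lines_from_chunks and the chunk sizes are assumptions for illustration; an in-memory io.BytesIO object stands in for a real file opened with 'rb'.

```python
import io


def lines_from_chunks(fh, chunk_size=1024 * 1024):
    """Yield complete lines (as bytes) from a binary file object read in fixed-size chunks."""
    leftover = b""
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        pieces = (leftover + chunk).split(b"\n")
        leftover = pieces.pop()  # the last piece may be an incomplete line
        for piece in pieces:
            yield piece + b"\n"
    if leftover:
        yield leftover  # final line that has no trailing newline


# Tiny chunk size on purpose, so lines straddle chunk boundaries in the demo.
demo = io.BytesIO(b"alpha\nbeta\ngamma")
result = list(lines_from_chunks(demo, chunk_size=4))
print(result)
```

Python's file objects already buffer reads internally, so plain iteration over the file is often fast enough; it is worth measuring before adopting a manual scheme like this one.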

2017-10-25 00:30:20

Try this:

with open('filename', 'r', buffering=100000) as f:  # buffer size in bytes
    for line in f:
        print(line, end='')  # each line already ends with its own newline

2018-01-25 14:48:49

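It is best to use an iterator. Related: fileinput, which iterates over lines from multiple input streams.

From the documentation (the encoding argument requires Python 3.10 or later):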

import fileinput
for line in fileinput.input("filename", encoding="utf-8"):
    process(line)

This avoids copying the whole file into memory at once.
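
Since fileinput is aimed at several input streams, a sketch with more than one file may help. The function name count_lines and the two temporary files are made up for this example; fileinput.hook_encoded is used here instead of the encoding argument so that the snippet should also run on Python versions older than 3.10.

```python
import fileinput
import os
import tempfile


def count_lines(paths):
    """Count lines across several files, reading one line at a time."""
    total = 0
    # hook_encoded opens every file in the sequence with the given encoding.
    with fileinput.input(
        files=paths, openhook=fileinput.hook_encoded("utf-8")
    ) as stream:
        for _line in stream:
            total += 1
    return total


# Hypothetical demo data: two small files standing in for large logs.
with tempfile.TemporaryDirectory() as tmpdir:
    first = os.path.join(tmpdir, "a.log")
    second = os.path.join(tmpdir, "b.log")
    with open(first, "w", encoding="utf-8") as f:
        f.write("one\ntwo\n")
    with open(second, "w", encoding="utf-8") as f:
        f.write("three\nfour\nfive\n")
    line_total = count_lines([first, second])

print(line_total)
```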

2011-06-25 02:06:16

This is the best solution I found; I tried it on a 330 MB file.

lineno = 500
line_length = 8
with open('catfour.txt', 'r') as file:
    file.seek(lineno * (line_length + 2))
    print(file.readline(), end='')

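Here line_length is the number of characters in a single line. For example, "abcd" has a line length of 4. This approach jumps straight to line number lineno, and it only works when every line in the file has exactly the same length.

I add 2 to the line length to skip the line terminator and land on the first character of the next line. That matches a two-byte terminator such as '\r\n'; for a file whose lines end in a bare '\n', the offset per line would be line_length + 1.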
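
The offset arithmetic can be sketched as follows, assuming fixed-length lines and a file opened in binary mode so that offsets are plain byte counts. The function name read_fixed_width_line and its newline_length parameter are hypothetical, and io.BytesIO stands in for a real file.

```python
import io


def read_fixed_width_line(fh, lineno, line_length, newline_length=1):
    """Return line `lineno` (0-based) from a binary file whose lines all have the same length."""
    # Each line occupies line_length bytes plus the bytes of its terminator.
    fh.seek(lineno * (line_length + newline_length))
    return fh.readline()


# Lines ending in '\n': one terminator byte per line.
unix_file = io.BytesIO(b"aaaa\nbbbb\ncccc\n")
third = read_fixed_width_line(unix_file, 2, 4)

# Lines ending in '\r\n': two terminator bytes per line.
windows_file = io.BytesIO(b"aaaa\r\nbbbb\r\n")
second = read_fixed_width_line(windows_file, 1, 4, newline_length=2)

print(third, second)
```

If the lines vary in length, this arithmetic no longer holds and iterating over the file is the safer choice.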

2020-05-02 12:46:16
