How can I read large text files line by line, without loading them into memory?

I want to read a large file (>5GB) line by line without loading its entire contents into memory. I can't use readlines(), because it builds a very large list in memory.

Current answer

If your file has no newlines, you can read it in fixed-size chunks instead:

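with open('large_text.txt') as f:
    while True:
        c = f.read(1024)  # read at most 1024 characters
        if not c:         # an empty string means end of file
            break
        print(c, end='')  # don't let print() add a newline between chunks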
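The same fixed-size loop can also be written with the two-argument form of the built-in iter(), which keeps calling a function until it returns a sentinel value. This is a minimal sketch that assumes a text-mode file object. The helper name read_blocks and the 1024-character default are illustrative choices, not part of the original answer:

```python
from functools import partial
from typing import Iterator, TextIO


def read_blocks(f: TextIO, block_size: int = 1024) -> Iterator[str]:
    """Yield successive blocks of at most block_size characters (hypothetical helper)."""
    # iter(callable, sentinel) calls f.read(block_size) repeatedly and
    # stops as soon as it returns '' (end of file).
    return iter(partial(f.read, block_size), '')
```

For example, on a 2,500-character stream with no newlines, it yields blocks of 1024, 1024 and 452 characters.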

2018-05-06 15:20:56

Other answers

The old-school approach:

fh = open(file_name, 'rt')
line = fh.readline()  # readline() returns '' only at end of file
while line:
    # do stuff with line
    line = fh.readline()
fh.close()
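On Python 3.8 or later, the duplicated readline() call can be removed with an assignment expression, and a with block can replace the manual close(). This is a sketch only; the function name count_lines and its path argument are hypothetical:

```python
def count_lines(path: str) -> int:
    """Count lines using an explicit readline() loop (hypothetical helper)."""
    count = 0
    with open(path, 'rt') as fh:
        # readline() returns '' only at end of file; a blank line is '\n',
        # which is truthy, so empty lines do not end the loop early.
        while (line := fh.readline()):
            count += 1  # do stuff with line here
    return count
```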

2011-06-25 02:31:27

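Use a for loop over the file object to read it line by line. Open the file with with open(…) so the context manager ensures it is closed once you are done reading: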

with open("log.txt") as infile:
    for line in infile:      # the file object yields one line at a time
        print(line, end='')  # each line already ends with '\n'
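Because iterating a file object is lazy, the same pattern can feed a generator that processes a huge file while holding only one line at a time. This is a minimal sketch; the helper name grep_lines and its needle parameter are hypothetical:

```python
from typing import Iterator, Tuple


def grep_lines(path: str, needle: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line containing needle (hypothetical helper)."""
    with open(path) as infile:
        # enumerate() numbers lines as they are read; nothing is buffered in a list.
        for number, line in enumerate(infile, start=1):
            line = line.rstrip('\n')  # drop the trailing newline each line keeps
            if needle in line:
                yield number, line
```

Since grep_lines is a generator, the file stays open until the loop over it finishes (or the generator is closed).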

2011-06-25 02:26:20

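This can be useful when you want to work in parallel and read only chunks of data, but keep the data clean by making every chunk end on a newline.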

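def readInChunks(fileObj, chunkSize=1024):
    """Yield chunks of about chunkSize characters, each ending on a newline."""
    while True:
        data = fileObj.read(chunkSize)
        if not data:
            break
        # Extend the chunk to the next newline so no line is split.
        while data[-1:] != '\n':
            ch = fileObj.read(1)
            if not ch:  # EOF: the file does not end with a newline
                break
            data += ch
        yield data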
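Instead of reading one character at a time to reach the line boundary, you can call readline() once: it returns the rest of the current line, or '' at end of file, so it cannot loop forever. This is a sketch that assumes a text-mode file object; the name read_line_aligned_chunks is hypothetical. For the parallel part, the yielded chunks could then be passed to a worker pool such as concurrent.futures.ProcessPoolExecutor.map (not shown here).

```python
from typing import Iterator, TextIO


def read_line_aligned_chunks(f: TextIO, chunk_size: int = 1024) -> Iterator[str]:
    """Yield chunks of about chunk_size characters, each ending at a line boundary."""
    while True:
        data = f.read(chunk_size)
        if not data:
            break  # end of file
        if not data.endswith('\n'):
            # Finish the partial last line; readline() returns '' at EOF.
            data += f.readline()
        yield data
```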

2019-05-10 12:00:04

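How about this? Split the file into chunks and then read them line by line. When you read a file, the operating system caches the data that comes next, and if you read the file one line at a time you don't make efficient use of that cached data.

Instead, split the file into chunks, load each whole chunk into memory, and then process it.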

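import os

def chunks(fh, size=1024):
    """Yield (start, length) pairs for blocks of about `size` bytes that end on a newline."""
    while True:
        startat = fh.tell()   # file object's current position from the start
        fh.seek(size, 1)      # move `size` bytes forward from the current position (whence=1)
        data = fh.readline()  # continue to the end of the current line
        yield startat, fh.tell() - startat  # only offsets are yielded, not the whole list
        if not data:
            break

if os.path.isfile(fname):
    try:
        with open(fname, 'rb') as fh:
            for startat, length in chunks(fh):
                fh.seek(startat)        # jump back to the start of the chunk
                data = fh.read(length)  # load the whole chunk into memory
                print(data.decode('utf-8'), end='')  # assumes UTF-8 text
    except IOError as e:  # file --> permission denied
        print("I/O error({0}): {1}".format(e.errno, e.strerror))
    except Exception as e1:  # handle other exceptions such as attribute errors
        print("Unexpected error: {0}".format(e1))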
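The block-then-process idea can also be written without seeking back and forth: read a large block, split it into lines yourself, and carry the incomplete last line over to the next block. This is a minimal sketch assuming a binary file and b'\n' line endings (a '\r' before the newline is kept); the name lines_from_blocks and the 1 MiB default block size are hypothetical. Note that Python's file objects already read from the OS in buffered blocks (io.DEFAULT_BUFFER_SIZE is the default buffer size), so whether this beats a plain for line in f loop is worth measuring rather than assuming.

```python
from typing import BinaryIO, Iterator


def lines_from_blocks(f: BinaryIO, block_size: int = 1 << 20) -> Iterator[bytes]:
    """Read block_size bytes at a time and yield complete lines without the trailing newline."""
    leftover = b''
    while True:
        block = f.read(block_size)
        if not block:
            break
        # Prepend the partial line carried over from the previous block.
        parts = (leftover + block).split(b'\n')
        leftover = parts.pop()  # the last piece may be an incomplete line
        yield from parts
    if leftover:
        yield leftover  # final line when the file has no trailing newline
```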

2017-10-25 00:30:20
