I'm using a for loop to read a file, but I only want to read specific lines, say line 26 and line 30. Is there any built-in feature to achieve this?


Current answer

I prefer this approach because it's more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:

def read_specific_lines(file, lines_to_read):
    """file is any iterable; lines_to_read is an iterable of 1-based line numbers (ints)"""
    lines = set(lines_to_read)
    last = max(lines)
    for n, line in enumerate(file, start=1):
        if n in lines:
            yield line
        if n >= last:
            return  # stop reading once the last requested line has been seen

>>> with open(r'c:\temp\words.txt') as f:
...     [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
...
['A\n', 'a\n', 'aa\n', 'accordant\n']
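Because the generator only iterates over its first argument, it accepts an in-memory io.StringIO or the list returned by readlines() just as well as an open file. A minimal sketch (the function is repeated with 1-based numbering so the block runs on its own; the sample text is made up for illustration):

```python
import io


def read_specific_lines(file, lines_to_read):
    """Yield the lines whose 1-based numbers are in lines_to_read."""
    lines = set(lines_to_read)
    last = max(lines)
    for n, line in enumerate(file, start=1):
        if n in lines:
            yield line
        if n >= last:
            return  # nothing requested beyond this point


text = "first\nsecond\nthird\nfourth\n"

# Works on a StringIO object...
from_stringio = list(read_specific_lines(io.StringIO(text), [2, 4]))

# ...and on the list produced by readlines()
from_list = list(read_specific_lines(io.StringIO(text).readlines(), [2, 4]))

print(from_stringio)  # ['second\n', 'fourth\n']
print(from_list)      # ['second\n', 'fourth\n']
```

Since the generator returns right after the last requested line, the underlying iterator is left positioned just past that line.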

Other answers

Don't use readlines!

My solution is:


with open(filename) as f:
    specify = [26, 30]  # 1-based line numbers
    results = list(
        map(lambda line: line[1],
            filter(lambda line: line[0] in specify,
                   enumerate(f, start=1))
            )
    )
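The same filter can also be written as a list comprehension, which some readers find easier to follow. A sketch that assumes 1-based line numbers (hence start=1), with io.StringIO standing in for the opened file:

```python
import io

specify = {26, 30}  # 1-based line numbers to keep

# Stand-in for open(filename): lines "line 1" .. "line 40"
f = io.StringIO("".join(f"line {i}\n" for i in range(1, 41)))

results = [line for n, line in enumerate(f, start=1) if n in specify]
print(results)  # ['line 26\n', 'line 30\n']
```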

Tested on a 6.5 GB file as follows:

import time

filename = 'a.txt'
start = time.time()
with open(filename, 'w') as f:
    for i in range(10_000_000):
        f.write(f'{str(i)*100}\n')
end1 = time.time()

with open(filename) as f:
    specify = [26, 30]
    results = list(
        map(lambda line: line[1],
            filter(lambda line: line[0] in specify,
                   enumerate(f, start=1))
            )
    )
end2 = time.time()
print(f'write time: {end1-start}')
print(f'read time: {end2-end1}')
# write time: 14.38945460319519
# read time: 8.380386352539062
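The map/filter version still walks through all 10,000,000 lines, because filter only drops items and never stops the iteration. When the wanted lines sit near the top of a large file, stopping after the last requested line should avoid most of the reading. A sketch of that variant (read_lines_early_exit is a hypothetical name; no timings are claimed for it, and io.StringIO stands in for the file):

```python
import io
from itertools import islice


def read_lines_early_exit(f, specify):
    """Return the requested 1-based lines, reading no further than max(specify)."""
    wanted = set(specify)
    last = max(wanted)
    # islice ends the enumeration right after line `last`
    return [line for n, line in islice(enumerate(f, start=1), last) if n in wanted]


f = io.StringIO("".join(f"{i}\n" for i in range(1, 101)))
result = read_lines_early_exit(f, [26, 30])
print(result)  # ['26\n', '30\n']
```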

For completeness, here is one more option.

Let's start with the definition from the Python docs:

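An object usually containing a portion of a sequence. A slice is created using the subscript notation, [] with colons between numbers when several are given, such as in variable_name[1:3:5]. The bracket (subscript) notation uses slice objects internally (or in older versions, __getslice__() and __setslice__()).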
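To make the definition concrete: the bracket notation builds a slice object, which can also be created explicitly with slice(). File objects, however, are iterators rather than sequences and do not support subscripting at all, which is why a replacement is needed. A small sketch, with io.StringIO standing in for a file:

```python
import io

numbers = list(range(10))

# numbers[1:7:2] is shorthand for indexing with an explicit slice object
s = slice(1, 7, 2)
print(numbers[s])      # [1, 3, 5]
print(numbers[1:7:2])  # [1, 3, 5]

# A file-like object is not subscriptable, so slicing it raises TypeError
f = io.StringIO("a\nb\nc\n")
try:
    f[0:2]
except TypeError as exc:
    print("TypeError:", exc)
```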

Although slice notation is not directly applicable to iterators in general, the itertools module contains a replacement function:

from itertools import islice

# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print(line, end='')

# print every third line among the first 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print(line, end='')

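Another advantage of this function is that it does not consume the iterator to the end; it stops once the requested slice is exhausted. So you can do more complex things: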

with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print(line, end='')

    # then skip the next 5
    for line in islice(lines, 5):
        pass

    # print the rest
    for line in lines:
        print(line, end='')
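Because all three loops share one file iterator, each loop picks up where the previous one stopped. The sketch below reproduces the pattern on a 10-line io.StringIO standing in for 'the_file', collecting lines instead of printing them; the counts 3 and 2 are chosen only for illustration:

```python
import io
from itertools import islice

lines = io.StringIO("".join(f"{i}\n" for i in range(1, 11)))  # lines "1" .. "10"

head = list(islice(lines, 3))     # first 3 lines
skipped = list(islice(lines, 2))  # next 2 lines are consumed and discarded
rest = list(lines)                # everything that is left

print(head)     # ['1\n', '2\n', '3\n']
print(skipped)  # ['4\n', '5\n']
print(rest)     # ['6\n', '7\n', '8\n', '9\n', '10\n']
```

The "consume" recipe in the itertools documentation performs the skip step without a loop, as next(islice(iterator, n, n), None).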

To answer the original question:

# how to read lines #26 and #30
In [365]: list(islice(range(1, 100), 25, 30, 4))
Out[365]: [26, 30]
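Applied to the file object instead of a range, the same call returns the text of lines 26 and 30. A sketch with io.StringIO in place of an opened file (the content is made up for illustration):

```python
import io
from itertools import islice

# Stand-in for open("the_file"): lines "line 1" .. "line 99"
f = io.StringIO("".join(f"line {i}\n" for i in range(1, 100)))

# 0-based start 25 is line 26; step 4 lands on line 30; stop 30 ends right after it
wanted = list(islice(f, 25, 30, 4))
print(wanted)  # ['line 26\n', 'line 30\n']
```

This works only because the two wanted lines are exactly 4 apart; for arbitrary line numbers, a set-based filter fits better.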
with open("test.txt", "r") as fp:
    lines = fp.readlines()
print(lines[3])

test.txt is the file name; this prints the 4th line of test.txt (list indices start at 0, so lines[3] is line 4).
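Since list indices start at 0, line number n of the file is lines[n - 1]. A small sketch wrapping that conversion (get_line is a hypothetical helper name, and a temporary file stands in for test.txt):

```python
import os
import tempfile


def get_line(path, lineno):
    """Return the 1-based line `lineno` of the file at `path`."""
    with open(path, "r") as fp:
        lines = fp.readlines()
    return lines[lineno - 1]


# Build a throwaway file with lines "row 1" .. "row 5"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"row {i}\n" for i in range(1, 6)))
    path = tmp.name

fourth = get_line(path, 4)  # same element as lines[3] above
os.remove(path)
print(fourth, end="")  # row 4
```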

Here are my 2 cents, for what it's worth ;)

def indexLines(filename, lines=(2, 4, 6, 8, 10, 12, 3, 5, 7, 1)):
    wanted = set(lines)  # 0-based line indices
    with open(filename, "r") as fp:  # closes the file even on errors
        src = fp.readlines()
    return [(index, line) for index, line in enumerate(src) if index in wanted]


# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename):  # uses the default indices; pass your own list otherwise
    print("Line: %s\nData: %s\n" % (line[0], line[1]))
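To see what indexLines returns, here is a sketch on a throwaway file. Note that the numbers passed in are 0-based indices, so 1 means the file's second line, and the results come back in file order regardless of the order of the requested indices:

```python
import os
import tempfile


def indexLines(filename, lines=(2, 4, 6, 8, 10, 12, 3, 5, 7, 1)):
    wanted = set(lines)  # 0-based line indices
    with open(filename, "r") as fp:
        src = fp.readlines()
    return [(index, line) for index, line in enumerate(src) if index in wanted]


with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("zero\none\ntwo\nthree\n")
    path = tmp.name

data = indexLines(path, [3, 1])
os.remove(path)
print(data)  # [(1, 'one\n'), (3, 'three\n')]
```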

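Reading a file is very fast: reading a 100 MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the individual lines.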

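Most answers here are not wrong, but they are bad style. Files should always be opened with a with statement, because it makes sure the file is closed again.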

So you should do it like this:

with open("path/to/file.txt") as f:
    lines = f.readlines()
print(lines[25])  # line 26 (indices start at 0); or whatever you want to do with this line
print(lines[29])  # line 30; or whatever you want to do with this line

Huge files

If you have a huge file and memory consumption is an issue, you can process it line by line:

with open("path/to/file.txt") as f:
    for i, line in enumerate(f):
        pass  # process line i
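Filled in for the original question, the loop can keep lines 26 and 30 and break once the last one has been read, so the rest of the file is never touched. A sketch assuming 1-based numbering (collect_lines is a hypothetical helper name; io.StringIO stands in for the file):

```python
import io


def collect_lines(f, wanted):
    """Map each requested 1-based line number to its text, stopping after the last one."""
    wanted = set(wanted)
    last = max(wanted)
    found = {}
    for i, line in enumerate(f, start=1):
        if i in wanted:
            found[i] = line
        if i == last:
            break  # no requested line remains
    return found


f = io.StringIO("".join(f"line {i}\n" for i in range(1, 1001)))
found = collect_lines(f, [26, 30])
print(found)  # {26: 'line 26\n', 30: 'line 30\n'}
```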