我使用for循环来读取文件,但我只想读取特定的行,比如第26行和第30行。是否有任何内置功能来实现这一点?
当前回答
我更喜欢这种方法,因为它更通用,即你可以在文件上使用它,在f.r edlines()的结果上,在StringIO对象上,无论什么:
def read_specific_lines(file, lines_to_read):
"""file is any iterable; lines_to_read is an iterable containing int values"""
lines = set(lines_to_read)
last = max(lines)
for n, line in enumerate(file):
if n + 1 in lines:
yield line
if n + 1 > last:
return
>>> with open(r'c:\temp\words.txt') as f:
[s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']
其他回答
如果要读取的文件很大,并且你不想一次在内存中读取整个文件:
fp = open("file")
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break
fp.close()
注意第n行i == n-1。
在Python 2.6或更高版本中:
with open("file") as fp:
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break
为了完整起见,这里还有一个选项。
让我们从python文档中的定义开始:
通常包含序列的一部分的对象。slice使用下标符号[]创建,当给出几个数字时,数字之间使用冒号,例如variable_name[1:3:5]。括号(下标)表示法在内部使用切片对象(或在旧版本中使用__getslice__()和__setslice__())。
虽然slice表示法一般不直接适用于迭代器,但itertools包包含一个替换函数:
from itertools import islice
# print the 100th line
with open('the_file') as lines:
for line in islice(lines, 99, 100):
print line
# print each third line until 100
with open('the_file') as lines:
for line in islice(lines, 0, 100, 3):
print line
该函数的另一个优点是,它直到结束才读取迭代器。所以你可以做更复杂的事情:
with open('the_file') as lines:
# print the first 100 lines
for line in islice(lines, 100):
print line
# then skip the next 5
for line in islice(lines, 5):
pass
# print the rest
for line in lines:
print line
为了回答最初的问题:
# how to read lines #26 and #30
In [365]: list(islice(xrange(1,100), 25, 30, 4))
Out[365]: [26, 30]
不要使用阅读线!
我的解决方案是:
with open(filename) as f:
specify = [26, 30]
results = list(
map(lambda line: line[1],
filter(lambda line: line[0] in specify,
enumerate(f))
)
)
对6.5G文件进行如下测试:
import time
filename = 'a.txt'
start = time.time()
with open(filename, 'w') as f:
for i in range(10_000_000):
f.write(f'{str(i)*100}\n')
end1 = time.time()
with open(filename) as f:
specify = [26, 30]
results = list(
map(lambda line: line[1],
filter(lambda line: line[0] in specify,
enumerate(f))
)
)
end2 = time.time()
print(f'write time: {end1-start}')
print(f'read time: {end2-end1}')
# write time: 14.38945460319519
# read time: 8.380386352539062
这是我的2美分,不管它是否值得;)
def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
fp = open(filename, "r")
src = fp.readlines()
data = [(index, line) for index, line in enumerate(src) if index in lines]
fp.close()
return data
# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
print "Line: %s\nData: %s\n" % (line[0], line[1])
def getitems(iterable, items):
items = list(items) # get a list from any iterable and make our own copy
# since we modify it
if items:
items.sort()
for n, v in enumerate(iterable):
if n == items[0]:
yield v
items.pop(0)
if not items:
break
print list(getitems(open("/usr/share/dict/words"), [25, 29]))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item
推荐文章
- 证书验证失败:无法获得本地颁发者证书
- 当使用pip3安装包时,“Python中的ssl模块不可用”
- 无法切换Python与pyenv
- Python if not == vs if !=
- 如何从scikit-learn决策树中提取决策规则?
- 为什么在Mac OS X v10.9 (Mavericks)的终端中apt-get功能不起作用?
- 将旋转的xtick标签与各自的xtick对齐
- 为什么元组可以包含可变项?
- 如何合并字典的字典?
- 如何创建类属性?
- 不区分大小写的“in”
- 在Python中获取迭代器中的元素个数
- 解析日期字符串并更改格式
- 使用try和。Python中的if
- 如何在Python中获得所有直接子目录