我们有一个很大的原始数据文件,我们想把它修剪成指定的大小。

如何在python中获取文本文件的前N行?所使用的操作系统对实现有任何影响吗?


当前回答

这适用于Python 2和3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)

其他回答

基于gnibbler的投票结果(2009年11月20日0:27):这个类将head()和tail()方法添加到文件对象。

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

用法:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)

Python 3:

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Python 2:

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

下面是另一种方法(Python 2和3都是):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
#!/usr/bin/python

import subprocess

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print  output

这个方法对我很有效

有一个简单的方法来获取前10行:

with open('fileName.txt', mode = 'r') as file:
    list = [line.rstrip('\n') for line in file][:10]
    print(list)

如果您有一个非常大的文件,并假设您希望输出为numpy数组,则使用np。Genfromtxt将冻结您的计算机。以我的经验来看,这样好多了:

def load_big_file(fname,maxrows):
'''only works for well-formed text file of space-separated doubles'''

rows = []  # unknown number of lines, so use list

with open(fname) as f:
    j=0        
    for line in f:
        if j==maxrows:
            break
        else:
            line = [float(s) for s in line.split()]
            rows.append(np.array(line, dtype = np.double))
            j+=1
return np.vstack(rows)  # convert list of vectors to array