如何以最有效的内存和时间方式获取大文件的行数?
def file_len(filename):
with open(filename) as f:
for i, _ in enumerate(f):
pass
return i + 1
如何以最有效的内存和时间方式获取大文件的行数?
def file_len(filename):
with open(filename) as f:
for i, _ in enumerate(f):
pass
return i + 1
当前回答
大文件的另一种选择是使用xreadlines():
count = 0
for line in open(thefilepath).xreadlines( ): count += 1
对于Python 3,请参阅:在Python 3中什么替代xreadlines() ?
其他回答
计数= max(开放(文件))[0]
与此答案类似的一行bash解决方案,使用了现代子进程。check_output功能:
def line_count(filename):
return int(subprocess.check_output(['wc', '-l', filename]).split()[0])
我会使用Python的文件对象方法readlines,如下所示:
with open(input_file) as foo:
lines = len(foo.readlines())
这将打开文件,在文件中创建一个行列表,计算列表的长度,将其保存到一个变量中,然后再次关闭文件。
这是对其他一些答案的元评论。
The line-reading and buffered \n-counting techniques won't return the same answer for every file, because some text files have no newline at the end of the last line. You can work around this by checking the last byte of the last nonempty buffer and adding 1 if it's not b'\n'. In Python 3, opening the file in text mode and in binary mode can yield different results, because text mode by default recognizes CR, LF, and CRLF as line endings (converting them all to '\n'), while in binary mode only LF and CRLF will be counted if you count b'\n'. This applies whether you read by lines or into a fixed-size buffer. The classic Mac OS used CR as a line ending; I don't know how common those files are these days. The buffer-reading approach uses a bounded amount of RAM independent of file size, while the line-reading approach could read the entire file into RAM at once in the worst case (especially if the file uses CR line endings). In the worst case it may use substantially more RAM than the file size, because of overhead from dynamic resizing of the line buffer and (if you opened in text mode) Unicode decoding and storage. You can improve the memory usage, and probably the speed, of the buffered approach by pre-allocating a bytearray and using readinto instead of read. One of the existing answers (with few votes) does this, but it's buggy (it double-counts some bytes). The top buffer-reading answer uses a large buffer (1 MiB). Using a smaller buffer can actually be faster because of OS readahead. If you read 32K or 64K at a time, the OS will probably start reading the next 32K/64K into the cache before you ask for it, and each trip to the kernel will return almost immediately. If you read 1 MiB at a time, the OS is unlikely to speculatively read a whole megabyte. It may preread a smaller amount but you will still spend a significant amount of time sitting in the kernel waiting for the disk to return the rest of the data.
我发现你可以。
f = open("data.txt")
linecout = len(f.readlines())
会给你答案吗