Number of lines in a file in Java

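I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open the file and read it line by line until I reach the end.

I was wondering whether there is a smarter way to do this.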

Current answer

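The accepted answer has an off-by-one error for multi-line files that do not end in a newline. A one-line file without a trailing newline returns 1, but a two-line file without a trailing newline also returns 1. Below is an implementation of the accepted solution that fixes this. The endsWithoutNewLine check is wasted work on every read except the last one, but its cost should be trivial compared with the function as a whole.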

public int count(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean endsWithoutNewLine = false;
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n')
                    ++count;
            }
            endsWithoutNewLine = (c[readChars - 1] != '\n');
        }
        if(endsWithoutNewLine) {
            ++count;
        } 
        return count;
    } finally {
        is.close();
    }
}
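
To try the edge cases described above without creating files, the same counting logic can be run against in-memory data. This is only a sketch, not part of the original answer: the class name LineCountCheck and both countLines helpers are hypothetical names chosen for illustration, and the String overload assumes UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LineCountCheck {

    // Same logic as count(String) above, but reading from any InputStream.
    static int countLines(InputStream is) throws IOException {
        byte[] buffer = new byte[1024];
        int count = 0;
        int read;
        boolean endsWithoutNewLine = false;
        while ((read = is.read(buffer)) != -1) {
            for (int i = 0; i < read; ++i) {
                if (buffer[i] == '\n') {
                    ++count;
                }
            }
            // Remember whether the most recent chunk ended in the middle of a line.
            endsWithoutNewLine = (buffer[read - 1] != '\n');
        }
        // A final line without a trailing newline still counts as a line.
        return endsWithoutNewLine ? count + 1 : count;
    }

    // Hypothetical convenience wrapper for experimenting with in-memory text.
    static int countLines(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream is = new ByteArrayInputStream(bytes)) {
            return countLines(is);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this helper, countLines("a") and countLines("a\n") both give 1, countLines("a\nb") gives 2, and empty input gives 0.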

2013-01-19 06:11:26

Other answers

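Without any index structure, you cannot avoid reading the complete file. But you can optimize it by not reading line by line and instead using a regex to match all line terminators.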
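
One possible reading of the regex idea, as a rough sketch only: the class RegexLineCounter and its method names are hypothetical, and the file variant assumes the file is UTF-8 and small enough to load into memory in one piece. For truly huge files, a chunked byte scan like the ones in the other answers is likely the better fit. Note that it counts line terminators, so a final line without a terminator is not included.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLineCounter {

    // Matches Windows (\r\n), classic Mac (\r) and Unix (\n) line terminators.
    // \r\n is listed first so that it is counted as a single terminator.
    private static final Pattern LINE_END = Pattern.compile("\r\n|\r|\n");

    // Counts line terminators in text that is already in memory.
    static int countTerminators(CharSequence text) {
        Matcher matcher = LINE_END.matcher(text);
        int count = 0;
        while (matcher.find()) {
            ++count;
        }
        return count;
    }

    // Assumption: the whole file fits in memory and is encoded as UTF-8.
    static int countTerminatorsInFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return countTerminators(text);
    }
}
```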

2009-01-17 09:36:41

I have implemented another solution to this problem, and I found it more efficient at counting lines:

try
(
   FileReader       input = new FileReader("input.txt");
   LineNumberReader count = new LineNumberReader(input);
)
{
   while (count.skip(Long.MAX_VALUE) > 0)
   {
      // Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
   }

   result = count.getLineNumber() + 1;                                    // +1 because line index starts at 0
}

2011-03-17 16:28:17

This funny solution actually works really well!

public static int countLines(File input) throws IOException {
    try (InputStream is = new FileInputStream(input)) {
        int count = 1;
        for (int aChar = 0; aChar != -1;aChar = is.read())
            count += aChar == '\n' ? 1 : 0;
        return count;
    }
}

2016-08-30 16:13:49

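This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file it takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.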

public static int countLinesOld(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

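Edit, 9 1/2 years later: I have practically no Java experience, but I tried to benchmark this code against the LineNumberReader solution anyway, since it bothered me that nobody had done it. It seems that, especially for large files, my solution is faster, although it takes a few runs until the optimizer does a decent job. I have played with the code a bit and produced a new version that is consistently the fastest: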

public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        
        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }
        
        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i=0; i<1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        
        // count remaining characters
        while (readChars != -1) {
            for (int i=0; i<readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        
        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}

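Benchmark results for a 1.3GB text file, y axis in seconds. I performed 100 runs on the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none, and while it is only slightly faster, the difference is statistically significant. LineNumberReader is clearly slower.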
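
The measurement described above could be set up roughly as follows. This is a minimal sketch and not the author's actual harness: the class RunTimer and the methods timeRuns and median are hypothetical names, and a serious comparison would more likely use a dedicated benchmarking tool that accounts for JIT warm-up.

```java
import java.util.Arrays;

final class RunTimer {

    // Runs the task the given number of times and returns each run's
    // duration in nanoseconds, measured with System.nanoTime().
    static long[] timeRuns(int runs, Runnable task) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // The median is less sensitive to outlier runs than the mean.
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }
}
```

Since countLinesOld and countLinesNew throw IOException, the Runnable passed to timeRuns would need to catch it, for example by rethrowing it as an UncheckedIOException. Because the optimizer needs a few runs to settle, it may also make sense to discard the first few samples before comparing medians.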

2009-01-17 09:35:17

Scanner with regex:

public int getLineCount() {
    int lineCount = 0;
    // In multiline mode, $ matches at the end of every line.
    Pattern lineEndPattern = Pattern.compile("(?m)$");
    // try-with-resources closes the Scanner even if scanning fails.
    try (Scanner fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern)) {
        while (fileScanner.hasNext()) {
            fileScanner.next();
            ++lineCount;
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return lineCount;
}

I haven't timed it.

2019-02-26 17:55:47
