我使用巨大的数据文件,有时我只需要知道这些文件中的行数,通常我打开它们,一行一行地读取它们,直到我到达文件的末尾

我在想有没有更聪明的办法


当前回答

/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {

    try (Stream<String> lines = Files.lines(file.toPath())) {
        return lines.count();
    }
}

在JDK8_u31上测试。但与此方法相比,性能确实较慢:

/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {

    try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {

        byte[] c = new byte[1024];
        boolean empty = true,
                lastEmpty = false;
        long count = 0;
        int read;
        while ((read = is.read(c)) != -1) {
            for (int i = 0; i < read; i++) {
                if (c[i] == '\n') {
                    count++;
                    lastEmpty = true;
                } else if (lastEmpty) {
                    lastEmpty = false;
                }
            }
            empty = false;
        }

        if (!empty) {
            if (count == 0) {
                count = 1;
            } else if (!lastEmpty) {
                count++;
            }
        }

        return count;
    }
}

经过测试,非常快。

其他回答

我的结论是wc -l:s计算换行的方法是好的,但是在最后一行不以换行符结束的文件上返回非直观的结果。

和@。基于LineNumberReader的vikas解决方案,但在行数中添加一个,在最后一行以换行符结束的文件上返回非直观的结果。

因此我做了一个算法,处理如下:

@Test
public void empty() throws IOException {
    assertEquals(0, count(""));
}

@Test
public void singleNewline() throws IOException {
    assertEquals(1, count("\n"));
}

@Test
public void dataWithoutNewline() throws IOException {
    assertEquals(1, count("one"));
}

@Test
public void oneCompleteLine() throws IOException {
    assertEquals(1, count("one\n"));
}

@Test
public void twoCompleteLines() throws IOException {
    assertEquals(2, count("one\ntwo\n"));
}

@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
    assertEquals(2, count("one\ntwo"));
}

@Test
public void aFewLines() throws IOException {
    assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}

它是这样的:

static long countLines(InputStream is) throws IOException {
    try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
        char[] buf = new char[8192];
        int n, previousN = -1;
        //Read will return at least one byte, no need to buffer more
        while((n = lnr.read(buf)) != -1) {
            previousN = n;
        }
        int ln = lnr.getLineNumber();
        if (previousN == -1) {
            //No data read at all, i.e file was empty
            return 0;
        } else {
            char lastChar = buf[previousN - 1];
            if (lastChar == '\n' || lastChar == '\r') {
                //Ending with newline, deduct one
                return ln;
            }
        }
        //normal case, return line number + 1
        return ln + 1;
    }
}

如果你想要直观的结果,你可以用这个。如果您只想要wc -l兼容性,只需使用@er即可。Vikas解决方案,但不添加一个到结果,并重试跳过:

try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
    while(lnr.skip(Long.MAX_VALUE) > 0){};
    return lnr.getLineNumber();
}

一个直接的方式使用扫描器

static void lineCounter (String path) throws IOException {

        int lineCount = 0, commentsCount = 0;

        Scanner input = new Scanner(new File(path));
        while (input.hasNextLine()) {
            String data = input.nextLine();

            if (data.startsWith("//")) commentsCount++;

            lineCount++;
        }

        System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
    }

在基于unix的系统上,在命令行上使用wc命令。

在java-8中,你可以使用流:

try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
  long numOfLines = lines.count();
  ...
}

我已经实现了这个问题的另一个解决方案,我发现它在计算行数时更有效:

try
(
   FileReader       input = new FileReader("input.txt");
   LineNumberReader count = new LineNumberReader(input);
)
{
   while (count.skip(Long.MAX_VALUE) > 0)
   {
      // Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
   }

   result = count.getLineNumber() + 1;                                    // +1 because line index starts at 0
}