Number of lines in a file in Java

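I work with huge data files, and sometimes I only need to know how many lines they contain. Usually I open the file and read it line by line until I reach the end.

I was wondering whether there is a smarter way to do this.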

Current answer

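I tested the line-counting approaches from this thread. Here are my observations for each one, measured on my system.

File size: 1.6 GB. Methods:

Using Scanner: approximately 35 s
Using BufferedReader: approximately 5 s
Using Java 8: approximately 5 s
Using LineNumberReader: approximately 5 s

Also, the Java 8 approach seems very convenient: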

Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type : long]
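
The one-liner above leaves the stream open, and Files.lines keeps the underlying file open until the stream is closed. A minimal sketch of the same idea wrapped in try-with-resources follows; the class name StreamLineCount and the method name countLines are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class StreamLineCount {
    // Hypothetical helper: counts lines the way Files.lines splits them.
    static long countLines(Path path, Charset charset) throws IOException {
        // Closing the stream releases the file handle held by Files.lines.
        try (Stream<String> lines = Files.lines(path, charset)) {
            return lines.count();
        }
    }
}
```

It could be called as StreamLineCount.countLines(Paths.get(filePath), Charset.defaultCharset()). A final line without a trailing newline is still counted as a line.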

2018-11-19 10:57:57

Other answers

On Unix-based systems, use the wc command on the command line (wc -l prints the newline count).

2009-01-17 09:03:02

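My conclusion is that the way wc -l counts newlines is fine, but it returns non-intuitive results for files whose last line does not end with a newline.

The LineNumberReader-based solution from @er.vikas, which adds one to the line count, returns non-intuitive results for files whose last line does end with a newline.

So I wrote an algorithm that handles the following cases: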

@Test
public void empty() throws IOException {
    assertEquals(0, count(""));
}

@Test
public void singleNewline() throws IOException {
    assertEquals(1, count("\n"));
}

@Test
public void dataWithoutNewline() throws IOException {
    assertEquals(1, count("one"));
}

@Test
public void oneCompleteLine() throws IOException {
    assertEquals(1, count("one\n"));
}

@Test
public void twoCompleteLines() throws IOException {
    assertEquals(2, count("one\ntwo\n"));
}

@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
    assertEquals(2, count("one\ntwo"));
}

@Test
public void aFewLines() throws IOException {
    assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}

It looks like this:

static long countLines(InputStream is) throws IOException {
    try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
        char[] buf = new char[8192];
        int n, previousN = -1;
        // read() returns at least one char (or -1 at end of stream), so no extra buffering is needed
        while((n = lnr.read(buf)) != -1) {
            previousN = n;
        }
        int ln = lnr.getLineNumber();
        if (previousN == -1) {
            //No data read at all, i.e file was empty
            return 0;
        } else {
            char lastChar = buf[previousN - 1];
            if (lastChar == '\n' || lastChar == '\r') {
                // Ends with a line terminator: the line number already equals the line count
                return ln;
            }
        }
        //normal case, return line number + 1
        return ln + 1;
    }
}
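
The unit tests earlier in this answer call a count(String) helper that is not shown. A minimal sketch of how such a helper might look, assuming it simply wraps the string in an in-memory stream and delegates to countLines: the class name LineCountTestSupport is hypothetical, and the copy of countLines here fixes the charset to UTF-8, which is an assumption rather than part of the original method.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;

class LineCountTestSupport {
    // Hypothetical helper matching the count(String) calls in the tests.
    static long count(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return countLines(new ByteArrayInputStream(bytes));
    }

    // Copy of the method above; the explicit UTF-8 charset is an assumption.
    static long countLines(InputStream is) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n, previousN = -1;
            while ((n = lnr.read(buf)) != -1) {
                previousN = n;
            }
            int ln = lnr.getLineNumber();
            if (previousN == -1) {
                return 0; // nothing was read: empty input
            }
            char lastChar = buf[previousN - 1];
            boolean endsWithTerminator = lastChar == '\n' || lastChar == '\r';
            // A trailing terminator means every line was already counted.
            return endsWithTerminator ? ln : ln + 1;
        }
    }
}
```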

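If you want intuitive results, you can use the countLines method above. If you only want wc -l compatibility, simply use @er.vikas's solution, but do not add one to the result, and retry the skip: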

try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
    while(lnr.skip(Long.MAX_VALUE) > 0){};
    return lnr.getLineNumber();
}
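
The fragment above is not a complete method. As a sketch, it could be wrapped as follows; the class name SkipLineCount and the method name countLineTerminators are hypothetical. One caveat: LineNumberReader treats '\n', '\r', and "\r\n" as line terminators, so the result matches wc -l only for files that use '\n' or "\r\n" line endings.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;

class SkipLineCount {
    // Hypothetical wrapper: returns the number of line terminators in the file.
    static int countLineTerminators(File file) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(new FileReader(file))) {
            // skip() may skip fewer chars than requested, so repeat until
            // it reports that nothing more was skipped.
            while (lnr.skip(Long.MAX_VALUE) > 0) {
                // intentionally empty
            }
            return lnr.getLineNumber();
        }
    }
}
```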

2016-02-16 14:55:26

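I know this is an old question, but the accepted solution did not quite do what I needed. So I refined it to accept various line terminators (not just line feed) and to use a specified character encoding (rather than ISO-8859-n). It is all in one method (refactor as appropriate):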

public static long getLinesCount(String fileName, String encodingName) throws IOException {
    long linesCount = 0;
    File file = new File(fileName);
    FileInputStream fileIn = new FileInputStream(file);
    try {
        Charset encoding = Charset.forName(encodingName);
        Reader fileReader = new InputStreamReader(fileIn, encoding);
        int bufferSize = 4096;
        Reader reader = new BufferedReader(fileReader, bufferSize);
        char[] buffer = new char[bufferSize];
        int prevChar = -1;
        int readCount = reader.read(buffer);
        while (readCount != -1) {
            for (int i = 0; i < readCount; i++) {
                int nextChar = buffer[i];
                switch (nextChar) {
                    case '\r': {
                        // The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
                        linesCount++;
                        break;
                    }
                    case '\n': {
                        if (prevChar == '\r') {
                            // The current line is terminated by a carriage return immediately followed by a line feed.
                            // The line has already been counted.
                        } else {
                            // The current line is terminated by a line feed.
                            linesCount++;
                        }
                        break;
                    }
                }
                prevChar = nextChar;
            }
            readCount = reader.read(buffer);
        }
        if (prevChar != -1) {
            switch (prevChar) {
                case '\r':
                case '\n': {
                    // The last line is terminated by a line terminator.
                    // The last line has already been counted.
                    break;
                }
                default: {
                    // The last line is terminated by end-of-file.
                    linesCount++;
                }
            }
        }
    } finally {
        fileIn.close();
    }
    return linesCount;
}

This solution is comparable in speed to the accepted solution, about 4% slower in my tests (although timing tests in Java are notoriously unreliable).

2012-09-21 20:27:57

A straightforward way using Scanner

static void lineCounter(String path) throws IOException {
    int lineCount = 0, commentsCount = 0;

    // try-with-resources closes the Scanner (and the underlying file) when done
    try (Scanner input = new Scanner(new File(path))) {
        while (input.hasNextLine()) {
            String data = input.nextLine();

            // Lines starting with "//" are also tallied as comments
            if (data.startsWith("//")) commentsCount++;

            lineCount++;
        }
    }

    System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}

2014-09-14 03:49:15
