在Java中,似乎有不同的方式来读取和写入文件数据。
我想从一个文件读取ASCII数据。可能的方式和它们的区别是什么?
在Java中,似乎有不同的方式来读取和写入文件数据。
我想从一个文件读取ASCII数据。可能的方式和它们的区别是什么?
当前回答
我必须以不同的方式进行基准测试。我将对我的发现进行评论,但简而言之,最快的方法是在FileInputStream上使用普通的BufferedInputStream。如果必须读取许多文件,那么三个线程将使总执行时间减少大约一半,但是增加更多的线程将逐渐降低性能,直到使用20个线程完成的时间比仅使用一个线程完成的时间长3倍。
假设您必须读取文件并对其内容执行一些有意义的操作。在示例中,我们从日志中读取行,并计算包含超过一定阈值的行。因此,我假设一行Java 8 Files.lines(Paths.get("/path/to/file.txt"))。Map (line -> line.split(";"))不是一个选项。
我在Java 1.8、Windows 7以及SSD和HDD驱动器上进行了测试。
我写了六个不同的实现:
rawParse:在FileInputStream上使用BufferedInputStream,然后一个字节一个字节地读取切行。这优于任何其他单线程方法,但对于非ascii文件可能非常不方便。
lineReaderParse:在FileReader上使用BufferedReader,逐行读取,通过调用String.split()分割行。这比rawParse慢了大约20%。
linereaderparsepar列:这与lineReaderParse相同,但它使用几个线程。在所有情况下,这是最快的选择。
nioFilesParse:使用java.nio.files.Files.lines()
nioAsyncParse:使用带有完成处理程序和线程池的AsynchronousFileChannel。
nioMemoryMappedParse:使用内存映射文件。这确实是一个糟糕的想法,导致执行时间比任何其他实现都要长至少三倍。
这是在四核i7和SSD驱动器上读取204个4mb文件的平均时间。这些文件是动态生成的,以避免磁盘缓存。
rawParse 11.10 sec
lineReaderParse 13.86 sec
lineReaderParseParallel 6.00 sec
nioFilesParse 13.52 sec
nioAsyncParse 16.06 sec
nioMemoryMappedParse 37.68 sec
我发现在SSD或HDD驱动器上运行的差异比我预期的要小,SSD大约快15%。这可能是因为文件是在一个未分片的硬盘上生成的,并且它们是按顺序读取的,因此旋转驱动器可以像SSD一样执行。
我对nioAsyncParse实现的低性能感到惊讶。要么是我以错误的方式实现了某些东西,要么是使用NIO和完成处理程序的多线程实现与使用java的单线程实现执行相同(甚至更差)。io API。此外,使用CompletionHandler的异步解析代码行要比在旧流上直接实现的代码行长得多,而且要正确实现也比较棘手。
现在,这六个实现后面跟着一个包含它们的类,再加上一个可参数化的main()方法,该方法允许处理文件数量、文件大小和并发度。注意,文件的大小变化为正负20%。这是为了避免由于所有文件大小完全相同而造成的任何影响。
rawParse
public void rawParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
overrunCount = 0;
final int dl = (int) ';';
StringBuffer lineBuffer = new StringBuffer(1024);
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileInputStream fin = new FileInputStream(fl);
BufferedInputStream bin = new BufferedInputStream(fin);
int character;
while((character=bin.read())!=-1) {
if (character==dl) {
// Here is where something is done with each line
doSomethingWithRawLine(lineBuffer.toString());
lineBuffer.setLength(0);
}
else {
lineBuffer.append((char) character);
}
}
bin.close();
fin.close();
}
}
public final void doSomethingWithRawLine(String line) throws ParseException {
// What to do for each line
int fieldNumber = 0;
final int len = line.length();
StringBuffer fieldBuffer = new StringBuffer(256);
for (int charPos=0; charPos<len; charPos++) {
char c = line.charAt(charPos);
if (c==DL0) {
String fieldValue = fieldBuffer.toString();
if (fieldValue.length()>0) {
switch (fieldNumber) {
case 0:
Date dt = fmt.parse(fieldValue);
fieldNumber++;
break;
case 1:
double d = Double.parseDouble(fieldValue);
fieldNumber++;
break;
case 2:
int t = Integer.parseInt(fieldValue);
fieldNumber++;
break;
case 3:
if (fieldValue.equals("overrun"))
overrunCount++;
break;
}
}
fieldBuffer.setLength(0);
}
else {
fieldBuffer.append(c);
}
}
}
lineReaderParse
public void lineReaderParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
String line;
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null)
doSomethingWithLine(line);
brd.close();
frd.close();
}
}
public final void doSomethingWithLine(String line) throws ParseException {
// Example of what to do for each line
String[] fields = line.split(";");
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
lineReaderParseParallel
public void lineReaderParseParallel(final String targetDir, final int numberOfFiles, final int degreeOfParalelism) throws IOException, ParseException, InterruptedException {
Thread[] pool = new Thread[degreeOfParalelism];
int batchSize = numberOfFiles / degreeOfParalelism;
for (int b=0; b<degreeOfParalelism; b++) {
pool[b] = new LineReaderParseThread(targetDir, b*batchSize, b*batchSize+b*batchSize);
pool[b].start();
}
for (int b=0; b<degreeOfParalelism; b++)
pool[b].join();
}
class LineReaderParseThread extends Thread {
private String targetDir;
private int fileFrom;
private int fileTo;
private DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private int overrunCounter = 0;
public LineReaderParseThread(String targetDir, int fileFrom, int fileTo) {
this.targetDir = targetDir;
this.fileFrom = fileFrom;
this.fileTo = fileTo;
}
private void doSomethingWithTheLine(String line) throws ParseException {
String[] fields = line.split(DL);
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCounter++;
}
@Override
public void run() {
String line;
for (int f=fileFrom; f<fileTo; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
try {
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null) {
doSomethingWithTheLine(line);
}
brd.close();
frd.close();
} catch (IOException | ParseException ioe) { }
}
}
}
nioFilesParse
public void nioFilesParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
for (int f=0; f<numberOfFiles; f++) {
Path ph = Paths.get(targetDir+filenamePreffix+String.valueOf(f)+".txt");
Consumer<String> action = new LineConsumer();
Stream<String> lines = Files.lines(ph);
lines.forEach(action);
lines.close();
}
}
class LineConsumer implements Consumer<String> {
@Override
public void accept(String line) {
// What to do for each line
String[] fields = line.split(DL);
if (fields.length>1) {
try {
Date dt = fmt.parse(fields[0]);
}
catch (ParseException e) {
}
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
}
}
nioAsyncParse
public void nioAsyncParse(final String targetDir, final int numberOfFiles, final int numberOfThreads, final int bufferSize) throws IOException, ParseException, InterruptedException {
ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(numberOfThreads);
ConcurrentLinkedQueue<ByteBuffer> byteBuffers = new ConcurrentLinkedQueue<ByteBuffer>();
for (int b=0; b<numberOfThreads; b++)
byteBuffers.add(ByteBuffer.allocate(bufferSize));
for (int f=0; f<numberOfFiles; f++) {
consumerThreads.acquire();
String fileName = targetDir+filenamePreffix+String.valueOf(f)+".txt";
AsynchronousFileChannel channel = AsynchronousFileChannel.open(Paths.get(fileName), EnumSet.of(StandardOpenOption.READ), pool);
BufferConsumer consumer = new BufferConsumer(byteBuffers, fileName, bufferSize);
channel.read(consumer.buffer(), 0l, channel, consumer);
}
consumerThreads.acquire(numberOfThreads);
}
class BufferConsumer implements CompletionHandler<Integer, AsynchronousFileChannel> {
private ConcurrentLinkedQueue<ByteBuffer> buffers;
private ByteBuffer bytes;
private String file;
private StringBuffer chars;
private int limit;
private long position;
private DateFormat frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
public BufferConsumer(ConcurrentLinkedQueue<ByteBuffer> byteBuffers, String fileName, int bufferSize) {
buffers = byteBuffers;
bytes = buffers.poll();
if (bytes==null)
bytes = ByteBuffer.allocate(bufferSize);
file = fileName;
chars = new StringBuffer(bufferSize);
frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
limit = bufferSize;
position = 0l;
}
public ByteBuffer buffer() {
return bytes;
}
@Override
public synchronized void completed(Integer result, AsynchronousFileChannel channel) {
if (result!=-1) {
bytes.flip();
final int len = bytes.limit();
int i = 0;
try {
for (i = 0; i < len; i++) {
byte by = bytes.get();
if (by=='\n') {
// ***
// The code used to process the line goes here
chars.setLength(0);
}
else {
chars.append((char) by);
}
}
}
catch (Exception x) {
System.out.println(
"Caught exception " + x.getClass().getName() + " " + x.getMessage() +
" i=" + String.valueOf(i) + ", limit=" + String.valueOf(len) +
", position="+String.valueOf(position));
}
if (len==limit) {
bytes.clear();
position += len;
channel.read(bytes, position, channel, this);
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
@Override
public void failed(Throwable e, AsynchronousFileChannel channel) {
}
};
所有案例的完全可运行实现
https://github.com/sergiomt/javaiobenchmark/blob/master/FileReadBenchmark.java
其他回答
我必须以不同的方式进行基准测试。我将对我的发现进行评论,但简而言之,最快的方法是在FileInputStream上使用普通的BufferedInputStream。如果必须读取许多文件,那么三个线程将使总执行时间减少大约一半,但是增加更多的线程将逐渐降低性能,直到使用20个线程完成的时间比仅使用一个线程完成的时间长3倍。
假设您必须读取文件并对其内容执行一些有意义的操作。在示例中,我们从日志中读取行,并计算包含超过一定阈值的行。因此,我假设一行Java 8 Files.lines(Paths.get("/path/to/file.txt"))。Map (line -> line.split(";"))不是一个选项。
我在Java 1.8、Windows 7以及SSD和HDD驱动器上进行了测试。
我写了六个不同的实现:
rawParse:在FileInputStream上使用BufferedInputStream,然后一个字节一个字节地读取切行。这优于任何其他单线程方法,但对于非ascii文件可能非常不方便。
lineReaderParse:在FileReader上使用BufferedReader,逐行读取,通过调用String.split()分割行。这比rawParse慢了大约20%。
linereaderparsepar列:这与lineReaderParse相同,但它使用几个线程。在所有情况下,这是最快的选择。
nioFilesParse:使用java.nio.files.Files.lines()
nioAsyncParse:使用带有完成处理程序和线程池的AsynchronousFileChannel。
nioMemoryMappedParse:使用内存映射文件。这确实是一个糟糕的想法,导致执行时间比任何其他实现都要长至少三倍。
这是在四核i7和SSD驱动器上读取204个4mb文件的平均时间。这些文件是动态生成的,以避免磁盘缓存。
rawParse 11.10 sec
lineReaderParse 13.86 sec
lineReaderParseParallel 6.00 sec
nioFilesParse 13.52 sec
nioAsyncParse 16.06 sec
nioMemoryMappedParse 37.68 sec
我发现在SSD或HDD驱动器上运行的差异比我预期的要小,SSD大约快15%。这可能是因为文件是在一个未分片的硬盘上生成的,并且它们是按顺序读取的,因此旋转驱动器可以像SSD一样执行。
我对nioAsyncParse实现的低性能感到惊讶。要么是我以错误的方式实现了某些东西,要么是使用NIO和完成处理程序的多线程实现与使用java的单线程实现执行相同(甚至更差)。io API。此外,使用CompletionHandler的异步解析代码行要比在旧流上直接实现的代码行长得多,而且要正确实现也比较棘手。
现在,这六个实现后面跟着一个包含它们的类,再加上一个可参数化的main()方法,该方法允许处理文件数量、文件大小和并发度。注意,文件的大小变化为正负20%。这是为了避免由于所有文件大小完全相同而造成的任何影响。
rawParse
public void rawParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
overrunCount = 0;
final int dl = (int) ';';
StringBuffer lineBuffer = new StringBuffer(1024);
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileInputStream fin = new FileInputStream(fl);
BufferedInputStream bin = new BufferedInputStream(fin);
int character;
while((character=bin.read())!=-1) {
if (character==dl) {
// Here is where something is done with each line
doSomethingWithRawLine(lineBuffer.toString());
lineBuffer.setLength(0);
}
else {
lineBuffer.append((char) character);
}
}
bin.close();
fin.close();
}
}
public final void doSomethingWithRawLine(String line) throws ParseException {
// What to do for each line
int fieldNumber = 0;
final int len = line.length();
StringBuffer fieldBuffer = new StringBuffer(256);
for (int charPos=0; charPos<len; charPos++) {
char c = line.charAt(charPos);
if (c==DL0) {
String fieldValue = fieldBuffer.toString();
if (fieldValue.length()>0) {
switch (fieldNumber) {
case 0:
Date dt = fmt.parse(fieldValue);
fieldNumber++;
break;
case 1:
double d = Double.parseDouble(fieldValue);
fieldNumber++;
break;
case 2:
int t = Integer.parseInt(fieldValue);
fieldNumber++;
break;
case 3:
if (fieldValue.equals("overrun"))
overrunCount++;
break;
}
}
fieldBuffer.setLength(0);
}
else {
fieldBuffer.append(c);
}
}
}
lineReaderParse
public void lineReaderParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
String line;
for (int f=0; f<numberOfFiles; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null)
doSomethingWithLine(line);
brd.close();
frd.close();
}
}
public final void doSomethingWithLine(String line) throws ParseException {
// Example of what to do for each line
String[] fields = line.split(";");
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
lineReaderParseParallel
public void lineReaderParseParallel(final String targetDir, final int numberOfFiles, final int degreeOfParalelism) throws IOException, ParseException, InterruptedException {
Thread[] pool = new Thread[degreeOfParalelism];
int batchSize = numberOfFiles / degreeOfParalelism;
for (int b=0; b<degreeOfParalelism; b++) {
pool[b] = new LineReaderParseThread(targetDir, b*batchSize, b*batchSize+b*batchSize);
pool[b].start();
}
for (int b=0; b<degreeOfParalelism; b++)
pool[b].join();
}
class LineReaderParseThread extends Thread {
private String targetDir;
private int fileFrom;
private int fileTo;
private DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private int overrunCounter = 0;
public LineReaderParseThread(String targetDir, int fileFrom, int fileTo) {
this.targetDir = targetDir;
this.fileFrom = fileFrom;
this.fileTo = fileTo;
}
private void doSomethingWithTheLine(String line) throws ParseException {
String[] fields = line.split(DL);
Date dt = fmt.parse(fields[0]);
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCounter++;
}
@Override
public void run() {
String line;
for (int f=fileFrom; f<fileTo; f++) {
File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
try {
FileReader frd = new FileReader(fl);
BufferedReader brd = new BufferedReader(frd);
while ((line=brd.readLine())!=null) {
doSomethingWithTheLine(line);
}
brd.close();
frd.close();
} catch (IOException | ParseException ioe) { }
}
}
}
nioFilesParse
public void nioFilesParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
for (int f=0; f<numberOfFiles; f++) {
Path ph = Paths.get(targetDir+filenamePreffix+String.valueOf(f)+".txt");
Consumer<String> action = new LineConsumer();
Stream<String> lines = Files.lines(ph);
lines.forEach(action);
lines.close();
}
}
class LineConsumer implements Consumer<String> {
@Override
public void accept(String line) {
// What to do for each line
String[] fields = line.split(DL);
if (fields.length>1) {
try {
Date dt = fmt.parse(fields[0]);
}
catch (ParseException e) {
}
double d = Double.parseDouble(fields[1]);
int t = Integer.parseInt(fields[2]);
if (fields[3].equals("overrun"))
overrunCount++;
}
}
}
nioAsyncParse
public void nioAsyncParse(final String targetDir, final int numberOfFiles, final int numberOfThreads, final int bufferSize) throws IOException, ParseException, InterruptedException {
ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(numberOfThreads);
ConcurrentLinkedQueue<ByteBuffer> byteBuffers = new ConcurrentLinkedQueue<ByteBuffer>();
for (int b=0; b<numberOfThreads; b++)
byteBuffers.add(ByteBuffer.allocate(bufferSize));
for (int f=0; f<numberOfFiles; f++) {
consumerThreads.acquire();
String fileName = targetDir+filenamePreffix+String.valueOf(f)+".txt";
AsynchronousFileChannel channel = AsynchronousFileChannel.open(Paths.get(fileName), EnumSet.of(StandardOpenOption.READ), pool);
BufferConsumer consumer = new BufferConsumer(byteBuffers, fileName, bufferSize);
channel.read(consumer.buffer(), 0l, channel, consumer);
}
consumerThreads.acquire(numberOfThreads);
}
class BufferConsumer implements CompletionHandler<Integer, AsynchronousFileChannel> {
private ConcurrentLinkedQueue<ByteBuffer> buffers;
private ByteBuffer bytes;
private String file;
private StringBuffer chars;
private int limit;
private long position;
private DateFormat frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
public BufferConsumer(ConcurrentLinkedQueue<ByteBuffer> byteBuffers, String fileName, int bufferSize) {
buffers = byteBuffers;
bytes = buffers.poll();
if (bytes==null)
bytes = ByteBuffer.allocate(bufferSize);
file = fileName;
chars = new StringBuffer(bufferSize);
frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
limit = bufferSize;
position = 0l;
}
public ByteBuffer buffer() {
return bytes;
}
@Override
public synchronized void completed(Integer result, AsynchronousFileChannel channel) {
if (result!=-1) {
bytes.flip();
final int len = bytes.limit();
int i = 0;
try {
for (i = 0; i < len; i++) {
byte by = bytes.get();
if (by=='\n') {
// ***
// The code used to process the line goes here
chars.setLength(0);
}
else {
chars.append((char) by);
}
}
}
catch (Exception x) {
System.out.println(
"Caught exception " + x.getClass().getName() + " " + x.getMessage() +
" i=" + String.valueOf(i) + ", limit=" + String.valueOf(len) +
", position="+String.valueOf(position));
}
if (len==limit) {
bytes.clear();
position += len;
channel.read(bytes, position, channel, this);
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
else {
try {
channel.close();
}
catch (IOException e) {
}
consumerThreads.release();
bytes.clear();
buffers.add(bytes);
}
}
@Override
public void failed(Throwable e, AsynchronousFileChannel channel) {
}
};
所有案例的完全可运行实现
https://github.com/sergiomt/javaiobenchmark/blob/master/FileReadBenchmark.java
以下是三种工作和测试的方法:
使用BufferedReader
package io;
import java.io.*;
public class ReadFromFile2 {
public static void main(String[] args)throws Exception {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while((st=br.readLine()) != null){
System.out.println(st);
}
}
}
使用扫描仪
package io;
import java.io.File;
import java.util.Scanner;
public class ReadFromFileUsingScanner {
public static void main(String[] args) throws Exception {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
Scanner sc = new Scanner(file);
while(sc.hasNextLine()){
System.out.println(sc.nextLine());
}
}
}
使用FileReader
package io;
import java.io.*;
public class ReadingFromFile {
public static void main(String[] args) throws Exception {
FileReader fr = new FileReader("C:\\Users\\pankaj\\Desktop\\test.java");
int i;
while ((i=fr.read()) != -1){
System.out.print((char) i);
}
}
}
使用Scanner类读取整个文件,而不使用循环
package io;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class ReadingEntireFileWithoutLoop {
public static void main(String[] args) throws FileNotFoundException {
File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
Scanner sc = new Scanner(file);
sc.useDelimiter("\\Z");
System.out.println(sc.next());
}
}
我编写的这段代码对于非常大的文件要快得多:
public String readDoc(File f) {
String text = "";
int read, N = 1024 * 1024;
char[] buffer = new char[N];
try {
FileReader fr = new FileReader(f);
BufferedReader br = new BufferedReader(fr);
while(true) {
read = br.read(buffer, 0, N);
text += new String(buffer, 0, read);
if(read < N) {
break;
}
}
} catch(Exception ex) {
ex.printStackTrace();
}
return text;
}
在实践中,缓冲流类的性能要高得多,以至于NIO.2 API包含了专门返回这些流类的方法,部分原因是为了鼓励您始终在应用程序中使用缓冲流。
这里有一个例子:
Path path = Paths.get("/myfolder/myfile.ext");
try (BufferedReader reader = Files.newBufferedReader(path)) {
// Read from the stream
String currentLine = null;
while ((currentLine = reader.readLine()) != null)
//do your code here
} catch (IOException e) {
// Handle file I/O exception...
}
您可以替换此代码
BufferedReader reader = Files.newBufferedReader(path);
与
BufferedReader br = new BufferedReader(new FileReader("/myfolder/myfile.ext"));
我推荐这篇文章来学习Java NIO和IO的主要用途。
org.apache.commons.io.FileUtils中的方法也可能非常方便,例如:
/**
* Reads the contents of a file line by line to a List
* of Strings using the default encoding for the VM.
*/
static List readLines(File file)