如果您有java.io.InputStream对象,应该如何处理该对象并生成String?
假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。
获取InputStream并将其转换为字符串的最简单方法是什么?
public String convertStreamToString(InputStream is) {
// ???
}
如果您有java.io.InputStream对象,应该如何处理该对象并生成String?
假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。
获取InputStream并将其转换为字符串的最简单方法是什么?
public String convertStreamToString(InputStream is) {
// ???
}
当前回答
为了完整起见,这里是Java 9解决方案:
public static String toString(InputStream input) throws IOException {
return new String(input.readAllBytes(), StandardCharsets.UTF_8);
}
这使用添加到Java9中的readAllBytes方法。
其他回答
我已经写了一个这样的课,所以我想我会和大家分享。有时候,您不想仅仅为了一件事而添加Apache Commons,并且想要比Scanner更笨的东西,它不检查内容。
用法如下
// Read from InputStream
String data = new ReaderSink(inputStream, Charset.forName("UTF-8")).drain();
// Read from File
data = new ReaderSink(file, Charset.forName("UTF-8")).drain();
// Drain input stream to console
new ReaderSink(inputStream, Charset.forName("UTF-8")).drainTo(System.out);
以下是ReaderSink的代码:
import java.io.*;
import java.nio.charset.Charset;
/**
* A simple sink class that drains a {@link Reader} to a {@link String} or
* to a {@link Writer}.
*
* @author Ben Barkay
* @version 2/20/2014
*/
public class ReaderSink {
/**
* The default buffer size to use if no buffer size was specified.
*/
public static final int DEFAULT_BUFFER_SIZE = 1024;
/**
* The {@link Reader} that will be drained.
*/
private final Reader in;
/**
* Constructs a new {@code ReaderSink} for the specified file and charset.
* @param file The file to read from.
* @param charset The charset to use.
* @throws FileNotFoundException If the file was not found on the filesystem.
*/
public ReaderSink(File file, Charset charset) throws FileNotFoundException {
this(new FileInputStream(file), charset);
}
/**
* Constructs a new {@code ReaderSink} for the specified {@link InputStream}.
* @param in The {@link InputStream} to drain.
* @param charset The charset to use.
*/
public ReaderSink(InputStream in, Charset charset) {
this(new InputStreamReader(in, charset));
}
/**
* Constructs a new {@code ReaderSink} for the specified {@link Reader}.
* @param in The reader to drain.
*/
public ReaderSink(Reader in) {
this.in = in;
}
/**
* Drains the data from the underlying {@link Reader}, returning a {@link String} containing
* all of the read information. This method will use {@link #DEFAULT_BUFFER_SIZE} for
* its buffer size.
* @return A {@link String} containing all of the information that was read.
*/
public String drain() throws IOException {
return drain(DEFAULT_BUFFER_SIZE);
}
/**
* Drains the data from the underlying {@link Reader}, returning a {@link String} containing
* all of the read information.
* @param bufferSize The size of the buffer to use when reading.
* @return A {@link String} containing all of the information that was read.
*/
public String drain(int bufferSize) throws IOException {
StringWriter stringWriter = new StringWriter();
drainTo(stringWriter, bufferSize);
return stringWriter.toString();
}
/**
* Drains the data from the underlying {@link Reader}, writing it to the
* specified {@link Writer}. This method will use {@link #DEFAULT_BUFFER_SIZE} for
* its buffer size.
* @param out The {@link Writer} to write to.
*/
public void drainTo(Writer out) throws IOException {
drainTo(out, DEFAULT_BUFFER_SIZE);
}
/**
* Drains the data from the underlying {@link Reader}, writing it to the
* specified {@link Writer}.
* @param out The {@link Writer} to write to.
* @param bufferSize The size of the buffer to use when reader.
*/
public void drainTo(Writer out, int bufferSize) throws IOException {
char[] buffer = new char[bufferSize];
int read;
while ((read = in.read(buffer)) > -1) {
out.write(buffer, 0, read);
}
}
}
异-8859-1
如果您知道输入流的编码是ISO-8859-1或ASCII,这里有一种非常高效的方法来实现这一点。它(1)避免了StringWriter的内部StringBuffer中存在的不必要的同步,(2)避免了InputStreamReader的开销,(3)最小化了必须复制StringBuilder的内部字符数组的次数。
public static String iso_8859_1(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
chars.append((char)(buffer[i] & 0xFF));
}
}
return chars.toString();
}
UTF-8型
对于使用UTF-8编码的流,可以使用相同的通用策略:
public static String utf8(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
int state = 0;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
if ((state = nextStateUtf8(state, buffer[i])) >= 0) {
chars.appendCodePoint(state);
} else if (state == -1) { //error
state = 0;
chars.append('\uFFFD'); //replacement char
}
}
}
return chars.toString();
}
其中nextStateUtf8()函数定义如下:
/**
* Returns the next UTF-8 state given the next byte of input and the current state.
* If the input byte is the last byte in a valid UTF-8 byte sequence,
* the returned state will be the corresponding unicode character (in the range of 0 through 0x10FFFF).
* Otherwise, a negative integer is returned. A state of -1 is returned whenever an
* invalid UTF-8 byte sequence is detected.
*/
static int nextStateUtf8(int currentState, byte nextByte) {
switch (currentState & 0xF0000000) {
case 0:
if ((nextByte & 0x80) == 0) { //0 trailing bytes (ASCII)
return nextByte;
} else if ((nextByte & 0xE0) == 0xC0) { //1 trailing byte
if (nextByte == (byte) 0xC0 || nextByte == (byte) 0xC1) { //0xCO & 0xC1 are overlong
return -1;
} else {
return nextByte & 0xC000001F;
}
} else if ((nextByte & 0xF0) == 0xE0) { //2 trailing bytes
if (nextByte == (byte) 0xE0) { //possibly overlong
return nextByte & 0xA000000F;
} else if (nextByte == (byte) 0xED) { //possibly surrogate
return nextByte & 0xB000000F;
} else {
return nextByte & 0x9000000F;
}
} else if ((nextByte & 0xFC) == 0xF0) { //3 trailing bytes
if (nextByte == (byte) 0xF0) { //possibly overlong
return nextByte & 0x80000007;
} else {
return nextByte & 0xE0000007;
}
} else if (nextByte == (byte) 0xF4) { //3 trailing bytes, possibly undefined
return nextByte & 0xD0000007;
} else {
return -1;
}
case 0xE0000000: //3rd-to-last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0x80000000: //3rd-to-last continuation byte, check overlong
return (nextByte & 0xE0) == 0xA0 || (nextByte & 0xF0) == 0x90 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0xD0000000: //3rd-to-last continuation byte, check undefined
return (nextByte & 0xF0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0x90000000: //2nd-to-last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xA0000000: //2nd-to-last continuation byte, check overlong
return (nextByte & 0xE0) == 0xA0 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xB0000000: //2nd-to-last continuation byte, check surrogate
return (nextByte & 0xE0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xC0000000: //last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x3F : -1;
default:
return -1;
}
}
自动检测编码
如果您的输入流是使用ASCII、ISO-8859-1或UTF-8编码的,但您不确定是哪一种,我们可以使用与上一种方法类似的方法,但使用额外的编码检测组件在返回字符串之前自动检测编码。
public static String autoDetect(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
int state = 0;
boolean ascii = true;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
if ((state = nextStateUtf8(state, buffer[i])) > 0x7F)
ascii = false;
chars.append((char)(buffer[i] & 0xFF));
}
}
if (ascii || state < 0) { //probably not UTF-8
return chars.toString();
}
//probably UTF-8
int pos = 0;
char[] charBuf = new char[2];
for (int i = 0, len = chars.length(); i < len; i++) {
if ((state = nextStateUtf8(state, (byte)chars.charAt(i))) >= 0) {
boolean hi = Character.toChars(state, charBuf, 0) == 2;
chars.setCharAt(pos++, charBuf[0]);
if (hi) {
chars.setCharAt(pos++, charBuf[1]);
}
}
}
return chars.substring(0, pos);
}
如果您的输入流的编码既不是ISO-8859-1,也不是ASCII,也不是UTF-8,那么我就遵从已经存在的其他答案。
我已经创建了这段代码,并且它有效。没有所需的外部插件。
有一个字符串到流和流到字符串的转换器:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
public class STRINGTOSTREAM {
public static void main(String[] args)
{
String text = "Hello Bhola..!\nMy Name Is Kishan ";
InputStream strm = new ByteArrayInputStream(text.getBytes()); // Convert String to Stream
String data = streamTostring(strm);
System.out.println(data);
}
static String streamTostring(InputStream stream)
{
String data = "";
try
{
StringBuilder stringbuld = new StringBuilder();
int i;
while ((i=stream.read())!=-1)
{
stringbuld.append((char)i);
}
data = stringbuld.toString();
}
catch(Exception e)
{
data = "No data Streamed.";
}
return data;
}
如果不能使用Commons IO(FileUtils/IOUtils/CopyUtils),下面是一个使用BufferedReader逐行读取文件的示例:
public class StringFromFile {
public static void main(String[] args) /*throws UnsupportedEncodingException*/ {
InputStream is = StringFromFile.class.getResourceAsStream("file.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(is/*, "UTF-8"*/));
final int CHARS_PER_PAGE = 5000; //counting spaces
StringBuilder builder = new StringBuilder(CHARS_PER_PAGE);
try {
for(String line=br.readLine(); line!=null; line=br.readLine()) {
builder.append(line);
builder.append('\n');
}
}
catch (IOException ignore) { }
String text = builder.toString();
System.out.println(text);
}
}
或者,如果你想要原始速度,我会根据Paul de Vrieze的建议(避免使用StringWriter(内部使用StringBuffer))提出一个变体:
public class StringFromFileFast {
public static void main(String[] args) /*throws UnsupportedEncodingException*/ {
InputStream is = StringFromFileFast.class.getResourceAsStream("file.txt");
InputStreamReader input = new InputStreamReader(is/*, "UTF-8"*/);
final int CHARS_PER_PAGE = 5000; //counting spaces
final char[] buffer = new char[CHARS_PER_PAGE];
StringBuilder output = new StringBuilder(CHARS_PER_PAGE);
try {
for(int read = input.read(buffer, 0, buffer.length);
read != -1;
read = input.read(buffer, 0, buffer.length)) {
output.append(buffer, 0, read);
}
} catch (IOException ignore) { }
String text = output.toString();
System.out.println(text);
}
}
这里有一种仅使用标准Java库的方法(请注意,流没有关闭,您的里程可能会有所不同)。
static String convertStreamToString(java.io.InputStream is) {
java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}
我从“愚蠢的扫描仪技巧”一文中学会了这个技巧。它工作的原因是因为Scanner迭代流中的令牌,在这种情况下,我们使用“输入边界的开始”(\A)来分离令牌,从而为流的整个内容只提供一个令牌。
注意,如果您需要明确输入流的编码,可以向Scanner构造函数提供第二个参数,指示要使用的字符集(例如“UTF-8”)。
雅各布也收到了帽子提示,他曾向我指出了上述文章。