如果您有java.io.InputStream对象,应该如何处理该对象并生成String?
假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。
获取InputStream并将其转换为字符串的最简单方法是什么?
public String convertStreamToString(InputStream is) {
// ???
}
如果您有java.io.InputStream对象,应该如何处理该对象并生成String?
假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。
获取InputStream并将其转换为字符串的最简单方法是什么?
public String convertStreamToString(InputStream is) {
// ???
}
当前回答
异-8859-1
如果您知道输入流的编码是ISO-8859-1或ASCII,这里有一种非常高效的方法来实现这一点。它(1)避免了StringWriter的内部StringBuffer中存在的不必要的同步,(2)避免了InputStreamReader的开销,(3)最小化了必须复制StringBuilder的内部字符数组的次数。
public static String iso_8859_1(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
chars.append((char)(buffer[i] & 0xFF));
}
}
return chars.toString();
}
UTF-8型
对于使用UTF-8编码的流,可以使用相同的通用策略:
public static String utf8(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
int state = 0;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
if ((state = nextStateUtf8(state, buffer[i])) >= 0) {
chars.appendCodePoint(state);
} else if (state == -1) { //error
state = 0;
chars.append('\uFFFD'); //replacement char
}
}
}
return chars.toString();
}
其中nextStateUtf8()函数定义如下:
/**
* Returns the next UTF-8 state given the next byte of input and the current state.
* If the input byte is the last byte in a valid UTF-8 byte sequence,
* the returned state will be the corresponding unicode character (in the range of 0 through 0x10FFFF).
* Otherwise, a negative integer is returned. A state of -1 is returned whenever an
* invalid UTF-8 byte sequence is detected.
*/
static int nextStateUtf8(int currentState, byte nextByte) {
switch (currentState & 0xF0000000) {
case 0:
if ((nextByte & 0x80) == 0) { //0 trailing bytes (ASCII)
return nextByte;
} else if ((nextByte & 0xE0) == 0xC0) { //1 trailing byte
if (nextByte == (byte) 0xC0 || nextByte == (byte) 0xC1) { //0xCO & 0xC1 are overlong
return -1;
} else {
return nextByte & 0xC000001F;
}
} else if ((nextByte & 0xF0) == 0xE0) { //2 trailing bytes
if (nextByte == (byte) 0xE0) { //possibly overlong
return nextByte & 0xA000000F;
} else if (nextByte == (byte) 0xED) { //possibly surrogate
return nextByte & 0xB000000F;
} else {
return nextByte & 0x9000000F;
}
} else if ((nextByte & 0xFC) == 0xF0) { //3 trailing bytes
if (nextByte == (byte) 0xF0) { //possibly overlong
return nextByte & 0x80000007;
} else {
return nextByte & 0xE0000007;
}
} else if (nextByte == (byte) 0xF4) { //3 trailing bytes, possibly undefined
return nextByte & 0xD0000007;
} else {
return -1;
}
case 0xE0000000: //3rd-to-last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0x80000000: //3rd-to-last continuation byte, check overlong
return (nextByte & 0xE0) == 0xA0 || (nextByte & 0xF0) == 0x90 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0xD0000000: //3rd-to-last continuation byte, check undefined
return (nextByte & 0xF0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0x90000000: //2nd-to-last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xA0000000: //2nd-to-last continuation byte, check overlong
return (nextByte & 0xE0) == 0xA0 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xB0000000: //2nd-to-last continuation byte, check surrogate
return (nextByte & 0xE0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xC0000000: //last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x3F : -1;
default:
return -1;
}
}
自动检测编码
如果您的输入流是使用ASCII、ISO-8859-1或UTF-8编码的,但您不确定是哪一种,我们可以使用与上一种方法类似的方法,但使用额外的编码检测组件在返回字符串之前自动检测编码。
public static String autoDetect(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
int state = 0;
boolean ascii = true;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
if ((state = nextStateUtf8(state, buffer[i])) > 0x7F)
ascii = false;
chars.append((char)(buffer[i] & 0xFF));
}
}
if (ascii || state < 0) { //probably not UTF-8
return chars.toString();
}
//probably UTF-8
int pos = 0;
char[] charBuf = new char[2];
for (int i = 0, len = chars.length(); i < len; i++) {
if ((state = nextStateUtf8(state, (byte)chars.charAt(i))) >= 0) {
boolean hi = Character.toChars(state, charBuf, 0) == 2;
chars.setCharAt(pos++, charBuf[0]);
if (hi) {
chars.setCharAt(pos++, charBuf[1]);
}
}
}
return chars.substring(0, pos);
}
如果您的输入流的编码既不是ISO-8859-1,也不是ASCII,也不是UTF-8,那么我就遵从已经存在的其他答案。
其他回答
一个很好的方法是使用Apache Commons IOUItils将InputStream复制到StringWriter中。。。类似于
StringWriter writer = new StringWriter();
IOUtils.copy(inputStream, writer, encoding);
String theString = writer.toString();
甚至
// NB: does not close inputStream, you'll have to use try-with-resources for that
String theString = IOUtils.toString(inputStream, encoding);
或者,如果不想混合流和写入器,可以使用ByteArrayOutputStream。
注意:这可能不是个好主意。此方法使用递归,因此将非常快地命中StackOverflowError:
public String read (InputStream is) {
byte next = is.read();
return next == -1 ? "" : next + read(is); // Recursive part: reads next byte recursively
}
Apache Commons允许:
String myString = IOUtils.toString(myInputStream, "UTF-8");
当然,您可以选择UTF-8以外的其他字符编码。
另请参阅:(文档)
我做了一些计时测试,因为时间总是很重要的。
我试图以3种不同的方式将响应转换为字符串。(如下所示)为了可读性,我省略了try/catch块。
为了给出上下文,这是所有3种方法的前面代码:
String response;
String url = "www.blah.com/path?key=value";
GetMethod method = new GetMethod(url);
int status = client.executeMethod(method);
1)
response = method.getResponseBodyAsString();
2)
InputStream resp = method.getResponseBodyAsStream();
InputStreamReader is=new InputStreamReader(resp);
BufferedReader br=new BufferedReader(is);
String read = null;
StringBuffer sb = new StringBuffer();
while((read = br.readLine()) != null) {
sb.append(read);
}
response = sb.toString();
3)
InputStream iStream = method.getResponseBodyAsStream();
StringWriter writer = new StringWriter();
IOUtils.copy(iStream, writer, "UTF-8");
response = writer.toString();
因此,在使用相同的请求/响应数据对每种方法运行了500次测试之后,以下是数字。再次,这些是我的发现,你的发现可能不完全相同,但我写这篇文章是为了向其他人说明这些方法的效率差异。
排名:方法#1进近#3-比#1慢2.6%2号进近——比1号进近慢4.3%
任何这些方法都是获取响应并从中创建字符串的适当解决方案。
InputStream IS=new URL("http://www.petrol.si/api/gas_prices.json").openStream();
ByteArrayOutputStream BAOS=new ByteArrayOutputStream();
IOUtils.copy(IS, BAOS);
String d= new String(BAOS.toByteArray(),"UTF-8");
System.out.println(d);