如果您有java.io.InputStream对象,应该如何处理该对象并生成String?


假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。

获取InputStream并将其转换为字符串的最简单方法是什么?

public String convertStreamToString(InputStream is) {
// ???
}

当前回答

这是一个改编自org.apache.commons.io.IOUtils源代码的答案,适用于那些想要apache实现但不想要整个库的人。

private static final int BUFFER_SIZE = 4 * 1024;

public static String inputStreamToString(InputStream inputStream, String charsetName)
        throws IOException {
    StringBuilder builder = new StringBuilder();
    InputStreamReader reader = new InputStreamReader(inputStream, charsetName);
    char[] buffer = new char[BUFFER_SIZE];
    int length;
    while ((length = reader.read(buffer)) != -1) {
        builder.append(buffer, 0, length);
    }
    return builder.toString();
}

其他回答

与Okio一起:

String result = Okio.buffer(Okio.source(inputStream)).readUtf8();

异-8859-1

如果您知道输入流的编码是ISO-8859-1或ASCII,这里有一种非常高效的方法来实现这一点。它(1)避免了StringWriter的内部StringBuffer中存在的不必要的同步,(2)避免了InputStreamReader的开销,(3)最小化了必须复制StringBuilder的内部字符数组的次数。

public static String iso_8859_1(InputStream is) throws IOException {
    StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
    byte[] buffer = new byte[4096];
    int n;
    while ((n = is.read(buffer)) != -1) {
        for (int i = 0; i < n; i++) {
            chars.append((char)(buffer[i] & 0xFF));
        }
    }
    return chars.toString();
}

UTF-8型

对于使用UTF-8编码的流,可以使用相同的通用策略:

public static String utf8(InputStream is) throws IOException {
    StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
    byte[] buffer = new byte[4096];
    int n;
    int state = 0;
    while ((n = is.read(buffer)) != -1) {
        for (int i = 0; i < n; i++) {
            if ((state = nextStateUtf8(state, buffer[i])) >= 0) {
                chars.appendCodePoint(state);
            } else if (state == -1) { //error
                state = 0;
                chars.append('\uFFFD'); //replacement char
            }
        }
    }
    return chars.toString();
}

其中nextStateUtf8()函数定义如下:

/**
 * Returns the next UTF-8 state given the next byte of input and the current state.
 * If the input byte is the last byte in a valid UTF-8 byte sequence,
 * the returned state will be the corresponding unicode character (in the range of 0 through 0x10FFFF).
 * Otherwise, a negative integer is returned. A state of -1 is returned whenever an
 * invalid UTF-8 byte sequence is detected.
 */
static int nextStateUtf8(int currentState, byte nextByte) {
    switch (currentState & 0xF0000000) {
        case 0:
            if ((nextByte & 0x80) == 0) { //0 trailing bytes (ASCII)
                return nextByte;
            } else if ((nextByte & 0xE0) == 0xC0) { //1 trailing byte
                if (nextByte == (byte) 0xC0 || nextByte == (byte) 0xC1) { //0xCO & 0xC1 are overlong
                    return -1;
                } else {
                    return nextByte & 0xC000001F;
                }
            } else if ((nextByte & 0xF0) == 0xE0) { //2 trailing bytes
                if (nextByte == (byte) 0xE0) { //possibly overlong
                    return nextByte & 0xA000000F;
                } else if (nextByte == (byte) 0xED) { //possibly surrogate
                    return nextByte & 0xB000000F;
                } else {
                    return nextByte & 0x9000000F;
                }
            } else if ((nextByte & 0xFC) == 0xF0) { //3 trailing bytes
                if (nextByte == (byte) 0xF0) { //possibly overlong
                    return nextByte & 0x80000007;
                } else {
                    return nextByte & 0xE0000007;
                }
            } else if (nextByte == (byte) 0xF4) { //3 trailing bytes, possibly undefined
                return nextByte & 0xD0000007;
            } else {
                return -1;
            }
        case 0xE0000000: //3rd-to-last continuation byte
            return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
        case 0x80000000: //3rd-to-last continuation byte, check overlong
            return (nextByte & 0xE0) == 0xA0 || (nextByte & 0xF0) == 0x90 ? currentState << 6 | nextByte & 0x9000003F : -1;
        case 0xD0000000: //3rd-to-last continuation byte, check undefined
            return (nextByte & 0xF0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
        case 0x90000000: //2nd-to-last continuation byte
            return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
        case 0xA0000000: //2nd-to-last continuation byte, check overlong
            return (nextByte & 0xE0) == 0xA0 ? currentState << 6 | nextByte & 0xC000003F : -1;
        case 0xB0000000: //2nd-to-last continuation byte, check surrogate
            return (nextByte & 0xE0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
        case 0xC0000000: //last continuation byte
            return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x3F : -1;
        default:
            return -1;
    }
}

自动检测编码

如果您的输入流是使用ASCII、ISO-8859-1或UTF-8编码的,但您不确定是哪一种,我们可以使用与上一种方法类似的方法,但使用额外的编码检测组件在返回字符串之前自动检测编码。

public static String autoDetect(InputStream is) throws IOException {
    StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
    byte[] buffer = new byte[4096];
    int n;
    int state = 0;
    boolean ascii = true;
    while ((n = is.read(buffer)) != -1) {
        for (int i = 0; i < n; i++) {
            if ((state = nextStateUtf8(state, buffer[i])) > 0x7F)
                ascii = false;
            chars.append((char)(buffer[i] & 0xFF));
        }
    }

    if (ascii || state < 0) { //probably not UTF-8
        return chars.toString();
    }
    //probably UTF-8
    int pos = 0;
    char[] charBuf = new char[2];
    for (int i = 0, len = chars.length(); i < len; i++) {
        if ((state = nextStateUtf8(state, (byte)chars.charAt(i))) >= 0) {
            boolean hi = Character.toChars(state, charBuf, 0) == 2;
            chars.setCharAt(pos++, charBuf[0]);
            if (hi) {
                chars.setCharAt(pos++, charBuf[1]);
            }
        }
    }
    return chars.substring(0, pos);
}

如果您的输入流的编码既不是ISO-8859-1,也不是ASCII,也不是UTF-8,那么我就遵从已经存在的其他答案。

注意:这可能不是个好主意。此方法使用递归,因此将非常快地命中StackOverflowError:

public String read (InputStream is) {
    byte next = is.read();
    return next == -1 ? "" : next + read(is); // Recursive part: reads next byte recursively
}

如果需要将字符串转换为没有外部库的特定字符集,则:

public String convertStreamToString(InputStream is) throws IOException {
  try (ByteArrayOutputStream baos = new ByteArrayOutputStream();) {
    is.transferTo(baos);
    return baos.toString(StandardCharsets.UTF_8);
  }
}

我已经创建了这段代码,并且它有效。没有所需的外部插件。

有一个字符串到流和流到字符串的转换器:

import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class STRINGTOSTREAM {

    public static void main(String[] args)
    {
        String text = "Hello Bhola..!\nMy Name Is Kishan ";

        InputStream strm = new ByteArrayInputStream(text.getBytes());    // Convert String to Stream

        String data = streamTostring(strm);

        System.out.println(data);
    }

    static String streamTostring(InputStream stream)
    {
        String data = "";

        try
        {
            StringBuilder stringbuld = new StringBuilder();
            int i;
            while ((i=stream.read())!=-1)
            {
                stringbuld.append((char)i);
            }
            data = stringbuld.toString();
        }
        catch(Exception e)
        {
            data = "No data Streamed.";
        }
        return data;
    }