我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

对于那些寻找快速和肮脏的解决方案的人——它不需要XML是100%有效的。例如,在REST / SOAP日志的情况下(你永远不知道其他人发送了什么;-))

我发现并改进了一个我在网上找到的代码剪辑,我认为这仍然是一个有效的可能的方法:

public static String prettyPrintXMLAsString(String xmlString) {
    /* Remove new lines */
    final String LINE_BREAK = "\n";
    xmlString = xmlString.replaceAll(LINE_BREAK, "");
    StringBuffer prettyPrintXml = new StringBuffer();
    /* Group the xml tags */
    Pattern pattern = Pattern.compile("(<[^/][^>]+>)?([^<]*)(</[^>]+>)?(<[^/][^>]+/>)?");
    Matcher matcher = pattern.matcher(xmlString);
    int tabCount = 0;
    while (matcher.find()) {
        String str1 = (null == matcher.group(1) || "null".equals(matcher.group())) ? "" : matcher.group(1);
        String str2 = (null == matcher.group(2) || "null".equals(matcher.group())) ? "" : matcher.group(2);
        String str3 = (null == matcher.group(3) || "null".equals(matcher.group())) ? "" : matcher.group(3);
        String str4 = (null == matcher.group(4) || "null".equals(matcher.group())) ? "" : matcher.group(4);

        if (matcher.group() != null && !matcher.group().trim().equals("")) {
            printTabs(tabCount, prettyPrintXml);
            if (!str1.equals("") && str3.equals("")) {
                ++tabCount;
            }
            if (str1.equals("") && !str3.equals("")) {
                --tabCount;
                prettyPrintXml.deleteCharAt(prettyPrintXml.length() - 1);
            }

            prettyPrintXml.append(str1);
            prettyPrintXml.append(str2);
            prettyPrintXml.append(str3);
            if (!str4.equals("")) {
                prettyPrintXml.append(LINE_BREAK);
                printTabs(tabCount, prettyPrintXml);
                prettyPrintXml.append(str4);
            }
            prettyPrintXml.append(LINE_BREAK);
        }
    }
    return prettyPrintXml.toString();
}

private static void printTabs(int count, StringBuffer stringBuffer) {
    for (int i = 0; i < count; i++) {
        stringBuffer.append("\t");
    }
}

public static void main(String[] args) {
    String x = new String(
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>INVALID_MESSAGE</faultstring><detail><ns3:XcbSoapFault xmlns=\"\" xmlns:ns3=\"http://www.someapp.eu/xcb/types/xcb/v1\"><CauseCode>20007</CauseCode><CauseText>INVALID_MESSAGE</CauseText><DebugInfo>Problems creating SAAJ object model</DebugInfo></ns3:XcbSoapFault></detail></soap:Fault></soap:Body></soap:Envelope>");
    System.out.println(prettyPrintXMLAsString(x));
}

输出如下:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>INVALID_MESSAGE</faultstring>
        <detail>
            <ns3:XcbSoapFault xmlns="" xmlns:ns3="http://www.someapp.eu/xcb/types/xcb/v1">
                <CauseCode>20007</CauseCode>
                <CauseText>INVALID_MESSAGE</CauseText>
                <DebugInfo>Problems creating SAAJ object model</DebugInfo>
            </ns3:XcbSoapFault>
        </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>

其他回答

我把它们混合在一起,写了一个小程序。它从xml文件中读取并打印出来。而不是xzy给出你的文件路径。

    public static void main(String[] args) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new FileInputStream(new File("C:/Users/xyz.xml")));
    prettyPrint(doc);

}

private static String prettyPrint(Document document)
        throws TransformerException {
    TransformerFactory transformerFactory = TransformerFactory
            .newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    DOMSource source = new DOMSource(document);
    StringWriter strWriter = new StringWriter();
    StreamResult result = new StreamResult(strWriter);transformer.transform(source, result);
    System.out.println(strWriter.getBuffer().toString());

    return strWriter.getBuffer().toString();

}

有一个非常好的命令行XML实用程序叫做xmlstarlet(http://xmlstar.sourceforge.net/),它可以做很多事情,很多人都在使用它。

您可以使用Runtime以编程方式执行此程序。然后读入格式化的输出文件。它具有比几行Java代码所能提供的更多选项和更好的错误报告。

下载xmlstarlet: http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589

以上所有的解决方案都不适合我,然后我找到了这个http://myshittycode.com/2014/02/10/java-properly-indenting-xml-string/

线索就是用XPath删除空格

    String xml = "<root>" +
             "\n   " +
             "\n<name>Coco Puff</name>" +
             "\n        <total>10</total>    </root>";

try {
    Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

    XPath xPath = XPathFactory.newInstance().newXPath();
    NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                  document,
                                                  XPathConstants.NODESET);

    for (int i = 0; i < nodeList.getLength(); ++i) {
        Node node = nodeList.item(i);
        node.getParentNode().removeChild(node);
    }

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);

    transformer.transform(new DOMSource(document), streamResult);

    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}

如果你不需要缩进那么多,但一些换行,这可能是足够的简单regex…

String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");

代码很好,而不是因为缺少缩进而导致的结果。


(对于有缩进的解,请参见其他答案。)

Since you are starting with a String, you can convert to a DOM object (e.g. Node) before you use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.

免责声明-我对下面的函数做了剪切/粘贴/文本编辑,所以它们可能不能按原样编译。

public static final Element createDOM(String strXML) 
    throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
    throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}