我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

关于“您必须首先构建DOM树”的评论:不,您不需要也不应该这样做。

相反,创建一个StreamSource(new StreamSource(new StringReader(str)),并将其提供给前面提到的标识转换器。这将使用SAX解析器,结果将快得多。 在这种情况下,构建中间树纯粹是开销。 否则,排名第一的答案是好的。

其他回答

请注意,排名靠前的答案需要使用xerces。

如果您不想添加这个外部依赖,那么您可以简单地使用标准jdk库(实际上是在内部使用xerces构建的)。

注意:jdk 1.5版本有一个bug,请参阅http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296446,但现在已经解决了。

(注意,如果发生错误,将返回原始文本)

package com.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;

public class XmlTest {
    public static void main(String[] args) {
        XmlTest t = new XmlTest();
        System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
    }

    public String formatXml(String xml){
        try{
            Transformer serializer= SAXTransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            //serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            //serializer.setOutputProperty("{http://xml.customer.org/xslt}indent-amount", "2");
            Source xmlSource=new SAXSource(new InputSource(new ByteArrayInputStream(xml.getBytes())));
            StreamResult res =  new StreamResult(new ByteArrayOutputStream());            
            serializer.transform(xmlSource, res);
            return new String(((ByteArrayOutputStream)res.getOutputStream()).toByteArray());
        }catch(Exception e){
            //TODO log error
            return xml;
        }
    }

}

我过去使用org.dom4j.io.OutputFormat.createPrettyPrint()方法打印过

public String prettyPrint(final String xml){  

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in prettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createPrettyPrint();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}

我用Scala看到了一个答案,所以这里有另一个用Groovy的答案,以防有人觉得有趣。默认缩进为2步,XmlNodePrinter构造函数也可以传递另一个值。

def xml = "<tag><nested>hello</nested></tag>"
def stringWriter = new StringWriter()
def node = new XmlParser().parseText(xml);
new XmlNodePrinter(new PrintWriter(stringWriter)).print(node)
println stringWriter.toString()

如果groovy jar在类路径中,则使用Java

  String xml = "<tag><nested>hello</nested></tag>";
  StringWriter stringWriter = new StringWriter();
  Node node = new XmlParser().parseText(xml);
  new XmlNodePrinter(new PrintWriter(stringWriter)).print(node);
  System.out.println(stringWriter.toString());

除了max、codeskrap、David Easley和milosmns给出的答案外,还可以看看我的轻量级、高性能漂亮打印机库:xml-formatter

// construct lightweight, threadsafe, instance
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();

StringBuilder buffer = new StringBuilder();
String xml = ..; // also works with char[] or Reader

if(prettyPrinter.process(xml, buffer)) {
     // valid XML, print buffer
} else {
     // invalid XML, print xml
}

有时,就像直接从文件运行模拟SOAP服务时,有一个漂亮的打印机也能处理已经打印好的XML是很好的:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

正如一些人评论的那样,漂亮打印只是一种以更适合人类阅读的形式表示XML的方法——严格来说,XML数据中不应该有空格。

该库用于日志记录的漂亮打印,还包括用于过滤(子树移除/匿名化)和漂亮打印CDATA和Text节点中的XML的函数。

Since you are starting with a String, you can convert to a DOM object (e.g. Node) before you use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.

免责声明-我对下面的函数做了剪切/粘贴/文本编辑,所以它们可能不能按原样编译。

public static final Element createDOM(String strXML) 
    throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
    throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}