我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

我把它们混合在一起,写了一个小程序。它从xml文件中读取并打印出来。而不是xzy给出你的文件路径。

    public static void main(String[] args) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new FileInputStream(new File("C:/Users/xyz.xml")));
    prettyPrint(doc);

}

private static String prettyPrint(Document document)
        throws TransformerException {
    TransformerFactory transformerFactory = TransformerFactory
            .newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    DOMSource source = new DOMSource(document);
    StringWriter strWriter = new StringWriter();
    StreamResult result = new StreamResult(strWriter);transformer.transform(source, result);
    System.out.println(strWriter.getBuffer().toString());

    return strWriter.getBuffer().toString();

}

其他回答

如果你不需要缩进那么多,但一些换行,这可能是足够的简单regex…

String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");

代码很好,而不是因为缺少缩进而导致的结果。


(对于有缩进的解,请参见其他答案。)

I have found that in Java 1.6.0_32 the normal method to pretty print an XML string (using a Transformer with a null or identity xslt) does not behave as I would like if tags are merely separated by whitespace, as opposed to having no separating text. I tried using <xsl:strip-space elements="*"/> in my template to no avail. The simplest solution I found was to strip the space the way I wanted using a SAXSource and XML filter. Since my solution was for logging I also extended this to work with incomplete XML fragments. Note the normal method seems to work fine if you use a DOMSource but I did not want to use this because of the incompleteness and memory overhead.

public static class WhitespaceIgnoreFilter extends XMLFilterImpl
{

    @Override
    public void ignorableWhitespace(char[] arg0,
                                    int arg1,
                                    int arg2) throws SAXException
    {
        //Ignore it then...
    }

    @Override
    public void characters( char[] ch,
                            int start,
                            int length) throws SAXException
    {
        if (!new String(ch, start, length).trim().equals("")) 
               super.characters(ch, start, length); 
    }
}

public static String prettyXML(String logMsg, boolean allowBadlyFormedFragments) throws SAXException, IOException, TransformerException
    {
        TransformerFactory transFactory = TransformerFactory.newInstance();
        transFactory.setAttribute("indent-number", new Integer(2));
        Transformer transformer = transFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        XMLReader masterParser = SAXHelper.getSAXParser(true);
        XMLFilter parser = new WhitespaceIgnoreFilter();
        parser.setParent(masterParser);

        if(allowBadlyFormedFragments)
        {
            transformer.setErrorListener(new ErrorListener()
            {
                @Override
                public void warning(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void fatalError(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void error(TransformerException exception) throws TransformerException
                {
                }
            });
        }

        try
        {
            transformer.transform(new SAXSource(parser, new InputSource(new StringReader(logMsg))), new StreamResult(out));
        }
        catch (TransformerException e)
        {
            if(e.getCause() != null && e.getCause() instanceof SAXParseException)
            {
                if(!allowBadlyFormedFragments || !"XML document structures must start and end within the same entity.".equals(e.getCause().getMessage()))
                {
                    throw e;
                }
            }
            else
            {
                throw e;
            }
        }
        out.flush();
        return out.toString();
    }

如果您确信您有一个有效的XML,那么这个很简单,并且避免了XML DOM树。可能有一些错误,如果你看到任何错误,请评论

public String prettyPrint(String xml) {
            if (xml == null || xml.trim().length() == 0) return "";

            int stack = 0;
            StringBuilder pretty = new StringBuilder();
            String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");

            for (int i = 0; i < rows.length; i++) {
                    if (rows[i] == null || rows[i].trim().length() == 0) continue;

                    String row = rows[i].trim();
                    if (row.startsWith("<?")) {
                            // xml version tag
                            pretty.append(row + "\n");
                    } else if (row.startsWith("</")) {
                            // closing tag
                            String indent = repeatString("    ", --stack);
                            pretty.append(indent + row + "\n");
                    } else if (row.startsWith("<")) {
                            // starting tag
                            String indent = repeatString("    ", stack++);
                            pretty.append(indent + row + "\n");
                    } else {
                            // tag data
                            String indent = repeatString("    ", stack);
                            pretty.append(indent + row + "\n");
                    }
            }

            return pretty.toString().trim();
    }

Since you are starting with a String, you can convert to a DOM object (e.g. Node) before you use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.

免责声明-我对下面的函数做了剪切/粘贴/文本编辑,所以它们可能不能按原样编译。

public static final Element createDOM(String strXML) 
    throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
    throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}

除了max、codeskrap、David Easley和milosmns给出的答案外,还可以看看我的轻量级、高性能漂亮打印机库:xml-formatter

// construct lightweight, threadsafe, instance
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();

StringBuilder buffer = new StringBuilder();
String xml = ..; // also works with char[] or Reader

if(prettyPrinter.process(xml, buffer)) {
     // valid XML, print buffer
} else {
     // invalid XML, print xml
}

有时,就像直接从文件运行模拟SOAP服务时,有一个漂亮的打印机也能处理已经打印好的XML是很好的:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

正如一些人评论的那样,漂亮打印只是一种以更适合人类阅读的形式表示XML的方法——严格来说,XML数据中不应该有空格。

该库用于日志记录的漂亮打印,还包括用于过滤(子树移除/匿名化)和漂亮打印CDATA和Text节点中的XML的函数。