我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

除了max、codeskrap、David Easley和milosmns给出的答案外,还可以看看我的轻量级、高性能漂亮打印机库:xml-formatter

// construct lightweight, threadsafe, instance
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();

StringBuilder buffer = new StringBuilder();
String xml = ..; // also works with char[] or Reader

if(prettyPrinter.process(xml, buffer)) {
     // valid XML, print buffer
} else {
     // invalid XML, print xml
}

有时,就像直接从文件运行模拟SOAP服务时,有一个漂亮的打印机也能处理已经打印好的XML是很好的:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

正如一些人评论的那样,漂亮打印只是一种以更适合人类阅读的形式表示XML的方法——严格来说,XML数据中不应该有空格。

该库用于日志记录的漂亮打印,还包括用于过滤(子树移除/匿名化)和漂亮打印CDATA和Text节点中的XML的函数。

其他回答

Since you are starting with a String, you can convert to a DOM object (e.g. Node) before you use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.

免责声明-我对下面的函数做了剪切/粘贴/文本编辑,所以它们可能不能按原样编译。

public static final Element createDOM(String strXML) 
    throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
    throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}

在提出我自己的解决方案之前,我应该先看看这一页!不管怎样,我使用Java递归来解析xml页面。此代码是完全自包含的,不依赖于第三方库。也. .它使用递归!

// you call this method passing in the xml text
public static void prettyPrint(String text){
    prettyPrint(text, 0);
}

// "index" corresponds to the number of levels of nesting and/or the number of tabs to print before printing the tag
public static void prettyPrint(String xmlText, int index){
    boolean foundTagStart = false;
    StringBuilder tagChars = new StringBuilder();
    String startTag = "";
    String endTag = "";
    String[] chars = xmlText.split("");
    // find the next start tag
    for(String ch : chars){
        if(ch.equalsIgnoreCase("<")){
            tagChars.append(ch);
            foundTagStart = true;
        } else if(ch.equalsIgnoreCase(">") && foundTagStart){
            startTag = tagChars.append(ch).toString();
            String tempTag = startTag;
            endTag = (tempTag.contains("\"") ? (tempTag.split(" ")[0] + ">") : tempTag).replace("<", "</"); // <startTag attr1=1 attr2=2> => </startTag>
            break;
        } else if(foundTagStart){
            tagChars.append(ch);
        }
    }
    // once start and end tag are calculated, print start tag, then content, then end tag
    if(foundTagStart){
        int startIndex = xmlText.indexOf(startTag);
        int endIndex = xmlText.indexOf(endTag);
        // handle if matching tags NOT found
        if((startIndex < 0) || (endIndex < 0)){
            if(startIndex < 0) {
                // no start tag found
                return;
            } else {
                // start tag found, no end tag found (handles single tags aka "<mytag/>" or "<?xml ...>")
                printTabs(index);
                System.out.println(startTag);
                // move on to the next tag
                // NOTE: "index" (not index+1) because next tag is on same level as this one
                prettyPrint(xmlText.substring(startIndex+startTag.length(), xmlText.length()), index);
                return;
            }
        // handle when matching tags found
        } else {
            String content = xmlText.substring(startIndex+startTag.length(), endIndex);
            boolean isTagContainsTags = content.contains("<"); // content contains tags
            printTabs(index);
            if(isTagContainsTags){ // ie: <tag1><tag2>stuff</tag2></tag1>
                System.out.println(startTag);
                prettyPrint(content, index+1); // "index+1" because "content" is nested
                printTabs(index);
            } else {
                System.out.print(startTag); // ie: <tag1>stuff</tag1> or <tag1></tag1>
                System.out.print(content);
            }
            System.out.println(endTag);
            int nextIndex = endIndex + endTag.length();
            if(xmlText.length() > nextIndex){ // if there are more tags on this level, continue
                prettyPrint(xmlText.substring(nextIndex, xmlText.length()), index);
            }
        }
    } else {
        System.out.print(xmlText);
    }
}

private static void printTabs(int counter){
    while(counter-- > 0){ 
        System.out.print("\t");
    }
}

我过去使用org.dom4j.io.OutputFormat.createPrettyPrint()方法打印过

public String prettyPrint(final String xml){  

    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in prettyPrint()");
    }

    final StringWriter sw;

    try {
        final OutputFormat format = OutputFormat.createPrettyPrint();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);

注意:根据Java版本的不同,结果可能有所不同。搜索特定于您的平台的解决方案。

如果你不需要缩进那么多,但一些换行,这可能是足够的简单regex…

String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");

代码很好,而不是因为缺少缩进而导致的结果。


(对于有缩进的解,请参见其他答案。)