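I have a Java String that contains XML, with no line feeds or indentation. I would like to turn it into a String with nicely formatted XML. How do I do this?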

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

Note: my input is a String. My output is a String.

(Basic) mock result:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

Current answer

If you are sure you have valid XML, this one is simple and avoids building an XML DOM tree. There may be some bugs; please comment if you see any.

public String prettyPrint(String xml) {
    if (xml == null || xml.trim().length() == 0) return "";

    int stack = 0;
    StringBuilder pretty = new StringBuilder();
    // Put every tag and every text run on its own row
    String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");

    for (int i = 0; i < rows.length; i++) {
        if (rows[i] == null || rows[i].trim().length() == 0) continue;

        String row = rows[i].trim();
        if (row.startsWith("<?")) {
            // xml version tag
            pretty.append(row + "\n");
        } else if (row.startsWith("</")) {
            // closing tag
            String indent = repeatString("    ", --stack);
            pretty.append(indent + row + "\n");
        } else if (row.startsWith("<")) {
            // starting tag (self-closing tags such as <b/> also land here)
            String indent = repeatString("    ", stack++);
            pretty.append(indent + row + "\n");
        } else {
            // tag data
            String indent = repeatString("    ", stack);
            pretty.append(indent + row + "\n");
        }
    }

    return pretty.toString().trim();
}

// Returns s repeated count times (a count of zero or less gives an empty string)
private static String repeatString(String s, int count) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
        sb.append(s);
    }
    return sb.toString();
}
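To see what the split step in prettyPrint actually produces, here is a small standalone sketch. The class RowSplitDemo and its rows method are illustrative names, not part of the answer; rows applies the same two replaceAll calls and then drops blank rows, just as the loop above does with continue.

```java
import java.util.Arrays;

public class RowSplitDemo {
    // Same splitting trick as prettyPrint: newline after every '>' and before every '<',
    // then trim each row and discard the empty ones.
    static String[] rows(String xml) {
        String[] raw = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");
        return Arrays.stream(raw)
                .map(String::trim)
                .filter(r -> !r.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // prints [<tag>, <nested>, hello, </nested>, </tag>]
        System.out.println(Arrays.toString(rows("<tag><nested>hello</nested></tag>")));
        // prints [<a>, <b/>, </a>]
        System.out.println(Arrays.toString(rows("<a><b/></a>")));
    }
}
```

Each row is then indented by the stack counter, so text such as hello ends up on its own line between its tags. The second example also shows why self-closing tags are a problem for prettyPrint: the row <b/> starts with "<" but not "</", so it increments the stack and everything after it is indented one level too deep.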

Other answers

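Note that the top-rated answer requires Xerces.

If you don't want to add this external dependency, you can simply use the standard JDK libraries (which are actually built on Xerces internally).

Note: JDK 1.5 had a bug here, see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296446, but it has since been fixed.

(Note: if an error occurs, the original text is returned.)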

package com.test;

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;

public class XmlTest {
    public static void main(String[] args) {
        XmlTest t = new XmlTest();
        System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
    }

    public String formatXml(String xml) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            //serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // indent-amount is an extension property of the Xalan-based transformer bundled with the JDK
            serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            // Read from a Reader and write to a Writer, so no platform-default charset conversion is involved
            Source xmlSource = new SAXSource(new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            serializer.transform(xmlSource, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // TODO log error
            return xml;
        }
    }

}

Using Scala:

import xml._
val xml = XML.loadString("<tag><nested>hello</nested></tag>")
val formatted = new PrettyPrinter(150, 2).format(xml)
println(formatted)

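If you have scala-library.jar on the classpath, you can do this from Java too (from Scala 2.11 onward, the scala.xml package lives in the separate scala-xml module, so that jar is needed as well). It looks like this: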

import scala.xml.*;

public class FormatXML {
    public static void main(String[] args) {
        String unformattedXml = "<tag><nested>hello</nested></tag>";
        PrettyPrinter pp = new PrettyPrinter(150, 3);
        String formatted = pp.format(XML.loadString(unformattedXml), TopScope$.MODULE$);
        System.out.println(formatted);
    }
}

The PrettyPrinter object is constructed with two ints: the first is the maximum line width and the second is the indentation step.

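This is an answer to my own question. I combined the answers from various results to write a class that pretty-prints XML.

No guarantees on how it responds to invalid XML or large documents.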

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}
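The org.apache.xml.serialize classes used above come from Xerces and are marked deprecated in recent Xerces releases. If you want the same parse-into-a-DOM-then-serialize approach without the extra jar, one option is to swap in the DOM Level 3 Load and Save serializer (org.w3c.dom.ls.LSSerializer) that ships with the JDK. The following is a minimal sketch under that assumption; the class name JdkXmlFormatter is illustrative, and the indent width is chosen by the JDK serializer, not configurable here.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class JdkXmlFormatter {
    // Parses the string into a DOM, then writes it back out with pretty-printing enabled.
    public static String format(String unformattedXml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(unformattedXml)));

            // The JDK's built-in DOM implementation also implements the Load and Save interfaces
            DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            DOMConfiguration config = serializer.getDomConfig();
            if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
                config.setParameter("format-pretty-print", Boolean.TRUE);
            }
            // Leave out the <?xml ...?> declaration (writeToString would report UTF-16 there)
            config.setParameter("xml-declaration", Boolean.FALSE);
            return serializer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("<tag><nested>hello</nested></tag>"));
    }
}
```

Like the Xerces version, this builds the whole document in memory, so the same caveat about large documents applies.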

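Kevin Hakanson said: "However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing the string into a DOM and then running a transformation over the DOM to get a string back, you could just do some old-fashioned character-by-character parsing. Insert a newline and spaces after every </...>, keep an indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see."

Agreed. This approach is much faster and has far fewer dependencies.

Example solution: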

/**
 * XML utils, including formatting.
 */
public class XmlUtils
{
  private static XmlFormatter formatter = new XmlFormatter(2, 80);

  public static String formatXml(String s)
  {
    return formatter.format(s, 0);
  }

  public static String formatXml(String s, int initialIndent)
  {
    return formatter.format(s, initialIndent);
  }

  private static class XmlFormatter
  {
    private int indentNumChars;
    private int lineLength;
    private boolean singleLine;

    public XmlFormatter(int indentNumChars, int lineLength)
    {
      this.indentNumChars = indentNumChars;
      this.lineLength = lineLength;
    }

    public synchronized String format(String s, int initialIndent)
    {
      int indent = initialIndent;
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < s.length(); i++)
      {
        char currentChar = s.charAt(i);
        if (currentChar == '<')
        {
          char nextChar = (i + 1 < s.length()) ? s.charAt(i + 1) : '\0';  // Guard against a trailing '<'.
          if (nextChar == '/')
            indent -= indentNumChars;
          if (!singleLine)   // Don't indent before closing element if we're creating opening and closing elements on a single line.
            sb.append(buildWhitespace(indent));
          if (nextChar != '?' && nextChar != '!' && nextChar != '/')
            indent += indentNumChars;
          singleLine = false;  // Reset flag.
        }
        sb.append(currentChar);
        if (currentChar == '>')
        {
          if (i > 0 && s.charAt(i - 1) == '/')
          {
            indent -= indentNumChars;
            sb.append("\n");
          }
          else
          {
            int nextStartElementPos = s.indexOf('<', i);
            if (nextStartElementPos > i + 1)
            {
              String textBetweenElements = s.substring(i + 1, nextStartElementPos);

              // If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
              if (textBetweenElements.replaceAll("\n", "").length() == 0)
              {
                sb.append(textBetweenElements + "\n");
              }
              // Put tags and text on a single line if the text is short.
              else if (textBetweenElements.length() <= lineLength * 0.5)
              {
                sb.append(textBetweenElements);
                singleLine = true;
              }
              // For larger amounts of text, wrap lines to a maximum line length.
              else
              {
                sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
              }
              i = nextStartElementPos - 1;
            }
            else
            {
              sb.append("\n");
            }
          }
        }
      }
      return sb.toString();
    }
  }

  private static String buildWhitespace(int numChars)
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numChars; i++)
      sb.append(" ");
    return sb.toString();
  }

  /**
   * Wraps the supplied text to the specified line length.
   * @param s the text to wrap.
   * @param lineLength the maximum length of each line in the returned string (not including indent if specified).
   * @param indent optional number of whitespace characters to prepend to each line before the text.
   * @param linePrefix optional string to append to the indent (before the text).
   * @return the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
   * indent and prefix applied to each line.
   */
  private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
  {
    if (s == null)
      return null;

    StringBuilder sb = new StringBuilder();
    int lineStartPos = 0;
    int lineEndPos;
    boolean firstLine = true;
    while(lineStartPos < s.length())
    {
      if (!firstLine)
        sb.append("\n");
      else
        firstLine = false;

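      if (lineStartPos + lineLength > s.length())
        lineEndPos = s.length() - 1;
      else
      {
        lineEndPos = lineStartPos + lineLength - 1;
        // Back up to the last space or tab so that words are not split.
        while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
          lineEndPos--;
        // No whitespace in this chunk: hard-wrap at the line length instead of emitting one character per line.
        if (lineEndPos == lineStartPos)
          lineEndPos = lineStartPos + lineLength - 1;
      }
      sb.append(buildWhitespace(indent == null ? 0 : indent));  // indent is optional, so treat null as 0.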
      if (linePrefix != null)
        sb.append(linePrefix);

      sb.append(s.substring(lineStartPos, lineEndPos + 1));
      lineStartPos = lineEndPos + 1;
    }
    return sb.toString();
  }

  // other utils removed for brevity
}
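Stripped down to just the counter described in the quote above, the idea can be sketched as follows. This is a minimal sketch, not the XmlUtils class: CounterIndenter and indent are illustrative names, and it assumes well-formed XML without comments or CDATA sections that contain '<' or '>'. It starts each tag on a new line (rather than appending the newline after it), keeps short text inline with its tags, and drops whitespace-only text, so already-indented input is re-indented cleanly.

```java
public class CounterIndenter {
    // One pass over the string: increment depth for each <...> start tag,
    // decrement it for each </...> end tag.
    public static String indent(String xml, int step) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean inlineText = false; // true right after text that followed a tag
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) == '<') {
                int end = xml.indexOf('>', i);
                String tag = xml.substring(i, end + 1);
                if (tag.startsWith("</")) {
                    depth--;
                    // Keep <x>text</x> on one line; otherwise put the end tag on its own line
                    if (!inlineText) newLine(out, depth, step);
                    out.append(tag);
                } else {
                    newLine(out, depth, step);
                    out.append(tag);
                    // Self-closing tags, <?...?> and <!...> do not open a new level
                    if (!tag.endsWith("/>") && !tag.startsWith("<?") && !tag.startsWith("<!")) depth++;
                }
                inlineText = false;
                i = end + 1;
            } else {
                int next = xml.indexOf('<', i);
                if (next < 0) next = xml.length();
                String text = xml.substring(i, next).trim();
                if (!text.isEmpty()) {
                    out.append(text);
                    inlineText = true;
                }
                i = next;
            }
        }
        return out.toString().trim();
    }

    // Starts a new line indented by depth * step spaces
    private static void newLine(StringBuilder out, int depth, int step) {
        out.append('\n');
        for (int k = 0; k < depth * step; k++) out.append(' ');
    }

    public static void main(String[] args) {
        System.out.println(indent("<tag><nested>hello</nested></tag>", 2));
    }
}
```

For "<tag><nested>hello</nested></tag>" with a step of 2 this returns the three lines <tag>, "  <nested>hello</nested>" and </tag>, which matches the layout asked for in the question (minus the declaration and the extra root element of the mock result).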

A slightly improved version of milosmns's:

public static String getPrettyXml(String xml) {
    if (xml == null || xml.trim().length() == 0) return "";

    int stack = 0;
    StringBuilder pretty = new StringBuilder();
    // Put every tag and every text run on its own row
    String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");

    for (int i = 0; i < rows.length; i++) {
        if (rows[i] == null || rows[i].trim().length() == 0) continue;

        String row = rows[i].trim();
        if (row.startsWith("<?")) {
            // XML declaration or processing instruction: no indent, no nesting
            pretty.append(row + "\n");
        } else if (row.startsWith("</")) {
            // closing tag: step back out one level
            String indent = repeatString(--stack);
            pretty.append(indent + row + "\n");
        } else if (row.startsWith("<") && !row.endsWith("/>")) {
            // opening tag: indent at the current level, then nest one level deeper
            String indent = repeatString(stack++);
            pretty.append(indent + row + "\n");
            // a CDATA section that closes on the same row does not open a new level
            if (row.endsWith("]]>")) stack--;
        } else {
            // self-closing tag or text: indent at the current level without nesting
            String indent = repeatString(stack);
            pretty.append(indent + row + "\n");
        }
    }

    return pretty.toString().trim();
}

// Returns one space per nesting level
private static String repeatString(int stack) {
    StringBuilder indent = new StringBuilder();
    for (int i = 0; i < stack; i++) {
        indent.append(" ");
    }
    return indent.toString();
}