我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

我用Scala看到了一个答案,所以这里有另一个用Groovy的答案,以防有人觉得有趣。默认缩进为2步,XmlNodePrinter构造函数也可以传递另一个值。

def xml = "<tag><nested>hello</nested></tag>"
def stringWriter = new StringWriter()
def node = new XmlParser().parseText(xml);
new XmlNodePrinter(new PrintWriter(stringWriter)).print(node)
println stringWriter.toString()

如果groovy jar在类路径中,则使用Java

  String xml = "<tag><nested>hello</nested></tag>";
  StringWriter stringWriter = new StringWriter();
  Node node = new XmlParser().parseText(xml);
  new XmlNodePrinter(new PrintWriter(stringWriter)).print(node);
  System.out.println(stringWriter.toString());

其他回答

关于“您必须首先构建DOM树”的评论:不,您不需要也不应该这样做。

相反,创建一个StreamSource(new StreamSource(new StringReader(str)),并将其提供给前面提到的标识转换器。这将使用SAX解析器,结果将快得多。 在这种情况下,构建中间树纯粹是开销。 否则,排名第一的答案是好的。

试试这个:

 try
                    {
                        TransformerFactory transFactory = TransformerFactory.newInstance();
                        Transformer transformer = null;
                        transformer = transFactory.newTransformer();
                        StringWriter buffer = new StringWriter();
                        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                        transformer.transform(new DOMSource(element),
                                  new StreamResult(buffer)); 
                        String str = buffer.toString();
                        System.out.println("XML INSIDE IS #########################################"+str);
                        return element;
                    }
                    catch (TransformerConfigurationException e)
                    {
                        e.printStackTrace();
                    }
                    catch (TransformerException e)
                    {
                        e.printStackTrace();
                    }

下面的代码工作得很好

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

String formattedXml1 = prettyFormat("<root><child>aaa</child><child/></root>");

public static String prettyFormat(String input) {
    return prettyFormat(input, "2");
}

public static String prettyFormat(String input, String indent) {
    Source xmlInput = new StreamSource(new StringReader(input));
    StringWriter stringWriter = new StringWriter();
    try {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", indent);
        transformer.transform(xmlInput, new StreamResult(stringWriter));

        String pretty = stringWriter.toString();
        pretty = pretty.replace("\r\n", "\n");
        return pretty;              
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

凯文·哈肯森说: 但是,如果您知道您的XML字符串是有效的,并且您不想引起将字符串解析为DOM的内存开销,然后在DOM上运行转换以获得字符串—您可以通过字符解析进行一些老式的字符。在每个字符后插入换行符和空格,保持和缩进计数器(以确定空格的数量),为每个<…>和递减你看到的每一个。”

同意了。这种方法要快得多,依赖关系也少得多。

示例解决方案:

/**
 * XML utils, including formatting.
 */
public class XmlUtils
{
  private static XmlFormatter formatter = new XmlFormatter(2, 80);

  public static String formatXml(String s)
  {
    return formatter.format(s, 0);
  }

  public static String formatXml(String s, int initialIndent)
  {
    return formatter.format(s, initialIndent);
  }

  private static class XmlFormatter
  {
    private int indentNumChars;
    private int lineLength;
    private boolean singleLine;

    public XmlFormatter(int indentNumChars, int lineLength)
    {
      this.indentNumChars = indentNumChars;
      this.lineLength = lineLength;
    }

    public synchronized String format(String s, int initialIndent)
    {
      int indent = initialIndent;
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < s.length(); i++)
      {
        char currentChar = s.charAt(i);
        if (currentChar == '<')
        {
          char nextChar = s.charAt(i + 1);
          if (nextChar == '/')
            indent -= indentNumChars;
          if (!singleLine)   // Don't indent before closing element if we're creating opening and closing elements on a single line.
            sb.append(buildWhitespace(indent));
          if (nextChar != '?' && nextChar != '!' && nextChar != '/')
            indent += indentNumChars;
          singleLine = false;  // Reset flag.
        }
        sb.append(currentChar);
        if (currentChar == '>')
        {
          if (s.charAt(i - 1) == '/')
          {
            indent -= indentNumChars;
            sb.append("\n");
          }
          else
          {
            int nextStartElementPos = s.indexOf('<', i);
            if (nextStartElementPos > i + 1)
            {
              String textBetweenElements = s.substring(i + 1, nextStartElementPos);

              // If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
              if (textBetweenElements.replaceAll("\n", "").length() == 0)
              {
                sb.append(textBetweenElements + "\n");
              }
              // Put tags and text on a single line if the text is short.
              else if (textBetweenElements.length() <= lineLength * 0.5)
              {
                sb.append(textBetweenElements);
                singleLine = true;
              }
              // For larger amounts of text, wrap lines to a maximum line length.
              else
              {
                sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
              }
              i = nextStartElementPos - 1;
            }
            else
            {
              sb.append("\n");
            }
          }
        }
      }
      return sb.toString();
    }
  }

  private static String buildWhitespace(int numChars)
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numChars; i++)
      sb.append(" ");
    return sb.toString();
  }

  /**
   * Wraps the supplied text to the specified line length.
   * @lineLength the maximum length of each line in the returned string (not including indent if specified).
   * @indent optional number of whitespace characters to prepend to each line before the text.
   * @linePrefix optional string to append to the indent (before the text).
   * @returns the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
   * indent and prefix applied to each line.
   */
  private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
  {
    if (s == null)
      return null;

    StringBuilder sb = new StringBuilder();
    int lineStartPos = 0;
    int lineEndPos;
    boolean firstLine = true;
    while(lineStartPos < s.length())
    {
      if (!firstLine)
        sb.append("\n");
      else
        firstLine = false;

      if (lineStartPos + lineLength > s.length())
        lineEndPos = s.length() - 1;
      else
      {
        lineEndPos = lineStartPos + lineLength - 1;
        while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
          lineEndPos--;
      }
      sb.append(buildWhitespace(indent));
      if (linePrefix != null)
        sb.append(linePrefix);

      sb.append(s.substring(lineStartPos, lineEndPos + 1));
      lineStartPos = lineEndPos + 1;
    }
    return sb.toString();
  }

  // other utils removed for brevity
}

这是我自己问题的答案。我将各种结果的答案结合起来,编写了一个输出XML的类。

不保证它如何响应无效的XML或大型文档。

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}