我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

凯文·哈肯森说: 但是,如果您知道您的XML字符串是有效的,并且您不想引起将字符串解析为DOM的内存开销,然后在DOM上运行转换以获得字符串—您可以通过字符解析进行一些老式的字符。在每个字符后插入换行符和空格,保持和缩进计数器(以确定空格的数量),为每个<…>和递减你看到的每一个。”

同意了。这种方法要快得多,依赖关系也少得多。

示例解决方案:

/**
 * XML utils, including formatting.
 */
public class XmlUtils
{
  private static XmlFormatter formatter = new XmlFormatter(2, 80);

  public static String formatXml(String s)
  {
    return formatter.format(s, 0);
  }

  public static String formatXml(String s, int initialIndent)
  {
    return formatter.format(s, initialIndent);
  }

  private static class XmlFormatter
  {
    private int indentNumChars;
    private int lineLength;
    private boolean singleLine;

    public XmlFormatter(int indentNumChars, int lineLength)
    {
      this.indentNumChars = indentNumChars;
      this.lineLength = lineLength;
    }

    public synchronized String format(String s, int initialIndent)
    {
      int indent = initialIndent;
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < s.length(); i++)
      {
        char currentChar = s.charAt(i);
        if (currentChar == '<')
        {
          char nextChar = s.charAt(i + 1);
          if (nextChar == '/')
            indent -= indentNumChars;
          if (!singleLine)   // Don't indent before closing element if we're creating opening and closing elements on a single line.
            sb.append(buildWhitespace(indent));
          if (nextChar != '?' && nextChar != '!' && nextChar != '/')
            indent += indentNumChars;
          singleLine = false;  // Reset flag.
        }
        sb.append(currentChar);
        if (currentChar == '>')
        {
          if (s.charAt(i - 1) == '/')
          {
            indent -= indentNumChars;
            sb.append("\n");
          }
          else
          {
            int nextStartElementPos = s.indexOf('<', i);
            if (nextStartElementPos > i + 1)
            {
              String textBetweenElements = s.substring(i + 1, nextStartElementPos);

              // If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
              if (textBetweenElements.replaceAll("\n", "").length() == 0)
              {
                sb.append(textBetweenElements + "\n");
              }
              // Put tags and text on a single line if the text is short.
              else if (textBetweenElements.length() <= lineLength * 0.5)
              {
                sb.append(textBetweenElements);
                singleLine = true;
              }
              // For larger amounts of text, wrap lines to a maximum line length.
              else
              {
                sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
              }
              i = nextStartElementPos - 1;
            }
            else
            {
              sb.append("\n");
            }
          }
        }
      }
      return sb.toString();
    }
  }

  private static String buildWhitespace(int numChars)
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numChars; i++)
      sb.append(" ");
    return sb.toString();
  }

  /**
   * Wraps the supplied text to the specified line length.
   * @lineLength the maximum length of each line in the returned string (not including indent if specified).
   * @indent optional number of whitespace characters to prepend to each line before the text.
   * @linePrefix optional string to append to the indent (before the text).
   * @returns the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
   * indent and prefix applied to each line.
   */
  private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
  {
    if (s == null)
      return null;

    StringBuilder sb = new StringBuilder();
    int lineStartPos = 0;
    int lineEndPos;
    boolean firstLine = true;
    while(lineStartPos < s.length())
    {
      if (!firstLine)
        sb.append("\n");
      else
        firstLine = false;

      if (lineStartPos + lineLength > s.length())
        lineEndPos = s.length() - 1;
      else
      {
        lineEndPos = lineStartPos + lineLength - 1;
        while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
          lineEndPos--;
      }
      sb.append(buildWhitespace(indent));
      if (linePrefix != null)
        sb.append(linePrefix);

      sb.append(s.substring(lineStartPos, lineEndPos + 1));
      lineStartPos = lineEndPos + 1;
    }
    return sb.toString();
  }

  // other utils removed for brevity
}

其他回答

这是我自己问题的答案。我将各种结果的答案结合起来,编写了一个输出XML的类。

不保证它如何响应无效的XML或大型文档。

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}

有一个非常好的命令行XML实用程序叫做xmlstarlet(http://xmlstar.sourceforge.net/),它可以做很多事情,很多人都在使用它。

您可以使用Runtime以编程方式执行此程序。然后读入格式化的输出文件。它具有比几行Java代码所能提供的更多选项和更好的错误报告。

下载xmlstarlet: http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589

对于那些寻找快速和肮脏的解决方案的人——它不需要XML是100%有效的。例如,在REST / SOAP日志的情况下(你永远不知道其他人发送了什么;-))

我发现并改进了一个我在网上找到的代码剪辑,我认为这仍然是一个有效的可能的方法:

public static String prettyPrintXMLAsString(String xmlString) {
    /* Remove new lines */
    final String LINE_BREAK = "\n";
    xmlString = xmlString.replaceAll(LINE_BREAK, "");
    StringBuffer prettyPrintXml = new StringBuffer();
    /* Group the xml tags */
    Pattern pattern = Pattern.compile("(<[^/][^>]+>)?([^<]*)(</[^>]+>)?(<[^/][^>]+/>)?");
    Matcher matcher = pattern.matcher(xmlString);
    int tabCount = 0;
    while (matcher.find()) {
        String str1 = (null == matcher.group(1) || "null".equals(matcher.group())) ? "" : matcher.group(1);
        String str2 = (null == matcher.group(2) || "null".equals(matcher.group())) ? "" : matcher.group(2);
        String str3 = (null == matcher.group(3) || "null".equals(matcher.group())) ? "" : matcher.group(3);
        String str4 = (null == matcher.group(4) || "null".equals(matcher.group())) ? "" : matcher.group(4);

        if (matcher.group() != null && !matcher.group().trim().equals("")) {
            printTabs(tabCount, prettyPrintXml);
            if (!str1.equals("") && str3.equals("")) {
                ++tabCount;
            }
            if (str1.equals("") && !str3.equals("")) {
                --tabCount;
                prettyPrintXml.deleteCharAt(prettyPrintXml.length() - 1);
            }

            prettyPrintXml.append(str1);
            prettyPrintXml.append(str2);
            prettyPrintXml.append(str3);
            if (!str4.equals("")) {
                prettyPrintXml.append(LINE_BREAK);
                printTabs(tabCount, prettyPrintXml);
                prettyPrintXml.append(str4);
            }
            prettyPrintXml.append(LINE_BREAK);
        }
    }
    return prettyPrintXml.toString();
}

private static void printTabs(int count, StringBuffer stringBuffer) {
    for (int i = 0; i < count; i++) {
        stringBuffer.append("\t");
    }
}

public static void main(String[] args) {
    String x = new String(
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>INVALID_MESSAGE</faultstring><detail><ns3:XcbSoapFault xmlns=\"\" xmlns:ns3=\"http://www.someapp.eu/xcb/types/xcb/v1\"><CauseCode>20007</CauseCode><CauseText>INVALID_MESSAGE</CauseText><DebugInfo>Problems creating SAAJ object model</DebugInfo></ns3:XcbSoapFault></detail></soap:Fault></soap:Body></soap:Envelope>");
    System.out.println(prettyPrintXMLAsString(x));
}

输出如下:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>INVALID_MESSAGE</faultstring>
        <detail>
            <ns3:XcbSoapFault xmlns="" xmlns:ns3="http://www.someapp.eu/xcb/types/xcb/v1">
                <CauseCode>20007</CauseCode>
                <CauseText>INVALID_MESSAGE</CauseText>
                <DebugInfo>Problems creating SAAJ object model</DebugInfo>
            </ns3:XcbSoapFault>
        </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>

如果您确信您有一个有效的XML,那么这个很简单,并且避免了XML DOM树。可能有一些错误,如果你看到任何错误,请评论

public String prettyPrint(String xml) {
            if (xml == null || xml.trim().length() == 0) return "";

            int stack = 0;
            StringBuilder pretty = new StringBuilder();
            String[] rows = xml.trim().replaceAll(">", ">\n").replaceAll("<", "\n<").split("\n");

            for (int i = 0; i < rows.length; i++) {
                    if (rows[i] == null || rows[i].trim().length() == 0) continue;

                    String row = rows[i].trim();
                    if (row.startsWith("<?")) {
                            // xml version tag
                            pretty.append(row + "\n");
                    } else if (row.startsWith("</")) {
                            // closing tag
                            String indent = repeatString("    ", --stack);
                            pretty.append(indent + row + "\n");
                    } else if (row.startsWith("<")) {
                            // starting tag
                            String indent = repeatString("    ", stack++);
                            pretty.append(indent + row + "\n");
                    } else {
                            // tag data
                            String indent = repeatString("    ", stack);
                            pretty.append(indent + row + "\n");
                    }
            }

            return pretty.toString().trim();
    }

下面是一种使用dom4j的方法:

进口:

import org.dom4j.Document;  
import org.dom4j.DocumentHelper;  
import org.dom4j.io.OutputFormat;  
import org.dom4j.io.XMLWriter;

代码:

String xml = "<your xml='here'/>";  
Document doc = DocumentHelper.parseText(xml);  
StringWriter sw = new StringWriter();  
OutputFormat format = OutputFormat.createPrettyPrint();  
XMLWriter xw = new XMLWriter(sw, format);  
xw.write(doc);  
String result = sw.toString();