我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

有一个非常好的命令行XML实用程序叫做xmlstarlet(http://xmlstar.sourceforge.net/),它可以做很多事情,很多人都在使用它。

您可以使用Runtime以编程方式执行此程序。然后读入格式化的输出文件。它具有比几行Java代码所能提供的更多选项和更好的错误报告。

下载xmlstarlet: http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589

其他回答

Since you are starting with a String, you can convert to a DOM object (e.g. Node) before you use the Transformer. However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep and indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see.

免责声明-我对下面的函数做了剪切/粘贴/文本编辑,所以它们可能不能按原样编译。

public static final Element createDOM(String strXML) 
    throws ParserConfigurationException, SAXException, IOException {

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputSource sourceXML = new InputSource(new StringReader(strXML));
    Document xmlDoc = db.parse(sourceXML);
    Element e = xmlDoc.getDocumentElement();
    e.normalize();
    return e;
}

public static final void prettyPrint(Node xml, OutputStream out)
    throws TransformerConfigurationException, TransformerFactoryConfigurationError, TransformerException {
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.transform(new DOMSource(xml), new StreamResult(out));
}

使用scala:

import xml._
val xml = XML.loadString("<tag><nested>hello</nested></tag>")
val formatted = new PrettyPrinter(150, 2).format(xml)
println(formatted)

如果你依赖scala-library.jar,你也可以在Java中这样做。它是这样的:

import scala.xml.*;

public class FormatXML {
    public static void main(String[] args) {
        String unformattedXml = "<tag><nested>hello</nested></tag>";
        PrettyPrinter pp = new PrettyPrinter(150, 3);
        String formatted = pp.format(XML.loadString(unformattedXml), TopScope$.MODULE$);
        System.out.println(formatted);
    }
}

PrettyPrinter对象是用两个整数构造的,第一个是最大行长,第二个是缩进步骤。

关于“您必须首先构建DOM树”的评论:不,您不需要也不应该这样做。

相反,创建一个StreamSource(new StreamSource(new StringReader(str)),并将其提供给前面提到的标识转换器。这将使用SAX解析器,结果将快得多。 在这种情况下,构建中间树纯粹是开销。 否则,排名第一的答案是好的。

使用jdom2: http://www.jdom.org/

import java.io.StringReader;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

String prettyXml = new XMLOutputter(Format.getPrettyFormat()).
                         outputString(new SAXBuilder().build(new StringReader(uglyXml)));
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);

注意:根据Java版本的不同,结果可能有所不同。搜索特定于您的平台的解决方案。