我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

这是我自己问题的答案。我将各种结果的答案结合起来,编写了一个输出XML的类。

不保证它如何响应无效的XML或大型文档。

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}

其他回答

我在这里找到的针对Java 1.6+的解决方案不会重新格式化已经格式化的代码。对我有效的方法(重新格式化已经格式化的代码)如下。

import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;

public class XmlUtils {
    public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
        Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
        return new String(canonXmlBytes);
    }

    public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
        InputSource src = new InputSource(new StringReader(input));
        Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
        Boolean keepDeclaration = input.startsWith("<?xml");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
        return writer.writeToString(document);
    }
}

它是在单元测试中比较全字符串xml的好工具。

private void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
    String canonicalExpected = prettyFormat(toCanonicalXml(expected));
    String canonicalActual = prettyFormat(toCanonicalXml(actual));
    assertEquals(canonicalExpected, canonicalActual);
}

请注意,排名靠前的答案需要使用xerces。

如果您不想添加这个外部依赖,那么您可以简单地使用标准jdk库(实际上是在内部使用xerces构建的)。

注意:jdk 1.5版本有一个bug,请参阅http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296446,但现在已经解决了。

(注意,如果发生错误,将返回原始文本)

package com.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.InputSource;

public class XmlTest {
    public static void main(String[] args) {
        XmlTest t = new XmlTest();
        System.out.println(t.formatXml("<a><b><c/><d>text D</d><e value='0'/></b></a>"));
    }

    public String formatXml(String xml){
        try{
            Transformer serializer= SAXTransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            //serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            //serializer.setOutputProperty("{http://xml.customer.org/xslt}indent-amount", "2");
            Source xmlSource=new SAXSource(new InputSource(new ByteArrayInputStream(xml.getBytes())));
            StreamResult res =  new StreamResult(new ByteArrayOutputStream());            
            serializer.transform(xmlSource, res);
            return new String(((ByteArrayOutputStream)res.getOutputStream()).toByteArray());
        }catch(Exception e){
            //TODO log error
            return xml;
        }
    }

}

在提出我自己的解决方案之前,我应该先看看这一页!不管怎样,我使用Java递归来解析xml页面。此代码是完全自包含的,不依赖于第三方库。也. .它使用递归!

// you call this method passing in the xml text
public static void prettyPrint(String text){
    prettyPrint(text, 0);
}

// "index" corresponds to the number of levels of nesting and/or the number of tabs to print before printing the tag
public static void prettyPrint(String xmlText, int index){
    boolean foundTagStart = false;
    StringBuilder tagChars = new StringBuilder();
    String startTag = "";
    String endTag = "";
    String[] chars = xmlText.split("");
    // find the next start tag
    for(String ch : chars){
        if(ch.equalsIgnoreCase("<")){
            tagChars.append(ch);
            foundTagStart = true;
        } else if(ch.equalsIgnoreCase(">") && foundTagStart){
            startTag = tagChars.append(ch).toString();
            String tempTag = startTag;
            endTag = (tempTag.contains("\"") ? (tempTag.split(" ")[0] + ">") : tempTag).replace("<", "</"); // <startTag attr1=1 attr2=2> => </startTag>
            break;
        } else if(foundTagStart){
            tagChars.append(ch);
        }
    }
    // once start and end tag are calculated, print start tag, then content, then end tag
    if(foundTagStart){
        int startIndex = xmlText.indexOf(startTag);
        int endIndex = xmlText.indexOf(endTag);
        // handle if matching tags NOT found
        if((startIndex < 0) || (endIndex < 0)){
            if(startIndex < 0) {
                // no start tag found
                return;
            } else {
                // start tag found, no end tag found (handles single tags aka "<mytag/>" or "<?xml ...>")
                printTabs(index);
                System.out.println(startTag);
                // move on to the next tag
                // NOTE: "index" (not index+1) because next tag is on same level as this one
                prettyPrint(xmlText.substring(startIndex+startTag.length(), xmlText.length()), index);
                return;
            }
        // handle when matching tags found
        } else {
            String content = xmlText.substring(startIndex+startTag.length(), endIndex);
            boolean isTagContainsTags = content.contains("<"); // content contains tags
            printTabs(index);
            if(isTagContainsTags){ // ie: <tag1><tag2>stuff</tag2></tag1>
                System.out.println(startTag);
                prettyPrint(content, index+1); // "index+1" because "content" is nested
                printTabs(index);
            } else {
                System.out.print(startTag); // ie: <tag1>stuff</tag1> or <tag1></tag1>
                System.out.print(content);
            }
            System.out.println(endTag);
            int nextIndex = endIndex + endTag.length();
            if(xmlText.length() > nextIndex){ // if there are more tags on this level, continue
                prettyPrint(xmlText.substring(nextIndex, xmlText.length()), index);
            }
        }
    } else {
        System.out.print(xmlText);
    }
}

private static void printTabs(int counter){
    while(counter-- > 0){ 
        System.out.print("\t");
    }
}

我把它们混合在一起,写了一个小程序。它从xml文件中读取并打印出来。而不是xzy给出你的文件路径。

    public static void main(String[] args) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new FileInputStream(new File("C:/Users/xyz.xml")));
    prettyPrint(doc);

}

private static String prettyPrint(Document document)
        throws TransformerException {
    TransformerFactory transformerFactory = TransformerFactory
            .newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    DOMSource source = new DOMSource(document);
    StringWriter strWriter = new StringWriter();
    StreamResult result = new StreamResult(strWriter);transformer.transform(source, result);
    System.out.println(strWriter.getBuffer().toString());

    return strWriter.getBuffer().toString();

}

这是我自己问题的答案。我将各种结果的答案结合起来,编写了一个输出XML的类。

不保证它如何响应无效的XML或大型文档。

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}