我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

在提出我自己的解决方案之前,我应该先看看这一页!不管怎样,我使用Java递归来解析xml页面。此代码是完全自包含的,不依赖于第三方库。也. .它使用递归!

// you call this method passing in the xml text
public static void prettyPrint(String text){
    prettyPrint(text, 0);
}

// "index" corresponds to the number of levels of nesting and/or the number of tabs to print before printing the tag
public static void prettyPrint(String xmlText, int index){
    boolean foundTagStart = false;
    StringBuilder tagChars = new StringBuilder();
    String startTag = "";
    String endTag = "";
    String[] chars = xmlText.split("");
    // find the next start tag
    for(String ch : chars){
        if(ch.equalsIgnoreCase("<")){
            tagChars.append(ch);
            foundTagStart = true;
        } else if(ch.equalsIgnoreCase(">") && foundTagStart){
            startTag = tagChars.append(ch).toString();
            String tempTag = startTag;
            endTag = (tempTag.contains("\"") ? (tempTag.split(" ")[0] + ">") : tempTag).replace("<", "</"); // <startTag attr1=1 attr2=2> => </startTag>
            break;
        } else if(foundTagStart){
            tagChars.append(ch);
        }
    }
    // once start and end tag are calculated, print start tag, then content, then end tag
    if(foundTagStart){
        int startIndex = xmlText.indexOf(startTag);
        int endIndex = xmlText.indexOf(endTag);
        // handle if matching tags NOT found
        if((startIndex < 0) || (endIndex < 0)){
            if(startIndex < 0) {
                // no start tag found
                return;
            } else {
                // start tag found, no end tag found (handles single tags aka "<mytag/>" or "<?xml ...>")
                printTabs(index);
                System.out.println(startTag);
                // move on to the next tag
                // NOTE: "index" (not index+1) because next tag is on same level as this one
                prettyPrint(xmlText.substring(startIndex+startTag.length(), xmlText.length()), index);
                return;
            }
        // handle when matching tags found
        } else {
            String content = xmlText.substring(startIndex+startTag.length(), endIndex);
            boolean isTagContainsTags = content.contains("<"); // content contains tags
            printTabs(index);
            if(isTagContainsTags){ // ie: <tag1><tag2>stuff</tag2></tag1>
                System.out.println(startTag);
                prettyPrint(content, index+1); // "index+1" because "content" is nested
                printTabs(index);
            } else {
                System.out.print(startTag); // ie: <tag1>stuff</tag1> or <tag1></tag1>
                System.out.print(content);
            }
            System.out.println(endTag);
            int nextIndex = endIndex + endTag.length();
            if(xmlText.length() > nextIndex){ // if there are more tags on this level, continue
                prettyPrint(xmlText.substring(nextIndex, xmlText.length()), index);
            }
        }
    } else {
        System.out.print(xmlText);
    }
}

private static void printTabs(int counter){
    while(counter-- > 0){ 
        System.out.print("\t");
    }
}

其他回答

java有一个静态方法U.formatXml(string)。生活的例子

import com.github.underscore.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<tag><nested>hello</nested></tag>";

        System.out.println(U.formatXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xml + "</root>"));
    }
}

输出:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <tag>
      <nested>hello</nested>
   </tag>
</root>

下面是一种使用dom4j的方法:

进口:

import org.dom4j.Document;  
import org.dom4j.DocumentHelper;  
import org.dom4j.io.OutputFormat;  
import org.dom4j.io.XMLWriter;

代码:

String xml = "<your xml='here'/>";  
Document doc = DocumentHelper.parseText(xml);  
StringWriter sw = new StringWriter();  
OutputFormat format = OutputFormat.createPrettyPrint();  
XMLWriter xw = new XMLWriter(sw, format);  
xw.write(doc);  
String result = sw.toString();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
// initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);

注意:根据Java版本的不同,结果可能有所不同。搜索特定于您的平台的解决方案。

I have found that in Java 1.6.0_32 the normal method to pretty print an XML string (using a Transformer with a null or identity xslt) does not behave as I would like if tags are merely separated by whitespace, as opposed to having no separating text. I tried using <xsl:strip-space elements="*"/> in my template to no avail. The simplest solution I found was to strip the space the way I wanted using a SAXSource and XML filter. Since my solution was for logging I also extended this to work with incomplete XML fragments. Note the normal method seems to work fine if you use a DOMSource but I did not want to use this because of the incompleteness and memory overhead.

public static class WhitespaceIgnoreFilter extends XMLFilterImpl
{

    @Override
    public void ignorableWhitespace(char[] arg0,
                                    int arg1,
                                    int arg2) throws SAXException
    {
        //Ignore it then...
    }

    @Override
    public void characters( char[] ch,
                            int start,
                            int length) throws SAXException
    {
        if (!new String(ch, start, length).trim().equals("")) 
               super.characters(ch, start, length); 
    }
}

public static String prettyXML(String logMsg, boolean allowBadlyFormedFragments) throws SAXException, IOException, TransformerException
    {
        TransformerFactory transFactory = TransformerFactory.newInstance();
        transFactory.setAttribute("indent-number", new Integer(2));
        Transformer transformer = transFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        XMLReader masterParser = SAXHelper.getSAXParser(true);
        XMLFilter parser = new WhitespaceIgnoreFilter();
        parser.setParent(masterParser);

        if(allowBadlyFormedFragments)
        {
            transformer.setErrorListener(new ErrorListener()
            {
                @Override
                public void warning(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void fatalError(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void error(TransformerException exception) throws TransformerException
                {
                }
            });
        }

        try
        {
            transformer.transform(new SAXSource(parser, new InputSource(new StringReader(logMsg))), new StreamResult(out));
        }
        catch (TransformerException e)
        {
            if(e.getCause() != null && e.getCause() instanceof SAXParseException)
            {
                if(!allowBadlyFormedFragments || !"XML document structures must start and end within the same entity.".equals(e.getCause().getMessage()))
                {
                    throw e;
                }
            }
            else
            {
                throw e;
            }
        }
        out.flush();
        return out.toString();
    }

我把它们混合在一起,写了一个小程序。它从xml文件中读取并打印出来。而不是xzy给出你的文件路径。

    public static void main(String[] args) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new FileInputStream(new File("C:/Users/xyz.xml")));
    prettyPrint(doc);

}

private static String prettyPrint(Document document)
        throws TransformerException {
    TransformerFactory transformerFactory = TransformerFactory
            .newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    DOMSource source = new DOMSource(document);
    StringWriter strWriter = new StringWriter();
    StreamResult result = new StreamResult(strWriter);transformer.transform(source, result);
    System.out.println(strWriter.getBuffer().toString());

    return strWriter.getBuffer().toString();

}