用Python打印XML的最佳方法(或各种方法)是什么?


当前回答

使用etree。缩进和etree.tostring

import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)

输出

<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>

删除名称空间和前缀

import lxml.etree as etree


def dump_xml(element):
    for item in element.getiterator():
        item.tag = etree.QName(item).localname

    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)

输出

<document>
  <name>hello world</name>
</document>

其他回答

正如其他人指出的那样,lxml内置了一个漂亮的打印机。

请注意,在默认情况下,它会将CDATA部分更改为普通文本,这可能会产生糟糕的结果。

下面是一个Python函数,它保留输入文件,只改变缩进(注意strip_cdata=False)。此外,它确保输出使用UTF-8作为编码,而不是默认的ASCII(注意encoding=' UTF-8 '):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

使用示例:

prettyPrintXml('some_folder/some_file.xml')

使用etree。缩进和etree.tostring

import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)

输出

<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>

删除名称空间和前缀

import lxml.etree as etree


def dump_xml(element):
    for item in element.getiterator():
        item.tag = etree.QName(item).localname

    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)

输出

<document>
  <name>hello world</name>
</document>
from yattag import indent

pretty_string = indent(ugly_string)

它不会在文本节点中添加空格或换行,除非你要求它:

indent(mystring, indent_text = True)

您可以指定缩进单位和换行符的样式。

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

该文件在http://www.yattag.org主页上。

从Python 3.9开始,ElementTree有一个用于漂亮打印XML树的indent()函数。

见https://docs.python.org/3/library/xml.etree.elementtree.html # xml.etree.ElementTree.indent。

示例用法:

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))

好处是它不需要任何额外的库。欲了解更多信息,请访问https://bugs.python.org/issue14465和https://github.com/python/cpython/pull/15200

另一个解决方案是借用这个缩进函数,用于自2.5以来内置在Python中的ElementTree库。 下面是它的样子:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)