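Converting XML to JSON using Python?

I've seen a fair share of ungainly XML->JSON code on the web, and having interacted with Stack's users for a bit, I'm convinced that this crowd can help more than the first few pages of Google results can.

So, we're parsing a weather feed, and we need to populate weather widgets on a multitude of web sites. We're now looking into Python-based solutions.

This public weather.com RSS feed is a good example of what we'd be parsing (our actual weather.com feed contains additional information because of a partnership with them).

In a nutshell, how should we convert XML to JSON using Python?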

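Current answer

xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

Once you have that data structure, you can serialize it to JSON: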

import xmltodict, json

o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'
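
To illustrate what "Expat-based" means here, below is a minimal sketch that builds the same kind of dict+list+string structure straight from Expat events, using only the standard library. It is not xmltodict's actual implementation; the function name expat_to_dict, the "@" attribute prefix and the "#text" key are assumptions made for this example, and mixed content (text interleaved with child elements) is not handled faithfully.

```python
import json
import xml.parsers.expat


def expat_to_dict(xml_text):
    """Build nested dicts from expat events; repeated sibling tags become lists."""
    root = {}
    stack = [root]  # dicts currently being filled, innermost last
    text = []       # character data collected since the last start/end tag

    def start(name, attrs):
        # Attributes are stored under '@'-prefixed keys (assumed convention).
        stack.append({"@" + key: value for key, value in attrs.items()})
        text.clear()

    def end(name):
        node = stack.pop()
        content = "".join(text).strip()
        text.clear()
        if node and content:
            node["#text"] = content  # text next to attributes or children
        # An element without attributes or children collapses to its text.
        value = node if node else (content or None)
        parent = stack[-1]
        if name in parent:
            if not isinstance(parent[name], list):
                parent[name] = [parent[name]]
            parent[name].append(value)
        else:
            parent[name] = value

    def chars(data):
        text.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return root


result = expat_to_dict("<e> <a>text</a> <a>text</a> </e>")
print(json.dumps(result))  # {"e": {"a": ["text", "text"]}}
```

The result is assembled while the parser walks the input, so no separate element tree is built first.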

2012-04-18 01:06:05

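Other answers

My answer addresses the specific (and somewhat common) case where you don't really need to convert the entire xml to json; what you need is to traverse/access specific parts of the xml, and you need it to be fast and simple (using json/dict-like operations).

Approach

For this, it is important to note that parsing an xml to an etree using lxml is very fast. The slow part in most of the other answers is the second pass: traversing the etree structure (usually in Python-land) and converting it to json.

Which leads me to the approach I found best for this case: parse the xml using lxml, and then wrap the etree nodes (lazily), providing them with a dict-like interface.

Code

Here's the code: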

from collections.abc import Mapping
import lxml.etree

class ETreeDictWrapper(Mapping):

    def __init__(self, elem, attr_prefix = '@', list_tags = ()):
        self.elem = elem
        self.attr_prefix = attr_prefix
        self.list_tags = list_tags

    def _wrap(self, e):
        if isinstance(e, str):
            return e
        if len(e) == 0 and len(e.attrib) == 0:
            return e.text
        return type(self)(
            e,
            attr_prefix = self.attr_prefix,
            list_tags = self.list_tags,
        )

    def __getitem__(self, key):
        if key.startswith(self.attr_prefix):
            return self.elem.attrib[key[len(self.attr_prefix):]]
        else:
            subelems = [ e for e in self.elem.iterchildren() if e.tag == key ]
            if len(subelems) > 1 or key in self.list_tags:
                return [ self._wrap(x) for x in subelems ]
            elif len(subelems) == 1:
                return self._wrap(subelems[0])
            else:
                raise KeyError(key)

    def __iter__(self):
        return iter(set( k.tag for k in self.elem) |
                    set( self.attr_prefix + k for k in self.elem.attrib ))

    def __len__(self):
        return len(set(k.tag for k in self.elem)) + len(self.elem.attrib)

    # defining __contains__ is not necessary, but improves speed
    def __contains__(self, key):
        if key.startswith(self.attr_prefix):
            return key[len(self.attr_prefix):] in self.elem.attrib
        else:
            return any( e.tag == key for e in self.elem.iterchildren() )


def xml_to_dictlike(xmlstr, attr_prefix = '@', list_tags = ()):
    t = lxml.etree.fromstring(xmlstr)
    return ETreeDictWrapper(
        t,
        attr_prefix = attr_prefix,
        list_tags = set(list_tags),
    )

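This implementation is not complete, e.g., it doesn't cleanly support cases where an element has both text and attributes, or both text and children (only because I didn't need it when I wrote it...). It should be easy to improve it, though.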
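
As one possible direction for that improvement, here is a minimal eager sketch (not the lazy wrapper itself) that keeps an element's text next to its attributes and children. It uses the standard-library ElementTree so it runs without lxml; the function name element_to_value and the "#text" key are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem, attr_prefix="@", text_key="#text"):
    """Return the text for a plain leaf, else a dict of attributes, text and children."""
    text = (elem.text or "").strip()
    if len(elem) == 0 and not elem.attrib:
        return text
    result = {attr_prefix + name: value for name, value in elem.attrib.items()}
    if text:
        result[text_key] = text  # text kept alongside attributes/children
    for child in elem:
        # Children are always collected in lists so the shape stays predictable.
        result.setdefault(child.tag, []).append(
            element_to_value(child, attr_prefix, text_key)
        )
    return result


root = ET.fromstring('<temp unit="C">21<note>mild</note></temp>')
print(element_to_value(root))  # {'@unit': 'C', '#text': '21', 'note': ['mild']}
```

The same idea could be folded into _wrap and __getitem__ by exposing the text under a reserved key.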

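Speed

In my specific use case, where I only needed to process specific elements of the xml, this approach gave a surprising and striking speedup by a factor of 70 (!) compared to using @Martin Blech's xmltodict and then traversing the dict directly.

Bonus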

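As a bonus, since our structure is already dict-like, we get another alternative implementation of xml2json almost for free. json.dumps only serializes real dict instances, so we pass default=dict to let it convert each dict-like wrapper as it encounters one. Something like:

import json

def xml_to_json(xmlstr, **kwargs):
    x = xml_to_dictlike(xmlstr, **kwargs)
    return json.dumps(x, default=dict)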

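If your xml includes attributes, you may want to use an alphanumeric attr_prefix (e.g. "ATTR_"). Any string is a valid json key, but keys starting with "@" cannot be accessed as plain identifiers by some consumers (for example, JavaScript dot notation).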

I haven't benchmarked this part.
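
One detail worth checking when serializing a dict-like object: json.dumps handles dict instances but not arbitrary Mapping implementations. The sketch below shows the failure and one way around it with the default= hook. ReadOnlyView is a hypothetical stand-in for any such wrapper, not part of the code above.

```python
import json
from collections.abc import Mapping


class ReadOnlyView(Mapping):
    """Hypothetical stand-in for a dict-like object that is not a dict subclass."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        return ReadOnlyView(value) if isinstance(value, dict) else value

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


view = ReadOnlyView({"e": {"a": ["text", "text"]}})

try:
    json.dumps(view)
except TypeError as exc:
    print("plain dumps fails:", exc)

# default= is called for each object json cannot serialize;
# dict() copies one level, and nested views are converted the same way.
print(json.dumps(view, default=dict))  # {"e": {"a": ["text", "text"]}}
```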

2016-10-20 10:38:21

For anyone who may still need this, here's a newer, simple piece of code to do the conversion.

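from xml.etree import ElementTree as ET


def parseXmlToJson(xml):
  # Builds a dict from the children of an Element.
  # Note: repeated sibling tags overwrite each other, and attributes are ignored.
  response = {}

  for child in list(xml):
    if len(list(child)) > 0:
      response[child.tag] = parseXmlToJson(child)
    else:
      response[child.tag] = child.text or ''

    # one-liner equivalent
    # response[child.tag] = parseXmlToJson(child) if len(list(child)) > 0 else child.text or ''

  return response


# ET.parse returns an ElementTree, so pass its root Element to the function
xml    = ET.parse('FILE_NAME.xml').getroot()
parsed = parseXmlToJson(xml)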
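
Because the recursive converter stores one value per tag, repeated siblings (several day elements in a forecast, say) would overwrite each other. A minimal variation that turns repeated tags into a list could look like the sketch below; the name parse_with_lists and the sample XML are made up for illustration.

```python
import json
from xml.etree import ElementTree as ET


def parse_with_lists(element):
    """Recursive conversion where repeated sibling tags are gathered into a list."""
    response = {}
    for child in element:
        value = parse_with_lists(child) if len(child) > 0 else (child.text or '')
        if child.tag not in response:
            response[child.tag] = value
        elif isinstance(response[child.tag], list):
            response[child.tag].append(value)
        else:
            # second occurrence: promote the stored value to a list
            response[child.tag] = [response[child.tag], value]
    return response


root = ET.fromstring(
    '<forecast><day>Mon</day><day>Tue</day><city><name>Oslo</name></city></forecast>'
)
print(json.dumps(parse_with_lists(root)))
# {"day": ["Mon", "Tue"], "city": {"name": "Oslo"}}
```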

2017-11-02 17:25:44

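When I do anything with XML in Python I almost always use the lxml package. I suspect that most people use lxml. You could use xmltodict, but you will have to pay the penalty of parsing the XML again.

To convert XML to json with lxml you:

1. Parse the XML document with lxml
2. Convert the lxml tree to a dict
3. Convert the dict to json

I use the following class in my projects. Use the toJson method.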

from lxml import etree 
import json


class Element:
    '''
    Wrapper on the etree.Element class.  Extends functionality to output element
    as a dictionary.
    '''

    def __init__(self, element):
        '''
        :param: element a normal etree.Element instance
        '''
        self.element = element

    def toDict(self):
        '''
        Returns the element as a dictionary.  This includes all child elements.
        '''
        rval = {
            self.element.tag: {
                'attributes': dict(self.element.items()),
            },
        }
        for child in self.element:
            rval[self.element.tag].update(Element(child).toDict())
        return rval


class XmlDocument:
    '''
    Wraps lxml to provide:
        - cleaner access to some common lxml.etree functions
        - converter from XML to dict
        - converter from XML to json
    '''
    def __init__(self, xml = '<empty/>', filename=None):
        '''
        There are two ways to initialize the XmlDocument contents:
            - String
            - File

        You don't have to initialize the XmlDocument during instantiation
        though.  You can do it later with the 'set' method.  If you choose to
        initialize later XmlDocument will be initialized with "<empty/>".

        :param: xml Set this argument if you want to parse from a string.
        :param: filename Set this argument if you want to parse from a file.
        '''
        self.set(xml, filename) 

    def set(self, xml=None, filename=None):
        '''
        Use this to set or reset the contents of the XmlDocument.

        :param: xml Set this argument if you want to parse from a string.
        :param: filename Set this argument if you want to parse from a file.
        '''
        if filename is not None:
            self.tree = etree.parse(filename)
            self.root = self.tree.getroot()
        else:
            self.root = etree.fromstring(xml)
            self.tree = etree.ElementTree(self.root)


    def dump(self):
        etree.dump(self.root)

    def getXml(self):
        '''
        return document as a string
        '''
        return etree.tostring(self.root)

    def xpath(self, xpath):
        '''
        Return elements that match the given xpath.

        :param: xpath
        '''
        return self.tree.xpath(xpath)

    def nodes(self):
        '''
        Return all elements
        '''
        return self.root.iter('*')

    def toDict(self):
        '''
        Convert to a python dictionary
        '''
        return Element(self.root).toDict()

    def toJson(self, indent=None):
        '''
        Convert to JSON
        '''
        return json.dumps(self.toDict(), indent=indent)


if __name__ == "__main__":
    xml='''<system>
    <product>
        <demod>
            <frequency value='2.215' units='MHz'>
                <blah value='1'/>
            </frequency>
        </demod>
    </product>
</system>
'''
    doc = XmlDocument(xml)
    print(doc.toJson(indent=4))

The output from the built-in main is:

{
    "system": {
        "attributes": {}, 
        "product": {
            "attributes": {}, 
            "demod": {
                "attributes": {}, 
                "frequency": {
                    "attributes": {
                        "units": "MHz", 
                        "value": "2.215"
                    }, 
                    "blah": {
                        "attributes": {
                            "value": "1"
                        }
                    }
                }
            }
        }
    }
}

Which is a transformation of this xml:

<system>
    <product>
        <demod>
            <frequency value='2.215' units='MHz'>
                <blah value='1'/>
            </frequency>
        </demod>
    </product>
</system>

2017-05-09 16:30:31

I published one on GitHub a while back.

https://github.com/davlee1972/xml_to_json

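This converter is written in Python and will convert one or more XML files into JSON / JSONL files.

It requires an XSD schema file to figure out nested json structures (dictionaries vs lists) and json equivalent data types.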

python xml_to_json.py -x PurchaseOrder.xsd PurchaseOrder.xml

INFO - 2018-03-20 11:10:24 - Parsing XML Files..
INFO - 2018-03-20 11:10:24 - Processing 1 files
INFO - 2018-03-20 11:10:24 - Parsing files in the following order:
INFO - 2018-03-20 11:10:24 - ['PurchaseOrder.xml']
DEBUG - 2018-03-20 11:10:24 - Generating schema from PurchaseOrder.xsd
DEBUG - 2018-03-20 11:10:24 - Parsing PurchaseOrder.xml
DEBUG - 2018-03-20 11:10:24 - Writing to file PurchaseOrder.json
DEBUG - 2018-03-20 11:10:24 - Completed PurchaseOrder.xml

I also have a follow-up XML to Parquet converter that works in a similar fashion.

https://github.com/blackrock/xml_to_parquet
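
To show why schema information helps, here is a small sketch in which hand-written hints stand in for what an XSD could tell a converter: which tags repeat (and so should always be lists) and which leaf values are numeric. This is not how the linked tools are implemented; LIST_TAGS, TYPES, convert and the sample document are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical hints of the kind a schema can provide; not derived from a real XSD.
LIST_TAGS = {"item"}                         # tags that may repeat
TYPES = {"quantity": int, "price": float}    # leaf tags with non-string types


def convert(elem):
    """Convert an element using the hints above to pick list vs dict and value types."""
    if len(elem) == 0:
        if elem.text is None:
            return None
        return TYPES.get(elem.tag, str)(elem.text.strip())
    result = {}
    for child in elem:
        value = convert(child)
        if child.tag in LIST_TAGS:
            # always a list, even when only one item is present
            result.setdefault(child.tag, []).append(value)
        else:
            result[child.tag] = value
    return result


xml_text = (
    "<order>"
    "<item><name>Lawnmower</name><quantity>1</quantity><price>148.95</price></item>"
    "</order>"
)
root = ET.fromstring(xml_text)
print(json.dumps({root.tag: convert(root)}))
# {"order": {"item": [{"name": "Lawnmower", "quantity": 1, "price": 148.95}]}}
```

Without such hints, a converter has to guess from a single document, so a lone item would come out as a dict rather than a one-element list.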

2021-02-23 20:45:22

If you don't want to use any external libraries or third-party tools, try the code below.

Code

import re
import json

def getdict(content):
    res=re.findall(r"<(?P<var>\S*)(?P<attr>[^/>]*)(?:(?:>(?P<val>.*?)</(?P=var)>)|(?:/>))",content)
    if len(res)>=1:
        attreg=r"(?P<avr>\S+?)(?:(?:=(?P<quote>['\"])(?P<avl>.*?)(?P=quote))|(?:=(?P<avl1>.*?)(?:\s|$))|(?P<avl2>[\s]+)|$)"
        if len(res)>1:
            return [{i[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,i[1].strip())]},{"$values":getdict(i[2])}]} for i in res]
        else:
            # a single match is still a list holding one tuple, so unpack it first
            i=res[0]
            return {i[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,i[1].strip())]},{"$values":getdict(i[2])}]}
    else:
        return content

with open("test.xml","r") as f:
    print(json.dumps(getdict(f.read().replace('\n',''))))

Sample Input

<details class="4b" count=1 boy>
    <name type="firstname">John</name>
    <age>13</age>
    <hobby>Coin collection</hobby>
    <hobby>Stamp collection</hobby>
    <address>
        <country>USA</country>
        <state>CA</state>
    </address>
</details>
<details empty="True"/>
<details/>
<details class="4a" count=2 girl>
    <name type="firstname">Samantha</name>
    <age>13</age>
    <hobby>Fishing</hobby>
    <hobby>Chess</hobby>
    <address current="no">
        <country>Australia</country>
        <state>NSW</state>
    </address>
</details>

Output

[
  {
    "details": [
      {
        "@attributes": [
          {
            "class": "4b"
          },
          {
            "count": "1"
          },
          {
            "boy": ""
          }
        ]
      },
      {
        "$values": [
          {
            "name": [
              {
                "@attributes": [
                  {
                    "type": "firstname"
                  }
                ]
              },
              {
                "$values": "John"
              }
            ]
          },
          {
            "age": [
              {
                "@attributes": []
              },
              {
                "$values": "13"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Coin collection"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Stamp collection"
              }
            ]
          },
          {
            "address": [
              {
                "@attributes": []
              },
              {
                "$values": [
                  {
                    "country": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "USA"
                      }
                    ]
                  },
                  {
                    "state": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "CA"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": [
          {
            "empty": "True"
          }
        ]
      },
      {
        "$values": ""
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": []
      },
      {
        "$values": ""
      }
    ]
  },
  {
    "details": [
      {
        "@attributes": [
          {
            "class": "4a"
          },
          {
            "count": "2"
          },
          {
            "girl": ""
          }
        ]
      },
      {
        "$values": [
          {
            "name": [
              {
                "@attributes": [
                  {
                    "type": "firstname"
                  }
                ]
              },
              {
                "$values": "Samantha"
              }
            ]
          },
          {
            "age": [
              {
                "@attributes": []
              },
              {
                "$values": "13"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Fishing"
              }
            ]
          },
          {
            "hobby": [
              {
                "@attributes": []
              },
              {
                "$values": "Chess"
              }
            ]
          },
          {
            "address": [
              {
                "@attributes": [
                  {
                    "current": "no"
                  }
                ]
              },
              {
                "$values": [
                  {
                    "country": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "Australia"
                      }
                    ]
                  },
                  {
                    "state": [
                      {
                        "@attributes": []
                      },
                      {
                        "$values": "NSW"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]

2020-08-24 07:09:54
