我在XML中有很多行,我试图获得一个特定节点属性的实例。

<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>

我如何访问属性foobar的值?在这个例子中,我想要“1”和“2”。


当前回答

Minidom是最快速且非常直接的方法。

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
</data>

Python:

from xml.dom import minidom

dom = minidom.parse('items.xml')
elements = dom.getElementsByTagName('item')

print(f"There are {len(elements)} items:")

for element in elements:
    print(element.attributes['name'].value)

输出:

There are 4 items:
item1
item2
item3
item4

其他回答

如果源文件是一个xml文件,就像这个示例一样

<pa:Process xmlns:pa="http://sssss">
        <pa:firsttag>SAMPLE</pa:firsttag>
    </pa:Process>

您可以尝试下面的代码

from lxml import etree, objectify
metadata = 'C:\\Users\\PROCS.xml' # this is sample xml file the contents are shown above
parser = etree.XMLParser(remove_blank_text=True) # this line removes the  name space from the xml in this sample the name space is --> http://sssss
tree = etree.parse(metadata, parser) # this line parses the xml file which is PROCS.xml
root = tree.getroot() # we get the root of xml which is process and iterate using a for loop
for elem in root.getiterator():
    if not hasattr(elem.tag, 'find'): continue  # (1)
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]

dict={}  # a python dictionary is declared
for elem in tree.iter(): #iterating through the xml tree using a for loop
    if elem.tag =="firsttag": # if the tag name matches the name that is equated then the text in the tag is stored into the dictionary
        dict["FIRST_TAG"]=str(elem.text)
        print(dict)

输出将是

{'FIRST_TAG': 'SAMPLE'}

Minidom是最快速且非常直接的方法。

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
</data>

Python:

from xml.dom import minidom

dom = minidom.parse('items.xml')
elements = dom.getElementsByTagName('item')

print(f"There are {len(elements)} items:")

for element in elements:
    print(element.attributes['name'].value)

输出:

There are 4 items:
item1
item2
item3
item4

有很多选择。如果速度和内存使用是一个问题,cElementTree看起来很棒。与简单地使用readline读取文件相比,它的开销非常小。

相关指标可以在下表中找到,复制自cElementTree网站:

library                         time    space
xml.dom.minidom (Python 2.1)    6.3 s   80000K
gnosis.objectify                2.0 s   22000k
xml.dom.minidom (Python 2.4)    1.4 s   53000k
ElementTree 1.2                 1.6 s   14500k  
ElementTree 1.2.4/1.3           1.1 s   14500k  
cDomlette (C extension)         0.540 s 20500k
PyRXPU (C extension)            0.175 s 10850k
libxml2 (C extension)           0.098 s 16000k
readlines (read as utf-8)       0.093 s 8850k
cElementTree (C extension)  --> 0.047 s 4900K <--
readlines (read as ascii)       0.032 s 5050k   

正如@jfs所指出的,cElementTree是与Python捆绑在一起的:

Python 2:来自xml。etree导入cElementTree作为ElementTree。 Python 3:从xml。导入ElementTree(自动使用加速的C版本)。

XML:

<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>

Python代码:

import xml.etree.cElementTree as ET

tree = ET.parse("foo.xml")
root = tree.getroot() 
root_tag = root.tag
print(root_tag) 

for form in root.findall("./bar/type"):
    x=(form.attrib)
    z=list(x)
    for i in z:
        print(x[i])

输出:

foo
1
2

你可以使用BeautifulSoup:

from bs4 import BeautifulSoup

x="""<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>"""

y=BeautifulSoup(x)
>>> y.foo.bar.type["foobar"]
u'1'

>>> y.foo.bar.findAll("type")
[<type foobar="1"></type>, <type foobar="2"></type>]

>>> y.foo.bar.findAll("type")[0]["foobar"]
u'1'
>>> y.foo.bar.findAll("type")[1]["foobar"]
u'2'