我在XML中有很多行,我试图获得一个特定节点属性的实例。
<foo>
<bar>
<type foobar="1"/>
<type foobar="2"/>
</bar>
</foo>
我如何访问属性foobar的值?在这个例子中,我想要“1”和“2”。
我在XML中有很多行,我试图获得一个特定节点属性的实例。
<foo>
<bar>
<type foobar="1"/>
<type foobar="2"/>
</bar>
</foo>
我如何访问属性foobar的值?在这个例子中,我想要“1”和“2”。
当前回答
lxml。物化真的很简单。
以示例文本为例:
from lxml import objectify
from collections import defaultdict
count = defaultdict(int)
root = objectify.fromstring(text)
for item in root.bar.type:
count[item.attrib.get("foobar")] += 1
print dict(count)
输出:
{'1': 1, '2': 1}
其他回答
lxml。物化真的很简单。
以示例文本为例:
from lxml import objectify
from collections import defaultdict
count = defaultdict(int)
root = objectify.fromstring(text)
for item in root.bar.type:
count[item.attrib.get("foobar")] += 1
print dict(count)
输出:
{'1': 1, '2': 1}
Python有一个到expat XML解析器的接口。
xml.parsers.expat
它是一个非验证解析器,因此不会捕获糟糕的XML。但如果你知道你的文件是正确的,那么这就很好了,你可能会得到你想要的确切信息,你可以丢弃其余的。
stringofxml = """<foo>
<bar>
<type arg="value" />
<type arg="value" />
<type arg="value" />
</bar>
<bar>
<type arg="value" />
</bar>
</foo>"""
count = 0
def start(name, attr):
global count
if name == 'type':
count += 1
p = expat.ParserCreate()
p.StartElementHandler = start
p.Parse(stringofxml)
print count # prints 4
如果您不想使用任何外部库或第三方工具,请尝试下面的代码。
这将把xml解析成python字典 这也将解析xml属性 这也将解析空标签,如<tag/>和只有属性的标签,如<tag var=val/>
Code
import re
def getdict(content):
res=re.findall("<(?P<var>\S*)(?P<attr>[^/>]*)(?:(?:>(?P<val>.*?)</(?P=var)>)|(?:/>))",content)
if len(res)>=1:
attreg="(?P<avr>\S+?)(?:(?:=(?P<quote>['\"])(?P<avl>.*?)(?P=quote))|(?:=(?P<avl1>.*?)(?:\s|$))|(?P<avl2>[\s]+)|$)"
if len(res)>1:
return [{i[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,i[1].strip())]},{"$values":getdict(i[2])}]} for i in res]
else:
return {res[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,res[1].strip())]},{"$values":getdict(res[2])}]}
else:
return content
with open("test.xml","r") as f:
print(getdict(f.read().replace('\n','')))
样例输入
<details class="4b" count=1 boy>
<name type="firstname">John</name>
<age>13</age>
<hobby>Coin collection</hobby>
<hobby>Stamp collection</hobby>
<address>
<country>USA</country>
<state>CA</state>
</address>
</details>
<details empty="True"/>
<details/>
<details class="4a" count=2 girl>
<name type="firstname">Samantha</name>
<age>13</age>
<hobby>Fishing</hobby>
<hobby>Chess</hobby>
<address current="no">
<country>Australia</country>
<state>NSW</state>
</address>
</details>
输出(美化)
[
{
"details": [
{
"@attributes": [
{
"class": "4b"
},
{
"count": "1"
},
{
"boy": ""
}
]
},
{
"$values": [
{
"name": [
{
"@attributes": [
{
"type": "firstname"
}
]
},
{
"$values": "John"
}
]
},
{
"age": [
{
"@attributes": []
},
{
"$values": "13"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Coin collection"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Stamp collection"
}
]
},
{
"address": [
{
"@attributes": []
},
{
"$values": [
{
"country": [
{
"@attributes": []
},
{
"$values": "USA"
}
]
},
{
"state": [
{
"@attributes": []
},
{
"$values": "CA"
}
]
}
]
}
]
}
]
}
]
},
{
"details": [
{
"@attributes": [
{
"empty": "True"
}
]
},
{
"$values": ""
}
]
},
{
"details": [
{
"@attributes": []
},
{
"$values": ""
}
]
},
{
"details": [
{
"@attributes": [
{
"class": "4a"
},
{
"count": "2"
},
{
"girl": ""
}
]
},
{
"$values": [
{
"name": [
{
"@attributes": [
{
"type": "firstname"
}
]
},
{
"$values": "Samantha"
}
]
},
{
"age": [
{
"@attributes": []
},
{
"$values": "13"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Fishing"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Chess"
}
]
},
{
"address": [
{
"@attributes": [
{
"current": "no"
}
]
},
{
"$values": [
{
"country": [
{
"@attributes": []
},
{
"$values": "Australia"
}
]
},
{
"state": [
{
"@attributes": []
},
{
"$values": "NSW"
}
]
}
]
}
]
}
]
}
]
}
]
如果源文件是一个xml文件,就像这个示例一样
<pa:Process xmlns:pa="http://sssss">
<pa:firsttag>SAMPLE</pa:firsttag>
</pa:Process>
您可以尝试下面的代码
from lxml import etree, objectify
metadata = 'C:\\Users\\PROCS.xml' # this is sample xml file the contents are shown above
parser = etree.XMLParser(remove_blank_text=True) # this line removes the name space from the xml in this sample the name space is --> http://sssss
tree = etree.parse(metadata, parser) # this line parses the xml file which is PROCS.xml
root = tree.getroot() # we get the root of xml which is process and iterate using a for loop
for elem in root.getiterator():
if not hasattr(elem.tag, 'find'): continue # (1)
i = elem.tag.find('}')
if i >= 0:
elem.tag = elem.tag[i+1:]
dict={} # a python dictionary is declared
for elem in tree.iter(): #iterating through the xml tree using a for loop
if elem.tag =="firsttag": # if the tag name matches the name that is equated then the text in the tag is stored into the dictionary
dict["FIRST_TAG"]=str(elem.text)
print(dict)
输出将是
{'FIRST_TAG': 'SAMPLE'}
Minidom是最快速且非常直接的方法。
XML:
<data>
<items>
<item name="item1"></item>
<item name="item2"></item>
<item name="item3"></item>
<item name="item4"></item>
</items>
</data>
Python:
from xml.dom import minidom
dom = minidom.parse('items.xml')
elements = dom.getElementsByTagName('item')
print(f"There are {len(elements)} items:")
for element in elements:
print(element.attributes['name'].value)
输出:
There are 4 items:
item1
item2
item3
item4