我在XML中有很多行,我试图获得一个特定节点属性的实例。
<foo>
<bar>
<type foobar="1"/>
<type foobar="2"/>
</bar>
</foo>
我如何访问属性foobar的值?在这个例子中,我想要“1”和“2”。
我在XML中有很多行,我试图获得一个特定节点属性的实例。
<foo>
<bar>
<type foobar="1"/>
<type foobar="2"/>
</bar>
</foo>
我如何访问属性foobar的值?在这个例子中,我想要“1”和“2”。
当前回答
有很多选择。如果速度和内存使用是一个问题,cElementTree看起来很棒。与简单地使用readline读取文件相比,它的开销非常小。
相关指标可以在下表中找到,复制自cElementTree网站:
library time space
xml.dom.minidom (Python 2.1) 6.3 s 80000K
gnosis.objectify 2.0 s 22000k
xml.dom.minidom (Python 2.4) 1.4 s 53000k
ElementTree 1.2 1.6 s 14500k
ElementTree 1.2.4/1.3 1.1 s 14500k
cDomlette (C extension) 0.540 s 20500k
PyRXPU (C extension) 0.175 s 10850k
libxml2 (C extension) 0.098 s 16000k
readlines (read as utf-8) 0.093 s 8850k
cElementTree (C extension) --> 0.047 s 4900K <--
readlines (read as ascii) 0.032 s 5050k
正如@jfs所指出的,cElementTree是与Python捆绑在一起的:
Python 2:来自xml。etree导入cElementTree作为ElementTree。 Python 3:从xml。导入ElementTree(自动使用加速的C版本)。
其他回答
如果源文件是一个xml文件,就像这个示例一样
<pa:Process xmlns:pa="http://sssss">
<pa:firsttag>SAMPLE</pa:firsttag>
</pa:Process>
您可以尝试下面的代码
from lxml import etree, objectify
metadata = 'C:\\Users\\PROCS.xml' # this is sample xml file the contents are shown above
parser = etree.XMLParser(remove_blank_text=True) # this line removes the name space from the xml in this sample the name space is --> http://sssss
tree = etree.parse(metadata, parser) # this line parses the xml file which is PROCS.xml
root = tree.getroot() # we get the root of xml which is process and iterate using a for loop
for elem in root.getiterator():
if not hasattr(elem.tag, 'find'): continue # (1)
i = elem.tag.find('}')
if i >= 0:
elem.tag = elem.tag[i+1:]
dict={} # a python dictionary is declared
for elem in tree.iter(): #iterating through the xml tree using a for loop
if elem.tag =="firsttag": # if the tag name matches the name that is equated then the text in the tag is stored into the dictionary
dict["FIRST_TAG"]=str(elem.text)
print(dict)
输出将是
{'FIRST_TAG': 'SAMPLE'}
我推荐ElementTree。同样的API还有其他兼容的实现,比如lxml和Python标准库中的cElementTree;但是,在这种情况下,他们主要增加的是更快的速度——编程的容易程度取决于ElementTree定义的API。
首先从XML中构建一个Element实例根,例如使用XML函数,或者通过解析文件,例如:
import xml.etree.ElementTree as ET
root = ET.parse('thefile.xml').getroot()
或者在ElementTree中显示的许多其他方法中的任何一种。然后这样做:
for type_tag in root.findall('bar/type'):
value = type_tag.get('foobar')
print(value)
输出:
1
2
如果您不想使用任何外部库或第三方工具,请尝试下面的代码。
这将把xml解析成python字典 这也将解析xml属性 这也将解析空标签,如<tag/>和只有属性的标签,如<tag var=val/>
Code
import re
def getdict(content):
res=re.findall("<(?P<var>\S*)(?P<attr>[^/>]*)(?:(?:>(?P<val>.*?)</(?P=var)>)|(?:/>))",content)
if len(res)>=1:
attreg="(?P<avr>\S+?)(?:(?:=(?P<quote>['\"])(?P<avl>.*?)(?P=quote))|(?:=(?P<avl1>.*?)(?:\s|$))|(?P<avl2>[\s]+)|$)"
if len(res)>1:
return [{i[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,i[1].strip())]},{"$values":getdict(i[2])}]} for i in res]
else:
return {res[0]:[{"@attributes":[{j[0]:(j[2] or j[3] or j[4])} for j in re.findall(attreg,res[1].strip())]},{"$values":getdict(res[2])}]}
else:
return content
with open("test.xml","r") as f:
print(getdict(f.read().replace('\n','')))
样例输入
<details class="4b" count=1 boy>
<name type="firstname">John</name>
<age>13</age>
<hobby>Coin collection</hobby>
<hobby>Stamp collection</hobby>
<address>
<country>USA</country>
<state>CA</state>
</address>
</details>
<details empty="True"/>
<details/>
<details class="4a" count=2 girl>
<name type="firstname">Samantha</name>
<age>13</age>
<hobby>Fishing</hobby>
<hobby>Chess</hobby>
<address current="no">
<country>Australia</country>
<state>NSW</state>
</address>
</details>
输出(美化)
[
{
"details": [
{
"@attributes": [
{
"class": "4b"
},
{
"count": "1"
},
{
"boy": ""
}
]
},
{
"$values": [
{
"name": [
{
"@attributes": [
{
"type": "firstname"
}
]
},
{
"$values": "John"
}
]
},
{
"age": [
{
"@attributes": []
},
{
"$values": "13"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Coin collection"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Stamp collection"
}
]
},
{
"address": [
{
"@attributes": []
},
{
"$values": [
{
"country": [
{
"@attributes": []
},
{
"$values": "USA"
}
]
},
{
"state": [
{
"@attributes": []
},
{
"$values": "CA"
}
]
}
]
}
]
}
]
}
]
},
{
"details": [
{
"@attributes": [
{
"empty": "True"
}
]
},
{
"$values": ""
}
]
},
{
"details": [
{
"@attributes": []
},
{
"$values": ""
}
]
},
{
"details": [
{
"@attributes": [
{
"class": "4a"
},
{
"count": "2"
},
{
"girl": ""
}
]
},
{
"$values": [
{
"name": [
{
"@attributes": [
{
"type": "firstname"
}
]
},
{
"$values": "Samantha"
}
]
},
{
"age": [
{
"@attributes": []
},
{
"$values": "13"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Fishing"
}
]
},
{
"hobby": [
{
"@attributes": []
},
{
"$values": "Chess"
}
]
},
{
"address": [
{
"@attributes": [
{
"current": "no"
}
]
},
{
"$values": [
{
"country": [
{
"@attributes": []
},
{
"$values": "Australia"
}
]
},
{
"state": [
{
"@attributes": []
},
{
"$values": "NSW"
}
]
}
]
}
]
}
]
}
]
}
]
这里有一个使用cElementTree的非常简单但有效的代码。
try:
import cElementTree as ET
except ImportError:
try:
# Python 2.5 need to import a different module
import xml.etree.cElementTree as ET
except ImportError:
exit_err("Failed to import cElementTree from any known place")
def find_in_tree(tree, node):
found = tree.find(node)
if found == None:
print "No %s in file" % node
found = []
return found
# Parse a xml file (specify the path)
def_file = "xml_file_name.xml"
try:
dom = ET.parse(open(def_file, "r"))
root = dom.getroot()
except:
exit_err("Unable to open and parse input definition file: " + def_file)
# Parse to find the child nodes list of node 'myNode'
fwdefs = find_in_tree(root,"myNode")
这是来自“python xml解析”。
#If the xml is in the form of a string as shown below then
from lxml import etree, objectify
'''sample xml as a string with a name space {http://xmlns.abc.com}'''
message =b'<?xml version="1.0" encoding="UTF-8"?>\r\n<pa:Process xmlns:pa="http://xmlns.abc.com">\r\n\t<pa:firsttag>SAMPLE</pa:firsttag></pa:Process>\r\n' # this is a sample xml which is a string
print('************message coversion and parsing starts*************')
message=message.decode('utf-8')
message=message.replace('<?xml version="1.0" encoding="UTF-8"?>\r\n','') #replace is used to remove unwanted strings from the 'message'
message=message.replace('pa:Process>\r\n','pa:Process>')
print (message)
print ('******Parsing starts*************')
parser = etree.XMLParser(remove_blank_text=True) #the name space is removed here
root = etree.fromstring(message, parser) #parsing of xml happens here
print ('******Parsing completed************')
dict={}
for child in root: # parsed xml is iterated using a for loop and values are stored in a dictionary
print(child.tag,child.text)
print('****Derving from xml tree*****')
if child.tag =="{http://xmlns.abc.com}firsttag":
dict["FIRST_TAG"]=child.text
print(dict)
### output
'''************message coversion and parsing starts*************
<pa:Process xmlns:pa="http://xmlns.abc.com">
<pa:firsttag>SAMPLE</pa:firsttag></pa:Process>
******Parsing starts*************
******Parsing completed************
{http://xmlns.abc.com}firsttag SAMPLE
****Derving from xml tree*****
{'FIRST_TAG': 'SAMPLE'}'''