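Is there a package for Ubuntu and/or CentOS that provides a command-line tool able to execute an XPath one-liner like foo //element/@attribute filename.xml or foo //element/@attribute < filename.xml and return the results line by line?
I'm looking for something that would let me just apt-get install foo or yum install foo and then works out of the box, with no wrappers or other adaptation necessary.
Here are some examples of things that come close:
Nokogiri. If I write this wrapper, I can call the wrapper in the way described above: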
#!/usr/bin/ruby
require 'nokogiri'
Nokogiri::XML(STDIN).xpath(ARGV[0]).each do |row|
  puts row
end
XML::XPath. Would work with this wrapper:
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
my $root = XML::XPath->new(ioref => 'STDIN');
for my $node ($root->find($ARGV[0])->get_nodelist) {
    print($node->getData, "\n");
}
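xpath from XML::XPath returns too much noise: -- NODE -- and attribute = "value".
xml_grep from XML::Twig cannot handle expressions that do not return elements, so it cannot be used to extract attribute values without further processing.
EDIT:
echo cat //element/@attribute | xmllint --shell filename.xml returns noise similar to xpath.
xmllint --xpath //element/@attribute filename.xml returns attribute = "value".
xmllint --xpath 'string(//element/@attribute)' filename.xml returns what I want, but only for the first match.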
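For comparison, the desired behaviour (every matching attribute value on its own line) can be sketched with Python's standard library alone. This is only an illustration, not a packaged tool: the helper name attribute_values and the sample document are made up here, and because xml.etree.ElementTree supports just a small XPath subset and cannot evaluate //element/@attribute directly, the attribute step is done in Python.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
SAMPLE = """<root>
  <element attribute="first"/>
  <group><element attribute="second"/></group>
  <element/>
</root>"""


def attribute_values(xml_text, tag, attribute):
    """Return the given attribute of every <tag> element, in document order."""
    root = ET.fromstring(xml_text)
    values = []
    # root.iter(tag) walks the whole tree, roughly like //tag in XPath.
    for el in root.iter(tag):
        value = el.get(attribute)
        if value is not None:  # skip elements that lack the attribute
            values.append(value)
    return values


# One value per line, which is what the question asks for.
for value in attribute_values(SAMPLE, "element", "attribute"):
    print(value)
```

Unlike string(//element/@attribute), this yields all matches rather than only the first one.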
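For another solution that almost satisfies the question, here is an XSLT that can be used to evaluate arbitrary XPath expressions (requires dyn:evaluate support in the XSLT processor):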
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:dyn="http://exslt.org/dynamic" extension-element-prefixes="dyn">
  <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="dyn:evaluate($pattern)">
      <xsl:value-of select="dyn:evaluate($value)"/>
      <xsl:value-of select="'&#10;'"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Run with xsltproc --stringparam pattern //element/@attribute --stringparam value . arbitrary-xpath.xslt filename.xml.
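The pattern/value split used by the stylesheet (select context nodes with one expression, then evaluate a second expression relative to each of them) can be illustrated in Python. This is a rough sketch under stated assumptions: the function name evaluate and the sample document are hypothetical, and xml.etree.ElementTree only understands a limited path subset, so it is not a replacement for dyn:evaluate.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
SAMPLE = """<catalog>
  <item><title>first</title></item>
  <item><title>second</title></item>
</catalog>"""


def evaluate(xml_text, pattern, value):
    """For each node matched by `pattern`, yield the text found at `value` relative to it."""
    root = ET.fromstring(xml_text)
    for node in root.findall(pattern):
        # findtext() resolves `value` with `node` as the context node,
        # mirroring the inner dyn:evaluate($value) call.
        yield node.findtext(value, default="")


# One result per line, like the newline emitted after each match in the XSLT.
for line in evaluate(SAMPLE, ".//item", "title"):
    print(line)
```

Passing "." as the value, as in the xsltproc invocation, selects the text of the matched node itself.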
A minimal wrapper for python's lxml module that will print all matching nodes by name (at any level), e.g. mysubnode or an XPath subset e.g. //intermediarynode/subnode. If the expression evaluates to text then text will be printed, if it evaluates to an element then the entire raw element will be rendered to text. It also attempts to handle XML namespaces in a way that allows using local tag names without prefixing. With extended XPath mode enabled via the -x flag the default namespace needs to be referenced with the p: prefix, e.g. //p:tagname/p:subtag
#!/usr/bin/env python3
import argparse
import os
import sys

from lxml import etree

# Prefix bound to the document's default namespace in extended (-x) mode.
DEFAULT_NAMESPACE_KEY = 'p'


def print_element(elem):
    # XPath results can be strings, bytes or elements.
    if isinstance(elem, str):
        print(elem)
    elif isinstance(elem, bytes):
        print(elem.decode('utf-8'))
    else:
        # Prefer the element's text; otherwise render the whole element.
        print(elem.text and elem.text.strip() or etree.tostring(elem, encoding='unicode', pretty_print=True))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='XPATH lxml wrapper',
                                     usage="""
Print all nodes by name in XML file:
\t{0} myfile.xml somename

Print all nodes by XPath selector (findall: reduced subset):
\t{0} myfile.xml //intermediarynode/childnode

Print attribute values by XPath selector 'p' maps to default namespace (xpath 1.0: extended subset):
\t{0} myfile.xml //p:intermediarynode/p:childnode/@src -x
""".format(os.path.basename(sys.argv[0])))
    parser.add_argument('xpath_file',
                        help='XML file path')
    parser.add_argument('xpath_expression',
                        help='tag name or xpath expression')
    parser.add_argument('--force_xpath', '-x',
                        action='store_true',
                        default=False,
                        help='Use lxml.xpath (rather than findall)'
                        )
    args = parser.parse_args(sys.argv[1:])
    xpath_expression = args.xpath_expression
    tree = etree.parse(args.xpath_file)
    ns = tree.getroot().nsmap
    if args.force_xpath:
        # XPath 1.0 has no empty prefix, so rebind the default namespace to 'p'.
        if ns.keys() and None in ns:
            ns[DEFAULT_NAMESPACE_KEY] = ns.pop(None)
        for node in tree.xpath(xpath_expression, namespaces=ns):
            print_element(node)
    elif xpath_expression.isalpha():
        # A bare tag name: match it at any level, ignoring namespaces.
        for node in tree.xpath(f"//*[local-name() = '{xpath_expression}']"):
            print_element(node)
    else:
        for el in tree.findall(xpath_expression, namespaces=ns):
            print_element(el)
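It uses lxml, a fast XML parser written in C that is not included in the Python standard library. Install it with pip install lxml. On Linux/OSX this might need to be prefixed with sudo.
Usage: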
python3 xmlcat.py file.xml "//mynode"
lxml can also accept a URL as input:
python3 xmlcat.py http://example.com/file.xml "//mynode"
Extract the url attribute under an enclosure node (i.e. <enclosure url="http:…" ..>); -x forces the extended XPath 1.0 subset:
python3 xmlcat.py file.xml "//enclosure/@url" -x
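The prefix remapping used in -x mode (XPath 1.0 has no syntax for an empty prefix, so the default namespace is rebound to p) can be tried without lxml. The following is a minimal sketch, assuming only the standard library: the helper parse_with_namespaces, the sample feed, and its URLs are invented for illustration, and ElementTree reports the default namespace under the prefix '' where lxml's nsmap uses None.

```python
import io
import xml.etree.ElementTree as ET

DEFAULT_NAMESPACE_KEY = "p"  # same placeholder prefix as in the script above

# Made-up sample feed, for illustration only.
SAMPLE = b"""<feed xmlns="urn:example:feed">
  <item><enclosure url="http://example.com/a.mp3"/></item>
  <item><enclosure url="http://example.com/b.mp3"/></item>
</feed>"""


def parse_with_namespaces(source):
    """Parse XML; return (root, prefix->URI map) with the default namespace under 'p'."""
    ns = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # prefix is '' for the default namespace
            ns[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event carries the root element
    if "" in ns:
        ns[DEFAULT_NAMESPACE_KEY] = ns.pop("")
    return root, ns


root, ns = parse_with_namespaces(io.BytesIO(SAMPLE))
urls = [el.get("url") for el in root.findall(".//p:enclosure", ns)]
for url in urls:
    print(url)
```

ElementTree cannot select attributes in a path, so the url values are read with get() after the elements are matched.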
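XPath in Google Chrome
As an unrelated side note: if you happen to want to run an XPath expression against the markup of a web page, you can do it straight from the Chrome DevTools: right-click the page in Chrome > select Inspect, and then in the DevTools console paste your XPath expression as $x("//spam/eggs").
Get all authors on this page: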
$x("//*[@class='user-details']/a/text()")
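A solution that works even when namespace declarations are present at the top:
Most of the commands proposed in the answers do not work out of the box if the XML has a namespace declared at the top. Consider this:
Input XML: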
<elem1 xmlns="urn:x" xmlns:prefix="urn:y">
<elem2 attr1="false" attr2="value2">
elem2 value
</elem2>
<elem2 attr1="true" attr2="value2.1">
elem2.1 value
</elem2>
<prefix:elem3>
elem3 value
</prefix:elem3>
</elem1>
Not working:
xmlstarlet sel -t -v "/elem1" input.xml
# nothing printed
xmllint -xpath "/elem1" input.xml
# XPath set is empty
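The empty results come from the default namespace: once xmlns="urn:x" is declared, the element is named elem1 in the urn:x namespace, and an unprefixed XPath name only matches elements in no namespace. A small sketch with Python's standard library shows the effect on the same input. It assumes Python 3.8 or later for the {*} wildcard, and the prefix x is an arbitrary choice made here, not something the document defines.

```python
import xml.etree.ElementTree as ET

INPUT_XML = """<elem1 xmlns="urn:x" xmlns:prefix="urn:y">
  <elem2 attr1="false" attr2="value2">elem2 value</elem2>
  <elem2 attr1="true" attr2="value2.1">elem2.1 value</elem2>
  <prefix:elem3>elem3 value</prefix:elem3>
</elem1>"""

root = ET.fromstring(INPUT_XML)

# The parser stores the namespace URI as part of each tag name.
print(root.tag)  # {urn:x}elem1

# An unqualified name only matches elements in no namespace: nothing is found.
unqualified = root.findall("elem2")

# Binding a prefix of our own choosing to the URI makes the name match.
qualified = root.findall("x:elem2", {"x": "urn:x"})

# {*} means "any namespace", comparable to a local-name() test in XPath.
any_ns = root.findall("{*}elem3")

print(len(unqualified), len(qualified), len(any_ns))  # 0 2 1
print([el.get("attr2") for el in qualified])  # ['value2', 'value2.1']
```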
Solution:
# Requires >=java11 to run like below (but the code requires >=java17 for case syntax to be recognized)
# Prints the whole document
java ExtractXpath.java "/" example-inputs/input.xml
# Prints the contents and self of "elem1"
java ExtractXpath.java "/elem1" input.xml
# Prints the contents and self of "elem2" whose attr2 value is: 'value2'
java ExtractXpath.java "//elem2[@attr2='value2']" input.xml
# Prints the value of the attribute 'attr2': "value2", "value2.1"
java ExtractXpath.java "/elem1/elem2/@attr2" input.xml
# Prints the text inside elem3: "elem3 value"
java ExtractXpath.java "/elem1/elem3/text()" input.xml
# Prints the name of the matched element: "prefix:elem3"
java ExtractXpath.java "name(/elem1/elem3)" input.xml
# Same as above: "prefix:elem3"
java ExtractXpath.java "name(*/elem3)" input.xml
# Prints the count of the matched elements: 2.0
java ExtractXpath.java "count(//elem2)" input.xml
# known issue: while "//elem2" works. "//elem3" does not (it works only with: '*/elem3' )
ExtractXpath.java:
import java.io.File;
import java.io.FileInputStream;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathEvaluationResult;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ExtractXpath {

    public static void main(String[] args) throws Exception {
        assertThat(args.length==2, "Wrong number of args");
        String xpath = args[0];
        File file = new File(args[1]);

        assertThat(file.isFile(), file.getAbsolutePath()+" is not a file.");

        FileInputStream fileIS = new FileInputStream(file);
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(fileIS);
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = xpath;
        XPathExpression xpathExpression = xPath.compile(expression);

        // Evaluate once to learn the result type, then dispatch on it.
        XPathEvaluationResult xpathEvalResult = xpathExpression.evaluateExpression(xmlDocument);
        System.out.println(applyXpathExpression(xmlDocument, xpathExpression, xpathEvalResult.type().name()));
    }

    private static String applyXpathExpression(Document xmlDocument, XPathExpression expr, String xpathTypeName) throws TransformerConfigurationException, TransformerException, XPathExpressionException {
        // see: https://www.w3.org/TR/1999/REC-xpath-19991116/#corelib
        List<String> retVal = new ArrayList<>();
        if(xpathTypeName.equals(XPathConstants.NODESET.getLocalPart())){ //e.g. xpath: /elem1/*
            NodeList nodeList = (NodeList)expr.evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                retVal.add(convertNodeToString(nodeList.item(i)));
            }
        }else if(xpathTypeName.equals(XPathConstants.STRING.getLocalPart())){ //e.g. xpath: name(/elem1/*)
            retVal.add((String)expr.evaluate(xmlDocument, XPathConstants.STRING));
        }else if(xpathTypeName.equals(XPathConstants.NUMBER.getLocalPart())){ //e.g. xpath: count(/elem1/*)
            retVal.add(((Number)expr.evaluate(xmlDocument, XPathConstants.NUMBER)).toString());
        }else if(xpathTypeName.equals(XPathConstants.BOOLEAN.getLocalPart())){ //e.g. xpath: contains(elem1, 'sth')
            retVal.add(((Boolean)expr.evaluate(xmlDocument, XPathConstants.BOOLEAN)).toString());
        }else if(xpathTypeName.equals(XPathConstants.NODE.getLocalPart())){ //e.g. xpath: fixme: find one
            System.err.println("WARNING found xpathTypeName=NODE");
            retVal.add(convertNodeToString((Node)expr.evaluate(xmlDocument, XPathConstants.NODE)));
        }else{
            throw new RuntimeException("Unexpected xpath type name: "+xpathTypeName+". This should normally not happen");
        }
        return retVal.stream().map(str->"==MATCH_START==\n"+str+"\n==MATCH_END==").collect(Collectors.joining ("\n"));
    }

    private static String convertNodeToString(Node node) throws TransformerConfigurationException, TransformerException {
        short nType = node.getNodeType();
        switch (nType) {
            case Node.ATTRIBUTE_NODE , Node.TEXT_NODE -> {
                return node.getNodeValue();
            }
            case Node.ELEMENT_NODE, Node.DOCUMENT_NODE -> {
                // Serialize the element (or whole document) back to XML text.
                StringWriter writer = new StringWriter();
                Transformer trans = TransformerFactory.newInstance().newTransformer();
                trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                trans.setOutputProperty(OutputKeys.INDENT, "yes");
                trans.transform(new DOMSource(node), new StreamResult(writer));
                return writer.toString();
            }
            default -> {
                System.err.println("WARNING: FIXME: Node type:"+nType+" could possibly be handled in a better way.");
                return node.getNodeValue();
            }
        }
    }

    private static void assertThat(boolean b, String msg) {
        if(!b){
            System.err.println(msg+"\n\nUSAGE: program xpath xmlFile");
            System.exit(-1);
        }
    }
}
@SuppressWarnings("unchecked")
class NamespaceResolver implements NamespaceContext {
    //Store the source document to search the namespaces
    private final Document sourceDocument;

    public NamespaceResolver(Document document) {
        sourceDocument = document;
    }

    //The lookup for the namespace uris is delegated to the stored document.
    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return sourceDocument.lookupNamespaceURI(null);
        } else {
            return sourceDocument.lookupNamespaceURI(prefix);
        }
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return sourceDocument.lookupPrefix(namespaceURI);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public Iterator getPrefixes(String namespaceURI) {
        return null;
    }
}
And for simplicity:
xpath-extract command:
#!/bin/bash
java ExtractXpath.java "$1" "$2"