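Is there a package for Ubuntu and/or CentOS that provides a command-line tool able to run an XPath one-liner such as foo //element@attribute filename.xml or foo //element@attribute < filename.xml and return the results line by line?
I'm looking for something I can simply apt-get install foo or yum install foo and then use out of the box, with no wrappers or other adaptation necessary.
Here are some examples of things that come close:
Nokogiri. If I write this wrapper, I can call the wrapper in the way described above: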
#!/usr/bin/ruby
require 'nokogiri'
Nokogiri::XML(STDIN).xpath(ARGV[0]).each do |row|
  puts row
end
XML::XPath. Would work with this wrapper:
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
my $root = XML::XPath->new(ioref => 'STDIN');
for my $node ($root->find($ARGV[0])->get_nodelist) {
    print($node->getData, "\n");
}
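xpath from XML::XPath returns too much noise: -- NODE -- and attribute = "value".
xml_grep from XML::Twig cannot handle expressions that do not return elements, so it cannot be used to extract attribute values without further processing.
EDIT:
echo cat //element/@attribute | xmllint --shell filename.xml returns noise similar to xpath.
xmllint --xpath //element/@attribute filename.xml returns attribute = "value".
xmllint --xpath 'string(//element/@attribute)' filename.xml returns what I want, but only for the first match.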
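To make the desired behaviour concrete (every match, one bare value per line), here is a minimal Python sketch using only the standard library. It is a wrapper, so it does not satisfy the "no wrapper" requirement; it only illustrates the output being asked for. The function name attribute_values and the SAMPLE document are made up for this illustration, and the root element itself is not considered a match.

```python
import xml.etree.ElementTree as ET


def attribute_values(xml_text, element_name, attribute):
    """Return the value of `attribute` for every `element_name` below the root."""
    root = ET.fromstring(xml_text)
    # ElementTree implements only a subset of XPath: it can select elements
    # that carry a given attribute, but not attribute nodes themselves, so
    # the value is read with Element.get() afterwards.
    path = f".//{element_name}[@{attribute}]"
    return [el.get(attribute) for el in root.iterfind(path)]


# Hypothetical sample input, not taken from the question.
SAMPLE = """<root>
  <element attribute="first"/>
  <element/>
  <group><element attribute="second"/></group>
</root>"""

# One value per line, with no surrounding noise.
for value in attribute_values(SAMPLE, "element", "attribute"):
    print(value)
```

Elements without the attribute are skipped by the [@attribute] predicate, so the output contains only real values.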
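For another solution that almost satisfies the question, here is an XSLT that can be used to evaluate arbitrary XPath expressions (requires dyn:evaluate support in the XSLT processor):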
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:dyn="http://exslt.org/dynamic" extension-element-prefixes="dyn">
  <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="dyn:evaluate($pattern)">
      <xsl:value-of select="dyn:evaluate($value)"/>
      <xsl:value-of select="'&#10;'"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Run with xsltproc --stringparam pattern //element/@attribute --stringparam value . arbitrary-xpath.xslt filename.xml.
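I came across this question while searching through maven pom.xml files. However, I had the following limitations:
must run cross-platform
must exist on all major Linux distributions without any additional module installation
must handle complex xml files such as maven pom.xml files
simple syntax
I tried many of the approaches above without success:
python lxml.etree is not part of the standard python distribution
python xml.etree is, but it does not handle complex maven pom.xml files well; I have not dug deep enough
python xml.etree does not handle maven pom.xml files, for an unknown reason
xmllint does not work either; it core dumps often on ubuntu 12.04 "xmllint: using libxml version 20708"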
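One plausible explanation for the xml.etree failure (an assumption, since the answer does not show the failing code) is that pom.xml files usually declare the default namespace http://maven.apache.org/POM/4.0.0, so an unqualified path such as "version" matches nothing. A minimal sketch under that assumption, with a made-up SAMPLE_POM and a hypothetical helper pom_version:

```python
import xml.etree.ElementTree as ET

# Prefix "m" is arbitrary; only the namespace URI has to match the document.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical, heavily shortened pom.xml used only for this illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.2.3</version>
</project>"""


def pom_version(pom_text):
    """Return the text of the top-level <version> element, or None."""
    root = ET.fromstring(pom_text)
    return root.findtext("m:version", namespaces=POM_NS)


# The unqualified lookup finds nothing because the element is namespaced.
print(ET.fromstring(SAMPLE_POM).findtext("version"))
# The namespace-qualified lookup finds the element.
print(pom_version(SAMPLE_POM))
```

If the real file declares a different namespace URI, or none at all, the POM_NS mapping has to be adjusted accordingly.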
The solution I came across that is stable, short, mature, and works on many platforms is the rexml library built into ruby:
ruby -r rexml/document -e 'include REXML;
puts XPath.first(Document.new($stdin), "/project/version/text()")' < pom.xml
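What inspired me to find this were the following articles:
Ruby/XML, XSLT and XPath Tutorial
IBM: Ruby on Rails and XML
A solution that works even when namespace declarations exist at the top:
Most of the commands proposed in the answers do not work out of the box if the xml has a namespace declared at the top. Consider this:
Input xml: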
<elem1 xmlns="urn:x" xmlns:prefix="urn:y">
    <elem2 attr1="false" attr2="value2">
        elem2 value
    </elem2>
    <elem2 attr1="true" attr2="value2.1">
        elem2.1 value
    </elem2>
    <prefix:elem3>
        elem3 value
    </prefix:elem3>
</elem1>
Not working:
xmlstarlet sel -t -v "/elem1" input.xml
# nothing printed
xmllint -xpath "/elem1" input.xml
# XPath set is empty
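The two commands above come back empty because an unprefixed XPath name test such as elem1 only matches elements in no namespace, while every element in this input belongs to urn:x or urn:y. The following Python sketch shows how the parser sees the names and one way to match on the local name only. It uses a condensed copy of the input above; local_name and find_by_local_name are hypothetical helpers written for this illustration, not part of any library.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the input document shown above.
INPUT_XML = """<elem1 xmlns="urn:x" xmlns:prefix="urn:y">
  <elem2 attr1="false" attr2="value2">elem2 value</elem2>
  <elem2 attr1="true" attr2="value2.1">elem2.1 value</elem2>
  <prefix:elem3>elem3 value</prefix:elem3>
</elem1>"""


def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree prepends to tags."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Return the root and descendants whose local name equals `name`."""
    return [el for el in root.iter() if local_name(el.tag) == name]


root = ET.fromstring(INPUT_XML)
print(root.tag)               # the tag carries the namespace URI
print(root.findall("elem2"))  # unqualified lookup matches nothing
for el in find_by_local_name(root, "elem2"):
    print(el.get("attr2"))
```

Matching on the local name ignores which namespace an element is in, which is convenient for ad hoc queries but can produce false matches when two vocabularies share element names.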
Solution:
# Requires >=java11 to run like below (but the code requires >=java17 for case syntax to be recognized)
# Prints the whole document
java ExtractXpath.java "/" example-inputs/input.xml
# Prints the contents and self of "elem1"
java ExtractXpath.java "/elem1" input.xml
# Prints the contents and self of "elem2" whose attr2 value is: 'value2'
java ExtractXpath.java "//elem2[@attr2='value2']" input.xml
# Prints the value of the attribute 'attr2': "value2", "value2.1"
java ExtractXpath.java "/elem1/elem2/@attr2" input.xml
# Prints the text inside elem3: "elem3 value"
java ExtractXpath.java "/elem1/elem3/text()" input.xml
# Prints the name of the matched element: "prefix:elem3"
java ExtractXpath.java "name(/elem1/elem3)" input.xml
# Same as above: "prefix:elem3"
java ExtractXpath.java "name(*/elem3)" input.xml
# Prints the count of the matched elements: 2.0
java ExtractXpath.java "count(//elem2)" input.xml
# known issue: while "//elem2" works. "//elem3" does not (it works only with: '*/elem3' )
ExtractXpath.java:
import java.io.File;
import java.io.FileInputStream;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathEvaluationResult;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ExtractXpath {
    public static void main(String[] args) throws Exception {
        assertThat(args.length==2, "Wrong number of args");
        String xpath = args[0];
        File file = new File(args[1]);
        assertThat(file.isFile(), file.getAbsolutePath()+" is not a file.");
        FileInputStream fileIS = new FileInputStream(file);
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(fileIS);
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = xpath;
        XPathExpression xpathExpression = xPath.compile(expression);
        XPathEvaluationResult xpathEvalResult = xpathExpression.evaluateExpression(xmlDocument);
        System.out.println(applyXpathExpression(xmlDocument, xpathExpression, xpathEvalResult.type().name()));
    }
    private static String applyXpathExpression(Document xmlDocument, XPathExpression expr, String xpathTypeName) throws TransformerConfigurationException, TransformerException, XPathExpressionException {
        // see: https://www.w3.org/TR/1999/REC-xpath-19991116/#corelib
        List<String> retVal = new ArrayList<>();
        if(xpathTypeName.equals(XPathConstants.NODESET.getLocalPart())){ //e.g. xpath: /elem1/*
            NodeList nodeList = (NodeList)expr.evaluate(xmlDocument, XPathConstants.NODESET);
            for (int i = 0; i < nodeList.getLength(); i++) {
                retVal.add(convertNodeToString(nodeList.item(i)));
            }
        }else if(xpathTypeName.equals(XPathConstants.STRING.getLocalPart())){ //e.g. xpath: name(/elem1/*)
            retVal.add((String)expr.evaluate(xmlDocument, XPathConstants.STRING));
        }else if(xpathTypeName.equals(XPathConstants.NUMBER.getLocalPart())){ //e.g. xpath: count(/elem1/*)
            retVal.add(((Number)expr.evaluate(xmlDocument, XPathConstants.NUMBER)).toString());
        }else if(xpathTypeName.equals(XPathConstants.BOOLEAN.getLocalPart())){ //e.g. xpath: contains(elem1, 'sth')
            retVal.add(((Boolean)expr.evaluate(xmlDocument, XPathConstants.BOOLEAN)).toString());
        }else if(xpathTypeName.equals(XPathConstants.NODE.getLocalPart())){ //e.g. xpath: fixme: find one
            System.err.println("WARNING found xpathTypeName=NODE");
            retVal.add(convertNodeToString((Node)expr.evaluate(xmlDocument, XPathConstants.NODE)));
        }else{
            throw new RuntimeException("Unexpected xpath type name: "+xpathTypeName+". This should normally not happen");
        }
        // Wrap every result in start/end markers so multi-line matches stay distinguishable
        return retVal.stream().map(str->"==MATCH_START==\n"+str+"\n==MATCH_END==").collect(Collectors.joining ("\n"));
    }
    private static String convertNodeToString(Node node) throws TransformerConfigurationException, TransformerException {
        short nType = node.getNodeType();
        switch (nType) {
            case Node.ATTRIBUTE_NODE , Node.TEXT_NODE -> {
                return node.getNodeValue();
            }
            case Node.ELEMENT_NODE, Node.DOCUMENT_NODE -> {
                StringWriter writer = new StringWriter();
                Transformer trans = TransformerFactory.newInstance().newTransformer();
                trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                trans.setOutputProperty(OutputKeys.INDENT, "yes");
                trans.transform(new DOMSource(node), new StreamResult(writer));
                return writer.toString();
            }
            default -> {
                System.err.println("WARNING: FIXME: Node type:"+nType+" could possibly be handled in a better way.");
                return node.getNodeValue();
            }
        }
    }
    private static void assertThat(boolean b, String msg) {
        if(!b){
            System.err.println(msg+"\n\nUSAGE: program xpath xmlFile");
            System.exit(-1);
        }
    }
}
// Note: main() above never installs this resolver (there is no
// xPath.setNamespaceContext(...) call), so as written it is unused.
@SuppressWarnings("unchecked")
class NamespaceResolver implements NamespaceContext {
    //Store the source document to search the namespaces
    private final Document sourceDocument;
    public NamespaceResolver(Document document) {
        sourceDocument = document;
    }
    //The lookup for the namespace uris is delegated to the stored document.
    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return sourceDocument.lookupNamespaceURI(null);
        } else {
            return sourceDocument.lookupNamespaceURI(prefix);
        }
    }
    @Override
    public String getPrefix(String namespaceURI) {
        return sourceDocument.lookupPrefix(namespaceURI);
    }
    @SuppressWarnings("rawtypes")
    @Override
    public Iterator getPrefixes(String namespaceURI) {
        return null;
    }
}
For simplicity:
xpath-extract command:
#!/bin/bash
java ExtractXpath.java "$1" "$2"