如何在Java中使用XPath读取XML

我想使用Java中的XPath读取XML数据，因此对于我所收集的信息，我无法根据我的要求解析XML。

这就是我想做的:

从网上通过其URL获取XML文件，然后使用XPath来解析它，我想在其中创建两个方法。一种情况是，我输入一个特定的节点属性id，结果得到所有的子节点，另一种情况是，假设我只想得到一个特定的子节点值

<?xml version="1.0"?>
<howto>
  <topic name="Java">
      <url>http://www.rgagnonjavahowto.htm</url>
  <car>taxi</car>
  </topic>
  <topic name="PowerBuilder">
       <url>http://www.rgagnon/pbhowto.htm</url>
       <url>http://www.rgagnon/pbhowtonew.htm</url>
  </topic>
  <topic name="Javascript">
        <url>http://www.rgagnon/jshowto.htm</url>
  </topic>
 <topic name="VBScript">
       <url>http://www.rgagnon/vbshowto.htm</url>
 </topic>
 </howto>

在上面的例子中，我想读取所有的元素，如果我通过@name搜索，还有一个函数，我只是想从@name 'Javascript'的url只返回一个节点元素。

当前回答

入门示例:

xml文件:

<inventory>
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>

    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>

    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>

    <!-- more books... -->

</inventory>

Java代码:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;


try {

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document doc = docBuilder.parse (new File("c:\\tmp\\my.xml"));

    // normalize text representation
    doc.getDocumentElement().normalize();
    System.out.println ("Root element of the doc is " + doc.getDocumentElement().getNodeName());

    NodeList listOfBooks = doc.getElementsByTagName("book");
    int totalBooks = listOfBooks.getLength();
    System.out.println("Total no of books : " + totalBooks);

    for(int i=0; i<listOfBooks.getLength() ; i++) {

        Node firstBookNode = listOfBooks.item(i);
        if(firstBookNode.getNodeType() == Node.ELEMENT_NODE) {

            Element firstElement = (Element)firstBookNode;                              
            System.out.println("Year :"+firstElement.getAttribute("year"));

            //-------
            NodeList firstNameList = firstElement.getElementsByTagName("title");
            Element firstNameElement = (Element)firstNameList.item(0);

            NodeList textFNList = firstNameElement.getChildNodes();
            System.out.println("title : " + ((Node)textFNList.item(0)).getNodeValue().trim());
        }
    }//end of for loop with s var
} catch (SAXParseException err) {
    System.out.println ("** Parsing error" + ", line " + err.getLineNumber () + ", uri " + err.getSystemId ());
    System.out.println(" " + err.getMessage ());
} catch (SAXException e) {
    Exception x = e.getException ();
    ((x == null) ? e : x).printStackTrace ();
} catch (Throwable t) {
    t.printStackTrace ();
}

2013-06-23 11:30:27

其他回答

你需要这样的东西:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);

然后调用express .evaluate()，传递在该代码中定义的文档和您期望的返回类型，并将结果转换为结果的对象类型。

如果您需要关于特定XPath表达式的帮助，您可能应该将其作为单独的问题提出(除非这是您首先要问的问题—我理解您的问题是如何在Java中使用API)。

编辑:(回复评论):这个XPath表达式将为您提供PowerBuilder下第一个URL元素的文本:

/howto/topic[@name='PowerBuilder']/url/text()

这将得到第二个:

/howto/topic[@name='PowerBuilder']/url[2]/text()

你可以通过下面的代码得到:

expr.evaluate(doc, XPathConstants.STRING);

如果你不知道给定节点中有多少url，那么你应该这样做:

XPathExpression expr = xpath.compile("/howto/topic[@name='PowerBuilder']/url");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

然后循环遍历NodeList。

2010-05-11 13:34:14

使用XPathFactory、SAXParserFactory和StAX (JSR-173)读取XML文件。

使用XPath获取节点及其子数据。

public static void main(String[] args) {
    String xml = "<soapenv:Body xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/'>"
            + "<Yash:Data xmlns:Yash='http://Yash.stackoverflow.com/Services/Yash'>"
            + "<Yash:Tags>Java</Yash:Tags><Yash:Tags>Javascript</Yash:Tags><Yash:Tags>Selenium</Yash:Tags>"
            + "<Yash:Top>javascript</Yash:Top><Yash:User>Yash-777</Yash:User>"
            + "</Yash:Data></soapenv:Body>";
    String jsonNameSpaces = "{'soapenv':'http://schemas.xmlsoap.org/soap/envelope/',"
            + "'Yash':'http://Yash.stackoverflow.com/Services/Yash'}";
    String xpathExpression = "//Yash:Data";

    Document doc1 = getDocument(false, "fileName", xml);
    getNodesFromXpath(doc1, xpathExpression, jsonNameSpaces);
    System.out.println("\n===== ***** =====");
    Document doc2 = getDocument(true, "./books.xml", xml);
    getNodesFromXpath(doc2, "//person", "{}");
}
static Document getDocument( boolean isFileName, String fileName, String xml ) {
    Document doc = null;
    try {

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(true);
        factory.setIgnoringComments(true);
        factory.setIgnoringElementContentWhitespace(true);

        DocumentBuilder builder = factory.newDocumentBuilder();
        if( isFileName ) {
            File file = new File( fileName );
            FileInputStream stream = new FileInputStream( file );
            doc = builder.parse( stream );
        } else {
            doc = builder.parse( string2Source( xml ) );
        }
    } catch (SAXException | IOException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    }
    return doc;
}

/**
 * ELEMENT_NODE[1],ATTRIBUTE_NODE[2],TEXT_NODE[3],CDATA_SECTION_NODE[4],
 * ENTITY_REFERENCE_NODE[5],ENTITY_NODE[6],PROCESSING_INSTRUCTION_NODE[7],
 * COMMENT_NODE[8],DOCUMENT_NODE[9],DOCUMENT_TYPE_NODE[10],DOCUMENT_FRAGMENT_NODE[11],NOTATION_NODE[12]
 */
public static void getNodesFromXpath( Document doc, String xpathExpression, String jsonNameSpaces ) {
    try {
        XPathFactory xpf = XPathFactory.newInstance();
        XPath xpath = xpf.newXPath();

        JSONObject namespaces = getJSONObjectNameSpaces(jsonNameSpaces);
        if ( namespaces.size() > 0 ) {
            NamespaceContextImpl nsContext = new NamespaceContextImpl();

            Iterator<?> key = namespaces.keySet().iterator();
            while (key.hasNext()) { // Apache WebServices Common Utilities
                String pPrefix = key.next().toString();
                String pURI = namespaces.get(pPrefix).toString();
                nsContext.startPrefixMapping(pPrefix, pURI);
            }
            xpath.setNamespaceContext(nsContext );
        }

        XPathExpression compile = xpath.compile(xpathExpression);
        NodeList nodeList = (NodeList) compile.evaluate(doc, XPathConstants.NODESET);
        displayNodeList(nodeList);
    } catch (XPathExpressionException e) {
        e.printStackTrace();
    }
}

static void displayNodeList( NodeList nodeList ) {
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        String NodeName = node.getNodeName();

        NodeList childNodes = node.getChildNodes();
        if ( childNodes.getLength() > 1 ) {
            for (int j = 0; j < childNodes.getLength(); j++) {

                Node child = childNodes.item(j);
                short nodeType = child.getNodeType();
                if ( nodeType == 1 ) {
                    System.out.format( "\n\t Node Name:[%s], Text[%s] ", child.getNodeName(), child.getTextContent() );
                }
            }
        } else {
            System.out.format( "\n Node Name:[%s], Text[%s] ", NodeName, node.getTextContent() );
        }

    }
}
static InputSource string2Source( String str ) {
    InputSource inputSource = new InputSource( new StringReader( str ) );
    return inputSource;
}
static JSONObject getJSONObjectNameSpaces( String jsonNameSpaces ) {
    if(jsonNameSpaces.indexOf("'") > -1)    jsonNameSpaces = jsonNameSpaces.replace("'", "\"");
    JSONParser parser = new JSONParser();
    JSONObject namespaces = null;
    try {
        namespaces = (JSONObject) parser.parse(jsonNameSpaces);
    } catch (ParseException e) {
        e.printStackTrace();
    }
    return namespaces;
}

XML文档

<?xml version="1.0" encoding="UTF-8"?>
<book>
<person>
  <first>Yash</first>
  <last>M</last>
  <age>22</age>
</person>
<person>
  <first>Bill</first>
  <last>Gates</last>
  <age>46</age>
</person>
<person>
  <first>Steve</first>
  <last>Jobs</last>
  <age>40</age>
</person>
</book>

对于给定的xpatheexpression输出:

String xpathExpression = "//person/first";
/*OutPut:
 Node Name:[first], Text[Yash] 
 Node Name:[first], Text[Bill] 
 Node Name:[first], Text[Steve] */

String xpathExpression = "//person";
/*OutPut:
     Node Name:[first], Text[Yash] 
     Node Name:[last], Text[M] 
     Node Name:[age], Text[22] 
     Node Name:[first], Text[Bill] 
     Node Name:[last], Text[Gates] 
     Node Name:[age], Text[46] 
     Node Name:[first], Text[Steve] 
     Node Name:[last], Text[Jobs] 
     Node Name:[age], Text[40] */

String xpathExpression = "//Yash:Data";
/*OutPut:
     Node Name:[Yash:Tags], Text[Java] 
     Node Name:[Yash:Tags], Text[Javascript] 
     Node Name:[Yash:Tags], Text[Selenium] 
     Node Name:[Yash:Top], Text[javascript] 
     Node Name:[Yash:User], Text[Yash-777] */

请参阅此链接了解我们自己的NamespaceContext实现

2017-11-14 07:56:34

入门示例:

xml文件:

<inventory>
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>

    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>

    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>

    <!-- more books... -->

</inventory>

Java代码:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;


try {

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document doc = docBuilder.parse (new File("c:\\tmp\\my.xml"));

    // normalize text representation
    doc.getDocumentElement().normalize();
    System.out.println ("Root element of the doc is " + doc.getDocumentElement().getNodeName());

    NodeList listOfBooks = doc.getElementsByTagName("book");
    int totalBooks = listOfBooks.getLength();
    System.out.println("Total no of books : " + totalBooks);

    for(int i=0; i<listOfBooks.getLength() ; i++) {

        Node firstBookNode = listOfBooks.item(i);
        if(firstBookNode.getNodeType() == Node.ELEMENT_NODE) {

            Element firstElement = (Element)firstBookNode;                              
            System.out.println("Year :"+firstElement.getAttribute("year"));

            //-------
            NodeList firstNameList = firstElement.getElementsByTagName("title");
            Element firstNameElement = (Element)firstNameList.item(0);

            NodeList textFNList = firstNameElement.getChildNodes();
            System.out.println("title : " + ((Node)textFNList.item(0)).getNodeValue().trim());
        }
    }//end of for loop with s var
} catch (SAXParseException err) {
    System.out.println ("** Parsing error" + ", line " + err.getLineNumber () + ", uri " + err.getSystemId ());
    System.out.println(" " + err.getMessage ());
} catch (SAXException e) {
    Exception x = e.getException ();
    ((x == null) ? e : x).printStackTrace ();
} catch (Throwable t) {
    t.printStackTrace ();
}

2013-06-23 11:30:27

在@bluish和@Yishai的精彩回答上展开，下面是如何使nodelist和节点属性支持迭代器，即for(node n: nodelist)接口。

像这样使用它:

NodeList nl = ...
for(Node n : XmlUtil.asList(nl))
{...}

and

Node n = ...
for(Node attr : XmlUtil.asList(n.getAttributes())
{...}

代码:

/**
 * Converts NodeList to an iterable construct.
 * From: https://stackoverflow.com/a/19591302/779521
 */
public final class XmlUtil {
    private XmlUtil() {}

    public static List<Node> asList(NodeList n) {
        return n.getLength() == 0 ? Collections.<Node>emptyList() : new NodeListWrapper(n);
    }

    static final class NodeListWrapper extends AbstractList<Node> implements RandomAccess {
        private final NodeList list;

        NodeListWrapper(NodeList l) {
            this.list = l;
        }

        public Node get(int index) {
            return this.list.item(index);
        }

        public int size() {
            return this.list.getLength();
        }
    }

    public static List<Node> asList(NamedNodeMap n) {
        return n.getLength() == 0 ? Collections.<Node>emptyList() : new NodeMapWrapper(n);
    }

    static final class NodeMapWrapper extends AbstractList<Node> implements RandomAccess {
        private final NamedNodeMap list;

        NodeMapWrapper(NamedNodeMap l) {
            this.list = l;
        }

        public Node get(int index) {
            return this.list.item(index);
        }

        public int size() {
            return this.list.getLength();
        }
    }
}

2017-10-04 07:43:53

这将向您展示如何

将XML文件读入DOM 用XPath过滤出一组节点对每个提取的节点执行特定的操作。

我们将用下面的语句调用代码

processFilteredXml(xmlIn, xpathExpr,(node) -> {/*Do something...*/;});

在本例中，我们希望从book.xml中打印一些creatorName，使用"//book/creators/creator/creatorName"作为xpath，在每个匹配xpath的节点上执行printNode操作。

完整代码

@Test
public void printXml() {
    try (InputStream in = readFile("book.xml")) {
        processFilteredXml(in, "//book/creators/creator/creatorName", (node) -> {
            printNode(node, System.out);
        });
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private InputStream readFile(String yourSampleFile) {
    return Thread.currentThread().getContextClassLoader().getResourceAsStream(yourSampleFile);
}

private void processFilteredXml(InputStream in, String xpath, Consumer<Node> process) {
    Document doc = readXml(in);
    NodeList list = filterNodesByXPath(doc, xpath);
    for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        process.accept(node);
    }
}

public Document readXml(InputStream xmlin) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(xmlin);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private NodeList filterNodesByXPath(Document doc, String xpathExpr) {
    try {
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        XPathExpression expr = xpath.compile(xpathExpr);
        Object eval = expr.evaluate(doc, XPathConstants.NODESET);
        return (NodeList) eval;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private void printNode(Node node, PrintStream out) {
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StreamResult result = new StreamResult(new StringWriter());
        DOMSource source = new DOMSource(node);
        transformer.transform(source, result);
        String xmlString = result.getWriter().toString();
        out.println(xmlString);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

打印

<creatorName>Fosmire, Michael</creatorName>

<creatorName>Wertz, Ruth</creatorName>

<creatorName>Purzer, Senay</creatorName>

对于book.xml

<book>
  <creators>
    <creator>
      <creatorName>Fosmire, Michael</creatorName>
      <givenName>Michael</givenName>
      <familyName>Fosmire</familyName>
    </creator>
    <creator>
      <creatorName>Wertz, Ruth</creatorName>
      <givenName>Ruth</givenName>
      <familyName>Wertz</familyName>
    </creator>
    <creator>
      <creatorName>Purzer, Senay</creatorName>
       <givenName>Senay</givenName>
       <familyName>Purzer</familyName>
    </creator>
  </creators>
  <titles>
    <title>Critical Engineering Literacy Test (CELT)</title>
  </titles>
</book>

2018-10-10 09:06:13

如何在Java中使用XPath读取XML

推荐文章

最新文章

标签