I have a small problem with XPath contains() in dom4j...
Let's say my XML is:
<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>
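Let's say I want to find all the nodes that have ABC in their text, given the root element...
So the XPath I would need to write is:
//*[contains(text(),'ABC')]
However, this is not what dom4j returns. Is this a dom4j problem, or a gap in my understanding of how XPath works? That query returns only the Street element and not the Comment element.
The DOM makes the Comment element a composite element with four child nodes (two text nodes and two <br/> elements):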
[Text = 'XYZ'][BR][BR][Text = 'ABC']
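I would assume the query should still return the Comment element, since it should find the element and run contains on it, but it doesn't...
The following query does return the element, but it returns far more than just that element: it also returns the parent elements, which is undesirable for this problem.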
//*[contains(.,'ABC')]
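Does anyone know an XPath query that would return just the <Street/> and <Comment/> elements?
The <Comment> tag contains two text nodes and two <br> nodes as children.
Your XPath expression was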
//*[contains(text(),'ABC')]
To deconstruct this:
* is a selector that matches any element (i.e. tag) -- it returns a node-set.
The [] are a conditional that operates on each individual node in that node set. It matches if any of the individual nodes it operates on match the conditions inside the brackets.
text() is a selector that matches all of the text nodes that are children of the context node -- it returns a node set.
contains is a function that operates on a string. If it is passed a node set, the node set is converted into a string by returning the string-value of the node in the node-set that is first in document order. Hence, it can match only the first text node in your <Comment> element -- namely BLAH BLAH BLAH. Since that doesn't match, you don't get a <Comment> in your results.
You need to change this to
//*[text()[contains(.,'ABC')]]
* is a selector that matches any element (i.e. tag) -- it returns a node-set.
The outer [] are a conditional that operates on each individual node in that node set -- here it operates on each element in the document.
text() is a selector that matches all of the text nodes that are children of the context node -- it returns a node set.
The inner [] are a conditional that operates on each node in that node set -- here each individual text node. Each individual text node is the starting point for any path in the brackets, and can also be referred to explicitly as . within the brackets. It matches if any of the individual nodes it operates on match the conditions inside the brackets.
contains is a function that operates on a string. Here it is passed an individual text node (.). Since it is passed the second text node in the <Comment> tag individually, it will see the 'ABC' string and be able to match it.
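For anyone who wants to check this outside dom4j, here is a minimal sketch that evaluates both expressions with the JDK's built-in javax.xml.xpath (XPath 1.0). The class and method names (ContainsTextDemo, matchedNames) are made up for illustration, and the XML is written without indentation so that there are no whitespace-only text nodes to think about. dom4j itself is not used here, so treat this as a check of the XPath semantics rather than of dom4j.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContainsTextDemo {
    // The sample document from the question, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Evaluates an XPath 1.0 expression against XML and returns the
    // names of the matched nodes in document order.
    static List<String> matchedNames(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // contains() only sees the first child text node of each element.
        System.out.println(matchedNames("//*[contains(text(),'ABC')]"));    // [Street]
        // The inner predicate tests every child text node individually.
        System.out.println(matchedNames("//*[text()[contains(.,'ABC')]]")); // [Street, Comment]
    }
}
```

The only difference between the two calls is where contains() is applied: to the whole text() node-set in the first, and to each text node in turn in the second.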
The XML document:
<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>
The XPath expression:
//*[contains(text(), 'ABC')]
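//* matches any descendant element of the root node, that is, any element but the root node. (In XPath the root node is the document itself, not the <Home> element, so <Home> is included.)
[...] is a predicate; it filters the node-set. It returns the nodes for which ... is true:
A predicate filters a node-set [...] to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated [...]; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included.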
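contains('haystack', 'needle') returns true if haystack contains needle:
Function: boolean contains(string, string)
The contains function returns true if the first argument string contains the second argument string, and otherwise returns false.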
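But contains() takes a string as its first parameter, and here it is passed nodes. To deal with that, every node or node-set passed as the first parameter is converted to a string by the string() function:
An argument is converted to type string as if by calling the string function.
For a node-set, the string() function returns the string-value of the first node only:
A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.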
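The string-value of an element node:
The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
The string-value of a text node:
The string-value of a text node is the character data.
So, basically, the string-value is all the text contained in a node (the concatenation of all descendant text nodes).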
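The conversion rules quoted above can be observed directly. The following is a small sketch, again using the JDK's javax.xml.xpath rather than dom4j; StringValueDemo and stringValue are hypothetical names, and the XML is the sample document written without indentation.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class StringValueDemo {
    // The sample document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Evaluates an XPath 1.0 expression and returns the result as a string.
    static String stringValue(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        // Element node: concatenation of all descendant text nodes.
        System.out.println("[" + stringValue("string(//Comment)") + "]");        // [BLAH BLAH BLAH ABC]
        // Node-set of two text nodes: only the first one is used.
        System.out.println("[" + stringValue("string(//Comment/text())") + "]"); // [BLAH BLAH BLAH ]
        // Empty node-set: empty string.
        System.out.println("[" + stringValue("string(//NoSuchElement)") + "]");  // []
    }
}
```

The second call is the case that matters for the question: the trailing ABC text node is never looked at once the node-set has been collapsed to its first member.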
text() is a node test that matches any text node:
The node test text() is true for any text node. For example, child::text() will select the text node children of the context node.
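Having said that, //*[contains(text(), 'ABC')] matches any element (but the root node) whose first child text node contains ABC. That is because text() returns a node-set containing all child text nodes of the context node (the node relative to which the expression is evaluated), but contains() uses only the first one. So for the document above, the path matches only the Street element.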
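The following expression, //*[text()[contains(., 'ABC')]], matches any element (but the root node) that has at least one child text node containing ABC. The . represents the context node, which in this case is a child text node of any element but the root node. So for the document above, the path matches the Street and Comment elements.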
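Now, //*[contains(., 'ABC')] matches any element (but the root node) that contains ABC in the concatenation of its descendant text nodes. For the document above, it matches the Home, Addr, Street, and Comment elements. Likewise, //*[contains(., 'BLAH ABC')] matches the Home, Addr, and Comment elements.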
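The contains(., ...) results can be reproduced with a short sketch. As before, this uses the JDK's javax.xml.xpath on a DOM parsed from the unindented sample document, not dom4j, and ContainsDotDemo and matchedNames are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContainsDotDemo {
    // The sample document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Parses XML into a DOM and returns the names of the elements
    // selected by the given XPath 1.0 expression, in document order.
    static List<String> matchedNames(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // "." is each element itself, so ancestors match through their descendants' text.
        System.out.println(matchedNames("//*[contains(., 'ABC')]"));      // [Home, Addr, Street, Comment]
        // "BLAH ABC" only exists once the two text nodes of Comment are concatenated.
        System.out.println(matchedNames("//*[contains(., 'BLAH ABC')]")); // [Home, Addr, Comment]
    }
}
```

This is why contains(., ...) on elements tends to over-match: every ancestor of a matching element has a string-value that includes the same text.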