我需要一个正则表达式来选择两个外括号之间的所有文本。
例子: START_TEXT(这里的文本(可能的文本)文本(可能的文本(更多的文本))END_TXT ^ ^
结果: (此处文本(可能的文本)文本(可能的文本(更多的文本)))
我需要一个正则表达式来选择两个外括号之间的所有文本。
例子: START_TEXT(这里的文本(可能的文本)文本(可能的文本(更多的文本))END_TXT ^ ^
结果: (此处文本(可能的文本)文本(可能的文本(更多的文本)))
当前回答
(?<=\().*(?=\))
如果您想在两个匹配的括号之间选择文本,那么使用正则表达式就不太走运了。这是不可能的。
这个正则表达式只返回字符串中第一个开始括号和最后一个结束括号之间的文本。
(*)除非你的regex引擎有像平衡组或递归这样的特性。支持这些特性的引擎的数量正在缓慢增长,但它们仍然不是普遍可用的。
其他回答
这是最终的正则表达式:
\(
(?<arguments>
(
([^\(\)']*) |
(\([^\(\)']*\)) |
'(.*?)'
)*
)
\)
例子:
input: ( arg1, arg2, arg3, (arg4), '(pip' )
output: arg1, arg2, arg3, (arg4), '(pip'
注意,'(pip'被正确地管理为字符串。 (在调节器试过:http://sourceforge.net/projects/regulator/)
使用Ruby(1.9.3或更高版本)的正则表达式:
/(?<match>\((?:\g<match>|[^()]++)*\))/
关节演示
"""
Here is a simple python program showing how to use regular
expressions to write a paren-matching recursive parser.
This parser recognises items enclosed by parens, brackets,
braces and <> symbols, but is adaptable to any set of
open/close patterns. This is where the re package greatly
assists in parsing.
"""
import re
# The pattern below recognises a sequence consisting of:
# 1. Any characters not in the set of open/close strings.
# 2. One of the open/close strings.
# 3. The remainder of the string.
#
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included. However quotes are not ignored inside
# quotes. More logic is needed for that....
pat = re.compile("""
( .*? )
( \( | \) | \[ | \] | \{ | \} | \< | \> |
\' | \" | BEGIN | END | $ )
( .* )
""", re.X)
# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.
matching = { "(" : ")",
"[" : "]",
"{" : "}",
"<" : ">",
'"' : '"',
"'" : "'",
"BEGIN" : "END" }
# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.
def matchnested(s, term=""):
lst = []
while True:
m = pat.match(s)
if m.group(1) != "":
lst.append(m.group(1))
if m.group(2) == term:
return lst, m.group(3)
if m.group(2) in matching:
item, s = matchnested(m.group(3), matching[m.group(2)])
lst.append(m.group(2))
lst.append(item)
lst.append(matching[m.group(2)])
else:
raise ValueError("After <<%s %s>> expected %s not %s" %
(lst, s, term, m.group(2)))
# Unit test.
if __name__ == "__main__":
for s in ("simple string",
""" "double quote" """,
""" 'single quote' """,
"one'two'three'four'five'six'seven",
"one(two(three(four)five)six)seven",
"one(two(three)four)five(six(seven)eight)nine",
"one(two)three[four]five{six}seven<eight>nine",
"one(two[three{four<five>six}seven]eight)nine",
"oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
"ERROR testing ((( mismatched ))] parens"):
print "\ninput", s
try:
lst, s = matchnested(s)
print "output", lst
except ValueError as e:
print str(e)
print "done"
你可以使用regex递归:
\(([^()]|(?R))*\)
虽然很多答案都以某种形式提到了这一点,比如正则表达式不支持递归匹配等等,但主要原因在于计算理论的根源。
形式为{a^nb^n | n>=0}的语言是非正则的。Regex只能匹配构成常规语言集一部分的东西。
阅读更多@这里