I need a regular expression that selects all the text between two outer brackets.

Example:

START_TEXT(text here(possible text)text(possible text(more text)))END_TXT
          ^                                                      ^

Result: (text here(possible text)text(possible text(more text)))


Current answer

Regular expression using Ruby (version 1.9.3 or above):

/(?<match>\((?:\g<match>|[^()]++)*\))/

Demo on Rubular

Other answers

"""
Here is a simple python program showing how to use regular
expressions to write a paren-matching recursive parser.

This parser recognises items enclosed by parens, brackets,
braces and <> symbols, but is adaptable to any set of
open/close patterns.  This is where the re package greatly
assists in parsing. 
"""

import re


# The pattern below recognises a sequence consisting of:
#    1. Any characters not in the set of open/close strings.
#    2. One of the open/close strings.
#    3. The remainder of the string.
# 
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included.  However quotes are not ignored inside
# quotes.  More logic is needed for that....


pat = re.compile(r"""
    ( .*? )
    ( \( | \) | \[ | \] | \{ | \} | \< | \> |
                           \' | \" | BEGIN | END | $ )
    ( .* )
    """, re.X)

# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.

matching = { "(" : ")",
             "[" : "]",
             "{" : "}",
             "<" : ">",
             '"' : '"',
             "'" : "'",
             "BEGIN" : "END" }

# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.

def matchnested(s, term=""):
    lst = []
    while True:
        m = pat.match(s)

        if m.group(1) != "":
            lst.append(m.group(1))

        if m.group(2) == term:
            return lst, m.group(3)

        if m.group(2) in matching:
            item, s = matchnested(m.group(3), matching[m.group(2)])
            lst.append(m.group(2))
            lst.append(item)
            lst.append(matching[m.group(2)])
        else:
            raise ValueError("After <<%s %s>> expected %s not %s" %
                             (lst, s, term, m.group(2)))

# Unit test.

if __name__ == "__main__":
    for s in ("simple string",
              """ "double quote" """,
              """ 'single quote' """,
              "one'two'three'four'five'six'seven",
              "one(two(three(four)five)six)seven",
              "one(two(three)four)five(six(seven)eight)nine",
              "one(two)three[four]five{six}seven<eight>nine",
              "one(two[three{four<five>six}seven]eight)nine",
              "oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
              "ERROR testing ((( mismatched ))] parens"):
        print("\ninput", s)
        try:
            lst, s = matchnested(s)
            print("output", lst)
        except ValueError as e:
            print(str(e))
    print("done")

I want to add this answer for quick reference. Feel free to update it.


.NET regex using balancing groups:

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

Here c is used as the depth counter.
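
The depth-counter idea also works in engines that have no balancing groups. Below is a minimal sketch in plain Python, not a port of the .NET pattern. The function name `outer_groups` is made up for illustration, and it assumes that a stray closing bracket should simply be skipped.

```python
def outer_groups(text, open_ch="(", close_ch=")"):
    """Yield each outermost balanced open_ch...close_ch span in text."""
    depth = 0      # plays the role of the counter group c
    start = None   # index of the current outermost opener
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


sample = "START_TEXT(text here(possible text)text(possible text(more text)))END_TXT"
print(list(outer_groups(sample)))
```

Unlike the regex, this sketch yields nothing for a span whose outermost opener is never closed, so "((a)" produces no result.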

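Demo at Regexstorm.com

Stack Overflow: Using RegEx to balance match parenthesis
Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions
Greg Reinacker's Weblog: Nested Constructs in Regular Expressions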


PCRE using a recursive pattern:

\((?:[^)(]+|(?R))*+\)

Demo at regex101; or without alternation:

\((?:[^)(]*(?R)?)*+\)

Demo at regex101; or unrolled for performance:

\([^)(]*+(?:(?R)[^)(]*)*+\)

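Demo at regex101; the pattern is pasted at (?R), which stands for (?0).

This works in Perl, PHP, Notepad++, R (with perl=TRUE), and Python (the PyPI regex module, with (?V1) for Perl behaviour). (Newer versions of the PyPI regex package already default to this → DEFAULT_VERSION = VERSION1)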
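
Python's standard `re` module has no recursion, so here is a rough sketch of what the first recursive pattern does, written as a recursive function using only the standard language. The names `match_balanced` and `first_balanced` are invented for this example, and the correspondence to the regex is an approximation rather than an exact model of PCRE backtracking.

```python
def match_balanced(text, pos=0):
    """Return the index just past the balanced group opening at text[pos], or -1."""
    if pos >= len(text) or text[pos] != "(":
        return -1
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == ")":
            return i + 1                  # the closing \)
        if ch == "(":
            i = match_balanced(text, i)   # the (?R) step: re-enter the whole pattern
            if i == -1:
                return -1
        else:
            i += 1                        # one character of [^)(]+
    return -1                             # input ended before the closer


def first_balanced(text):
    """Scan left to right, like a regex search, for the first balanced group."""
    for start, ch in enumerate(text):
        if ch == "(":
            end = match_balanced(text, start)
            if end != -1:
                return text[start:end]
    return None


print(first_balanced("START_TEXT(text here(possible text))END_TXT"))
```

Very deep nesting would hit Python's recursion limit, so treat this as an illustration of the idea rather than a production parser.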


Ruby using subexpression calls:

With Ruby 2.0, \g<0> can be used to call the full pattern.

\((?>[^)(]+|\g<0>)*\)

Demo at Rubular; Ruby 1.9 only supports capturing group recursion:

(\((?>[^)(]+|\g<1>)*\))

Demo at Rubular (atomic grouping since Ruby 1.9.3)


JavaScript API (XRegExp.matchRecursive):

XRegExp.matchRecursive(str, '\\(', '\\)', 'g');

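Java: an interesting idea using forward references by @jaytea.


Without recursion, up to 3 levels of nesting (JS, Java and other regex flavors):

To prevent runaway matching on unbalanced input, the * quantifier is applied directly to [^)(] only at the innermost level.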

\((?:[^)(]|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\))*\)

Demo at regex101; or unrolled for better performance (preferred).

\([^)(]*(?:\([^)(]*(?:\([^)(]*(?:\([^)(]*\)[^)(]*)*\)[^)(]*)*\)[^)(]*)*\)

Demo at regex101; deeper nesting has to be added as required.
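
Writing the deeper levels by hand gets error-prone, so the pattern can also be generated. This is a small sketch using Python's standard `re` module. The helper name `nested_paren_pattern` is hypothetical, and it builds the first (non-unrolled) form shown above.

```python
import re


def nested_paren_pattern(levels):
    """Build the non-recursive pattern allowing `levels` nested levels inside the outer pair."""
    pattern = r"\([^)(]*\)"   # innermost level: no further nesting allowed
    for _ in range(levels):
        # Wrap the previous level in one more pair of parentheses.
        pattern = r"\((?:[^)(]|" + pattern + r")*\)"
    return pattern


three_levels = re.compile(nested_paren_pattern(3))
print(three_levels.search("x(a(b(c(d))))y").group(0))
```

With `levels=3` the helper reproduces the 3-level pattern given above character for character. Input nested more deeply than the pattern allows does not fail outright: the search simply matches an inner group instead.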


Reference - What does this regex mean?

Recursive Regular Expressions
Regular-Expressions.info - Regular Expression Recursion
Mastering Regular Expressions - Jeffrey E.F. Friedl

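This doesn't fully address the OP's question, but I thought it might be useful to people who come here searching for a nested-structure regexp:

Parse params from a function string (with nested structures) in JavaScript

It matches structures like these:

curly braces, square brackets, parentheses, and single and double quotes

Here you can see the generated regexp in action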

/**
 * get param content of function string.
 * only params string should be provided without parentheses
 * WORK even if some/all params are not set
 * @return [param1, param2, param3]
 */
exports.getParamsSAFE = (str, nbParams = 3) => {
    const nextParamReg = /^\s*((?:(?:['"([{](?:[^'"()[\]{}]*?|['"([{](?:[^'"()[\]{}]*?|['"([{][^'"()[\]{}]*?['")}\]])*?['")}\]])*?['")}\]])|[^,])*?)\s*(?:,|$)/;
    const params = [];
    while (str.length) { // this is to avoid a BIG performance issue in javascript regexp engine
        str = str.replace(nextParamReg, (full, p1) => {
            params.push(p1);
            return '';
        });
    }
    return params;
};

This is the definitive regex:

\(
(?<arguments> 
(  
  ([^\(\)']*) |  
  (\([^\(\)']*\)) |
  '(.*?)'

)*
)
\)

Example:

input: ( arg1, arg2, arg3, (arg4), '(pip' )

output: arg1, arg2, arg3, (arg4), '(pip'

Note that '(pip' is correctly handled as a string. (Tried in Regulator: http://sourceforge.net/projects/regulator/)
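
For anyone who wants to try this outside Regulator, here is a hedged port to Python's `re` module in verbose mode. Two things differ from the pattern above and are my assumptions, not the author's: the named group uses Python's `(?P<name>...)` syntax, and the first alternative uses `+` instead of `*` so that every loop iteration consumes at least one character.

```python
import re

# Verbose-mode port of the pattern above; whitespace in the pattern is ignored.
ARGS = re.compile(r"""
    \(
    (?P<arguments>
      (?:
          [^()']+          # plain text with no parentheses or quotes
        | \( [^()']* \)    # one nested level of parentheses
        | ' .*? '          # a single-quoted string, parentheses allowed inside
      )*
    )
    \)
""", re.VERBOSE)

m = ARGS.search("( arg1, arg2, arg3, (arg4), '(pip' )")
print(m.group("arguments").strip())
```

Like the original, this only handles one nested level of parentheses inside the argument list.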

This also works:

re.findall(r'\(.+\)', s)
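
A quick check of what this greedy pattern actually does may help. It does not balance anything: it matches from the first "(" to the last ")" on a line, which happens to be the right answer when the string contains a single outer group. The sample strings below are made up for illustration.

```python
import re

greedy = re.compile(r"\(.+\)")

single = "START_TEXT(text here(possible text)text(possible text(more text)))END_TXT"
several = "f(a) and g(b(c))"

# One outer group: the greedy match coincides with the balanced span.
print(greedy.findall(single))

# Two separate groups: they are merged into one match.
print(greedy.findall(several))
```

So this is fine for input shaped like the question's example, but it should not be relied on when a line can contain several groups or unbalanced brackets.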