这是最简单的解释。这是我正在使用的:

re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']

这是我想要的:

someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。


当前回答

使用re.split,并且你的正则表达式来自变量,并且你有多个分隔符,你可以像下面这样使用:

# BashSpecialParamList is the special param in bash,
# such as your separator is the bash special param
BashSpecialParamList = ["$*", "$@", "$#", "$?", "$-", "$$", "$!", "$0"]
# aStr is the the string to be splited
aStr = "$a Klkjfd$0 $? $#%$*Sdfdf"

reStr = "|".join([re.escape(sepStr) for sepStr in BashSpecialParamList])

re.split(f'({reStr})', aStr)

# Then You can get the result:
# ['$a Klkjfd', '$0', ' ', '$?', ' ', '$#', '%', '$*', 'Sdfdf']

参考:GNU Bash特殊参数

其他回答

>>> line = 'hello_toto_is_there'
>>> sep = '_'
>>> [sep + x[1] if x[0] != 0 else x[1] for x in enumerate(line.split(sep))]
['hello', '_toto', '_is', '_there']

使用re.split,并且你的正则表达式来自变量,并且你有多个分隔符,你可以像下面这样使用:

# BashSpecialParamList is the special param in bash,
# such as your separator is the bash special param
BashSpecialParamList = ["$*", "$@", "$#", "$?", "$-", "$$", "$!", "$0"]
# aStr is the the string to be splited
aStr = "$a Klkjfd$0 $? $#%$*Sdfdf"

reStr = "|".join([re.escape(sepStr) for sepStr in BashSpecialParamList])

re.split(f'({reStr})', aStr)

# Then You can get the result:
# ['$a Klkjfd', '$0', ' ', '$?', ' ', '$#', '%', '$*', 'Sdfdf']

参考:GNU Bash特殊参数

re.split的文档中提到:

根据出现的模式拆分字符串。如果捕获 括号是在模式中使用的,然后是文本中的所有组 模式也作为结果列表的一部分返回。

所以你只需要用一个捕获组来包装分隔符:

>>> re.split('(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

我发现这种基于生成器的方法更令人满意:

def split_keep(string, sep):
    """Usage:
    >>> list(split_keep("a.b.c.d", "."))
    ['a.', 'b.', 'c.', 'd']
    """
    start = 0
    while True:
        end = string.find(sep, start) + 1
        if end == 0:
            break
        yield string[start:end]
        start = end
    yield string[start:]

它避免了需要找出正确的正则表达式,而在理论上应该相当便宜。它不创建新的字符串对象,并将大部分迭代工作委托给高效的find方法。

... 在Python 3.8中,它可以短到:

def split_keep(string, sep):
    start = 0
    while (end := string.find(sep, start) + 1) > 0:
        yield string[start:end]
        start = end
    yield string[start:]

如果你想拆分字符串,同时通过regex保留分隔符,而不捕获组:

def finditer_with_separators(regex, s):
    matches = []
    prev_end = 0
    for match in regex.finditer(s):
        match_start = match.start()
        if (prev_end != 0 or match_start > 0) and match_start != prev_end:
            matches.append(s[prev_end:match.start()])
        matches.append(match.group())
        prev_end = match.end()
    if prev_end < len(s):
        matches.append(s[prev_end:])
    return matches

regex = re.compile(r"[\(\)]")
matches = finditer_with_separators(regex, s)

如果假设regex被封装到捕获组中:

def split_with_separators(regex, s):
    matches = list(filter(None, regex.split(s)))
    return matches

regex = re.compile(r"([\(\)])")
matches = split_with_separators(regex, s)

这两种方法也将删除空组,在大多数情况下是无用和恼人的。