这是最简单的解释。这是我正在使用的:

re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']

这是我想要的:

someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。


当前回答

我可以把它放在这里吗

s = 'foo/bar spam\neggs'
print(s.replace('/', '+++/+++').replace(' ', '+++ +++').replace('\n', '+++\n+++').split('+++'))

['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

其他回答

如果你在换行上分割,使用splitlines(True)。

>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']

(不是一个通用的解决方案,但在这里添加这个,以防有人来到这里没有意识到这个方法的存在。)

如果你只有一个分隔符,你可以使用列表推导式:

text = 'foo,bar,baz,qux'  
sep = ','

附加/将分隔符:

result = [x+sep for x in text.split(sep)]
#['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of trailing
result[-1] = result[-1].strip(sep)
#['foo,', 'bar,', 'baz,', 'qux']

result = [sep+x for x in text.split(sep)]
#[',foo', ',bar', ',baz', ',qux']
# to get rid of trailing
result[0] = result[0].strip(sep)
#['foo', ',bar', ',baz', ',qux']

分隔符作为它自己的元素:

result = [u for x in text.split(sep) for u in (x, sep)]
#['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
results = result[:-1]   # to get rid of trailing

另一个例子,在非字母数字上进行分割,并保留分隔符

import re
a = "foo,bar@candy*ice%cream"
re.split('([^a-zA-Z0-9])',a)

输出:

['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']

解释

re.split('([^a-zA-Z0-9])',a)

() <- keep the separators
[] <- match everything in between
^a-zA-Z0-9 <-except alphabets, upper/lower and numbers.

如果你想拆分字符串,同时通过regex保留分隔符,而不捕获组:

def finditer_with_separators(regex, s):
    matches = []
    prev_end = 0
    for match in regex.finditer(s):
        match_start = match.start()
        if (prev_end != 0 or match_start > 0) and match_start != prev_end:
            matches.append(s[prev_end:match.start()])
        matches.append(match.group())
        prev_end = match.end()
    if prev_end < len(s):
        matches.append(s[prev_end:])
    return matches

regex = re.compile(r"[\(\)]")
matches = finditer_with_separators(regex, s)

如果假设regex被封装到捕获组中:

def split_with_separators(regex, s):
    matches = list(filter(None, regex.split(s)))
    return matches

regex = re.compile(r"([\(\)])")
matches = split_with_separators(regex, s)

这两种方法也将删除空组,在大多数情况下是无用和恼人的。

安装wrs时“不拆卸分离器”

pip install wrs

(由Rao Hamza开发)

import wrs
text  = "Now inbox “how to make spam ad” Invest in hard email marketing."
splitor = 'email | spam | inbox'
list = wrs.wr_split(splitor, text)
print(list)

结果: ['现在','收件箱'如何制作','垃圾广告'努力投资','电子邮件营销'。]