这是最简单的解释。这是我正在使用的:

re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']

这是我想要的:

someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。


当前回答

我发现这种基于生成器的方法更令人满意:

def split_keep(string, sep):
    """Usage:
    >>> list(split_keep("a.b.c.d", "."))
    ['a.', 'b.', 'c.', 'd']
    """
    start = 0
    while True:
        end = string.find(sep, start) + 1
        if end == 0:
            break
        yield string[start:end]
        start = end
    yield string[start:]

它避免了需要找出正确的正则表达式,而在理论上应该相当便宜。它不创建新的字符串对象,并将大部分迭代工作委托给高效的find方法。

... 在Python 3.8中,它可以短到:

def split_keep(string, sep):
    start = 0
    while (end := string.find(sep, start) + 1) > 0:
        yield string[start:end]
        start = end
    yield string[start:]

其他回答

另一个在Python 3上工作良好的非正则表达式解决方案

# Split strings and keep separator
test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']

def split_and_keep(s, sep):
   if not s: return [''] # consistent with string.split()

   # Find replacement character that is not used in string
   # i.e. just use the highest available character plus one
   # Note: This fails if ord(max(s)) = 0x10FFFF (ValueError)
   p=chr(ord(max(s))+1) 

   return s.replace(sep, sep+p).split(p)

for s in test_strings:
   print(split_and_keep(s, '<'))


# If the unicode limit is reached it will fail explicitly
unicode_max_char = chr(1114111)
ridiculous_string = '<Hello>'+unicode_max_char+'<World>'
print(split_and_keep(ridiculous_string, '<'))

如果你只有一个分隔符,你可以使用列表推导式:

text = 'foo,bar,baz,qux'  
sep = ','

附加/将分隔符:

result = [x+sep for x in text.split(sep)]
#['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of trailing
result[-1] = result[-1].strip(sep)
#['foo,', 'bar,', 'baz,', 'qux']

result = [sep+x for x in text.split(sep)]
#[',foo', ',bar', ',baz', ',qux']
# to get rid of trailing
result[0] = result[0].strip(sep)
#['foo', ',bar', ',baz', ',qux']

分隔符作为它自己的元素:

result = [u for x in text.split(sep) for u in (x, sep)]
#['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
results = result[:-1]   # to get rid of trailing
# This keeps all separators  in result 
##########################################################################
import re
st="%%(c+dd+e+f-1523)%%7"
sh=re.compile('[\+\-//\*\<\>\%\(\)]')

def splitStringFull(sh, st):
   ls=sh.split(st)
   lo=[]
   start=0
   for l in ls:
     if not l : continue
     k=st.find(l)
     llen=len(l)
     if k> start:
       tmp= st[start:k]
       lo.append(tmp)
       lo.append(l)
       start = k + llen
     else:
       lo.append(l)
       start =llen
   return lo
  #############################

li= splitStringFull(sh , st)
['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']
>>> line = 'hello_toto_is_there'
>>> sep = '_'
>>> [sep + x[1] if x[0] != 0 else x[1] for x in enumerate(line.split(sep))]
['hello', '_toto', '_is', '_there']

如果你在换行上分割,使用splitlines(True)。

>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']

(不是一个通用的解决方案,但在这里添加这个,以防有人来到这里没有意识到这个方法的存在。)