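This is the simplest way to explain it. Here is what I'm using:
>>> re.split(r'\W', 'foo/bar spam\neggs')
['foo', 'bar', 'spam', 'eggs']
Here is what I want:
>>> someMethod(r'\W', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
The reason is that I want to split a string into tokens, manipulate them, and then put the string back together again.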
Current answer
# This keeps all separators in the result
import re

st = "%%(c+dd+e+f-1523)%%7"
sh = re.compile(r'[+\-/*<>%()]')

def splitStringFull(sh, st):
    ls = sh.split(st)
    lo = []
    start = 0
    for l in ls:
        if not l:
            continue
        # Search from `start` so a repeated token is not matched at an earlier position
        k = st.find(l, start)
        if k > start:
            # The text between the previous token and this one is a run of separators
            lo.append(st[start:k])
        lo.append(l)
        start = k + len(l)
    if start < len(st):
        # Keep separators that trail the last token
        lo.append(st[start:])
    return lo

li = splitStringFull(sh, st)
print(li)
# ['%%(', 'c', '+', 'dd', '+', 'e', '+', 'f', '-', '1523', ')%%', '7']
Other answers
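The documentation for re.split mentions:
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
So you just need to wrap your separator in a capturing group:
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']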
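To tie this back to the goal in the question (split, manipulate, rejoin), here is a minimal sketch. The helper name `upper_words` is made up for illustration; it upper-cases the text between separators and leaves the separators alone.

```python
import re

def upper_words(text):
    # The capturing group keeps every separator as its own list item
    tokens = re.split(r'(\W)', text)
    # With one capturing group, even indices hold the text between
    # separators and odd indices hold the separators themselves
    changed = [tok.upper() if i % 2 == 0 else tok
               for i, tok in enumerate(tokens)]
    return ''.join(changed)

text = 'foo/bar spam\neggs'
# Joining the untouched tokens gives back the original string
assert ''.join(re.split(r'(\W)', text)) == text
result = upper_words(text)
```

Because nothing is dropped by the split, joining the tokens is lossless, so only the manipulation step changes the string.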
Here is a simple .split solution that does not need regex.
This is an answer for a Python split() that does not remove the delimiter, so it is not exactly what the original post asks for, but the other question was closed as a duplicate of this one.
def splitkeep(s, delimiter):
    split = s.split(delimiter)
    return [substr + delimiter for substr in split[:-1]] + [split[-1]]
Random tests:
import random

CHARS = [".", "a", "b", "c"]
assert splitkeep("", "X") == [""]  # 0 length test
for delimiter in ('.', '..'):
    for _ in range(100000):
        length = random.randint(1, 50)
        s = "".join(random.choice(CHARS) for _ in range(length))
        assert "".join(splitkeep(s, delimiter)) == s
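If the delimiter has to be a regex rather than a fixed string, the same "keep the delimiter attached to the piece before it" behaviour can be sketched on top of re.split. The name `splitkeep_re` is hypothetical, and the sketch assumes the pattern contains exactly one capturing group wrapping the whole delimiter.

```python
import re

def splitkeep_re(s, pattern):
    # Assumption: `pattern` has exactly one capturing group, so re.split
    # returns [text, sep, text, sep, ..., text]
    parts = re.split(pattern, s)
    # Glue each separator onto the text that precedes it
    glued = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    # The last element has no separator after it
    return glued + [parts[-1]]

pieces = splitkeep_re('one, two;three', r'([,;])')
```

As with splitkeep, joining the pieces reproduces the input string.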
May I leave this here?
s = 'foo/bar spam\neggs'
print(s.replace('/', '+++/+++').replace(' ', '+++ +++').replace('\n', '+++\n+++').split('+++'))
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
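The code below is a simple, efficient, and tested answer to this question. It has comments explaining everything in it.
I promise it is not as scary as it looks: it is actually only 13 lines of code! The rest is comments, documentation, and assertions.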
def split_including_delimiters(input: str, delimiter: str):
    """
    Splits an input string, while including the delimiters in the output

    Unlike str.split, we can use an empty string as a delimiter
    Unlike str.split, the output will not have any extra empty strings
    Consequently, len(split_including_delimiters('', delimiter)) == 0 for all delimiters,
    whereas len(input.split(delimiter)) > 0 for all inputs and delimiters

    INPUTS:
        input: Can be any string
        delimiter: Can be any string

    EXAMPLES:
        >>> split_including_delimiters('Hello World ! ', ' ')
        ['Hello', ' ', 'World', ' ', '!', ' ']
        >>> split_including_delimiters("Hello**World**!***", "**")
        ['Hello', '**', 'World', '**', '!', '**', '*']

    TEST CASES:
        assert split_including_delimiters('-xx-xx-','xx') == ['-', 'xx', '-', 'xx', '-'] # length 5
        assert split_including_delimiters('xx-xx-' ,'xx') == ['xx', '-', 'xx', '-']      # length 4
        assert split_including_delimiters('-xx-xx' ,'xx') == ['-', 'xx', '-', 'xx']      # length 4
        assert split_including_delimiters('xx-xx'  ,'xx') == ['xx', '-', 'xx']           # length 3
        assert split_including_delimiters('xxxx'   ,'xx') == ['xx', 'xx']                # length 2
        assert split_including_delimiters('xxx'    ,'xx') == ['xx', 'x']                 # length 2
        assert split_including_delimiters('x'      ,'xx') == ['x']                       # length 1
        assert split_including_delimiters(''       ,'xx') == []                          # length 0
        assert split_including_delimiters('aaa'    ,'xx') == ['aaa']                     # length 1
        assert split_including_delimiters('aa'     ,'xx') == ['aa']                      # length 1
        assert split_including_delimiters('a'      ,'xx') == ['a']                       # length 1
        assert split_including_delimiters(''       ,''  ) == []                          # length 0
        assert split_including_delimiters('a'      ,''  ) == ['a']                       # length 1
        assert split_including_delimiters('aa'     ,''  ) == ['a', '', 'a']              # length 3
        assert split_including_delimiters('aaa'    ,''  ) == ['a', '', 'a', '', 'a']     # length 5
    """

    # Input assertions
    assert isinstance(input, str), "input must be a string"
    assert isinstance(delimiter, str), "delimiter must be a string"

    if delimiter:
        # These tokens do not include the delimiter, but are computed quickly
        tokens = input.split(delimiter)
    else:
        # Edge case: if the delimiter is the empty string, split between the characters
        tokens = list(input)

    # The following assertion is always true for any string input and delimiter
    # For speed's sake, we disable this assertion
    # assert delimiter.join(tokens) == input

    output = tokens[:1]

    for token in tokens[1:]:
        output.append(delimiter)
        if token:
            output.append(token)

    # Don't let the first element be an empty string
    if output[:1] == ['']:
        del output[0]

    # The only case where we should have an empty string in the output is if it is our delimiter
    # For speed's sake, we disable this assertion
    # assert delimiter == '' or '' not in output

    # The resulting strings should be combinable back into the original string
    # For speed's sake, we disable this assertion
    # assert ''.join(output) == input

    return output
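A lazy and simple solution
Suppose your regex pattern is split_pattern = r'(!|\?)'
First, insert a marker string after each separator to act as the new delimiter, such as '[cut]':
new_string = re.sub(split_pattern, '\\1[cut]', your_string)
Then split on the new delimiter: new_string.split('[cut]')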
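Put together as a runnable sketch, the two steps might look like this. The sample sentence and the `marker` variable are illustrative assumptions; the approach only works if the marker never occurs in the input, which the code checks up front.

```python
import re

split_pattern = r'(!|\?)'
marker = '[cut]'  # assumed not to occur anywhere in the input
your_string = 'Hi! How are you? Fine'

# Guard against the marker colliding with real text
assert marker not in your_string

# Step 1: re-emit each matched separator (\1) followed by the marker
new_string = re.sub(split_pattern, r'\1' + marker, your_string)

# Step 2: split on the marker; each separator stays attached to the piece before it
pieces = new_string.split(marker)
```

Note that this keeps each separator at the end of the preceding piece instead of returning it as a separate list item.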