这是最简单的解释。这是我正在使用的:
re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']
这是我想要的:
someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。
这是最简单的解释。这是我正在使用的:
re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']
这是我想要的:
someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。
当前回答
使用re.split,并且你的正则表达式来自变量,并且你有多个分隔符,你可以像下面这样使用:
# BashSpecialParamList is the special param in bash,
# such as your separator is the bash special param
BashSpecialParamList = ["$*", "$@", "$#", "$?", "$-", "$$", "$!", "$0"]
# aStr is the the string to be splited
aStr = "$a Klkjfd$0 $? $#%$*Sdfdf"
reStr = "|".join([re.escape(sepStr) for sepStr in BashSpecialParamList])
re.split(f'({reStr})', aStr)
# Then You can get the result:
# ['$a Klkjfd', '$0', ' ', '$?', ' ', '$#', '%', '$*', 'Sdfdf']
参考:GNU Bash特殊参数
其他回答
我发现这种基于生成器的方法更令人满意:
def split_keep(string, sep):
"""Usage:
>>> list(split_keep("a.b.c.d", "."))
['a.', 'b.', 'c.', 'd']
"""
start = 0
while True:
end = string.find(sep, start) + 1
if end == 0:
break
yield string[start:end]
start = end
yield string[start:]
它避免了需要找出正确的正则表达式,而在理论上应该相当便宜。它不创建新的字符串对象,并将大部分迭代工作委托给高效的find方法。
... 在Python 3.8中,它可以短到:
def split_keep(string, sep):
start = 0
while (end := string.find(sep, start) + 1) > 0:
yield string[start:end]
start = end
yield string[start:]
在下面的代码中,对这个问题有一个简单、高效且经过测试的答案。代码中有解释其中所有内容的注释。
我保证它并不像看起来那么可怕——它实际上只有13行代码!其余的都是注释、文档和断言
def split_including_delimiters(input: str, delimiter: str):
"""
Splits an input string, while including the delimiters in the output
Unlike str.split, we can use an empty string as a delimiter
Unlike str.split, the output will not have any extra empty strings
Conequently, len(''.split(delimiter))== 0 for all delimiters,
whereas len(input.split(delimiter))>0 for all inputs and delimiters
INPUTS:
input: Can be any string
delimiter: Can be any string
EXAMPLES:
>>> split_and_keep_delimiter('Hello World ! ',' ')
ans = ['Hello ', 'World ', ' ', '! ', ' ']
>>> split_and_keep_delimiter("Hello**World**!***", "**")
ans = ['Hello', '**', 'World', '**', '!', '**', '*']
EXAMPLES:
assert split_and_keep_delimiter('-xx-xx-','xx') == ['-', 'xx', '-', 'xx', '-'] # length 5
assert split_and_keep_delimiter('xx-xx-' ,'xx') == ['xx', '-', 'xx', '-'] # length 4
assert split_and_keep_delimiter('-xx-xx' ,'xx') == ['-', 'xx', '-', 'xx'] # length 4
assert split_and_keep_delimiter('xx-xx' ,'xx') == ['xx', '-', 'xx'] # length 3
assert split_and_keep_delimiter('xxxx' ,'xx') == ['xx', 'xx'] # length 2
assert split_and_keep_delimiter('xxx' ,'xx') == ['xx', 'x'] # length 2
assert split_and_keep_delimiter('x' ,'xx') == ['x'] # length 1
assert split_and_keep_delimiter('' ,'xx') == [] # length 0
assert split_and_keep_delimiter('aaa' ,'xx') == ['aaa'] # length 1
assert split_and_keep_delimiter('aa' ,'xx') == ['aa'] # length 1
assert split_and_keep_delimiter('a' ,'xx') == ['a'] # length 1
assert split_and_keep_delimiter('' ,'' ) == [] # length 0
assert split_and_keep_delimiter('a' ,'' ) == ['a'] # length 1
assert split_and_keep_delimiter('aa' ,'' ) == ['a', '', 'a'] # length 3
assert split_and_keep_delimiter('aaa' ,'' ) == ['a', '', 'a', '', 'a'] # length 5
"""
# Input assertions
assert isinstance(input,str), "input must be a string"
assert isinstance(delimiter,str), "delimiter must be a string"
if delimiter:
# These tokens do not include the delimiter, but are computed quickly
tokens = input.split(delimiter)
else:
# Edge case: if the delimiter is the empty string, split between the characters
tokens = list(input)
# The following assertions are always true for any string input and delimiter
# For speed's sake, we disable this assertion
# assert delimiter.join(tokens) == input
output = tokens[:1]
for token in tokens[1:]:
output.append(delimiter)
if token:
output.append(token)
# Don't let the first element be an empty string
if output[:1]==['']:
del output[0]
# The only case where we should have an empty string in the output is if it is our delimiter
# For speed's sake, we disable this assertion
# assert delimiter=='' or '' not in output
# The resulting strings should be combinable back into the original string
# For speed's sake, we disable this assertion
# assert ''.join(output) == input
return output
另一个例子,在非字母数字上进行分割,并保留分隔符
import re
a = "foo,bar@candy*ice%cream"
re.split('([^a-zA-Z0-9])',a)
输出:
['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']
解释
re.split('([^a-zA-Z0-9])',a)
() <- keep the separators
[] <- match everything in between
^a-zA-Z0-9 <-except alphabets, upper/lower and numbers.
我可以把它放在这里吗
s = 'foo/bar spam\neggs'
print(s.replace('/', '+++/+++').replace(' ', '+++ +++').replace('\n', '+++\n+++').split('+++'))
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
如果你只有一个分隔符,你可以使用列表推导式:
text = 'foo,bar,baz,qux'
sep = ','
附加/将分隔符:
result = [x+sep for x in text.split(sep)]
#['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of trailing
result[-1] = result[-1].strip(sep)
#['foo,', 'bar,', 'baz,', 'qux']
result = [sep+x for x in text.split(sep)]
#[',foo', ',bar', ',baz', ',qux']
# to get rid of trailing
result[0] = result[0].strip(sep)
#['foo', ',bar', ',baz', ',qux']
分隔符作为它自己的元素:
result = [u for x in text.split(sep) for u in (x, sep)]
#['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
results = result[:-1] # to get rid of trailing