这是最简单的解释。这是我正在使用的:

re.split('\W', 'foo/bar spam\neggs')
>>> ['foo', 'bar', 'spam', 'eggs']

这是我想要的:

someMethod('\W', 'foo/bar spam\neggs')
>>> ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

原因是我想把一个字符串分割成令牌,操作它,然后再把它组合在一起。


当前回答

安装wrs时“不拆卸分离器”

pip install wrs

(由Rao Hamza开发)

import wrs
text  = "Now inbox “how to make spam ad” Invest in hard email marketing."
splitor = 'email | spam | inbox'
list = wrs.wr_split(splitor, text)
print(list)

结果: ['现在','收件箱'如何制作','垃圾广告'努力投资','电子邮件营销'。]

其他回答

安装wrs时“不拆卸分离器”

pip install wrs

(由Rao Hamza开发)

import wrs
text  = "Now inbox “how to make spam ad” Invest in hard email marketing."
splitor = 'email | spam | inbox'
list = wrs.wr_split(splitor, text)
print(list)

结果: ['现在','收件箱'如何制作','垃圾广告'努力投资','电子邮件营销'。]

如果你只有一个分隔符,你可以使用列表推导式:

text = 'foo,bar,baz,qux'  
sep = ','

附加/将分隔符:

result = [x+sep for x in text.split(sep)]
#['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of trailing
result[-1] = result[-1].strip(sep)
#['foo,', 'bar,', 'baz,', 'qux']

result = [sep+x for x in text.split(sep)]
#[',foo', ',bar', ',baz', ',qux']
# to get rid of trailing
result[0] = result[0].strip(sep)
#['foo', ',bar', ',baz', ',qux']

分隔符作为它自己的元素:

result = [u for x in text.split(sep) for u in (x, sep)]
#['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
results = result[:-1]   # to get rid of trailing

如果你在换行上分割,使用splitlines(True)。

>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']

(不是一个通用的解决方案,但在这里添加这个,以防有人来到这里没有意识到这个方法的存在。)

你也可以用字符串数组而不是正则表达式分割字符串,就像这样:

def tokenizeString(aString, separators):
    #separators is an array of strings that are being used to split the string.
    #sort separators in order of descending length
    separators.sort(key=len)
    listToReturn = []
    i = 0
    while i < len(aString):
        theSeparator = ""
        for current in separators:
            if current == aString[i:i+len(current)]:
                theSeparator = current
        if theSeparator != "":
            listToReturn += [theSeparator]
            i = i + len(theSeparator)
        else:
            if listToReturn == []:
                listToReturn = [""]
            if(listToReturn[-1] in separators):
                listToReturn += [""]
            listToReturn[-1] += aString[i]
            i += 1
    return listToReturn
    

print(tokenizeString(aString = "\"\"\"hi\"\"\" hello + world += (1*2+3/5) '''hi'''", separators = ["'''", '+=', '+', "/", "*", "\\'", '\\"', "-=", "-", " ", '"""', "(", ")"]))

之前发布的一些答案,会重复分隔符,或者有一些我在自己的情况下遇到的其他错误。你可以使用这个函数:

def split_and_keep_delimiter(input, delimiter):
    result      = list()
    idx         = 0
    while delimiter in input:
        idx     = input.index(delimiter);
        result.append(input[0:idx+len(delimiter)])
        input = input[idx+len(delimiter):]
    result.append(input)
    return result