Split a string by spaces in Python, preserving quoted substrings

I have a string like this:

this is "a test"

I'm trying to write something in Python that splits it on spaces while ignoring the spaces inside quotes. The result I'm looking for is:

['this', 'is', 'a test']

PS: I know you're going to ask "what happens if there are quotes inside the quotes?" Well, in my application, that will never happen.

Current answer

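Hmm, I can't seem to find the "Reply" button... Anyway, this answer is based on Kate's approach, but it correctly splits strings whose substrings contain escaped quotes, and it also removes the opening and closing quotes of the substrings: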

import re
[i.strip('"').strip("'") for i in re.split(r'(\s+|(?<!\\)".*?(?<!\\)"|(?<!\\)\'.*?(?<!\\)\')', string) if i.strip()]

This works on strings like 'this is " a \\\"test\\\"\\\'s substring"' (unfortunately, this crazy markup is necessary to keep Python from removing the escapes).

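If you don't want the escapes kept in the strings of the returned list, you can use a slightly modified version. The original relied on Python 2's str.decode('string_escape'), which no longer exists in Python 3; codecs.decode(..., 'unicode_escape') is the closest replacement for ASCII input:

import codecs
[codecs.decode(i.strip('"').strip("'"), 'unicode_escape') for i in re.split(r'(\s+|(?<!\\)".*?(?<!\\)"|(?<!\\)\'.*?(?<!\\)\')', string) if i.strip()]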
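For reference, here is a sketch of the same re.split approach wrapped in a Python 3 function. The name split_quoted and the unescape flag are made up for this example, and unescaping uses codecs.decode(..., 'unicode_escape') as a stand-in for Python 2's str.decode('string_escape'), which assumes ASCII input.

```python
import codecs
import re

# Same pattern as the one-liner above: a separator is a run of whitespace,
# or a double- or single-quoted chunk whose quotes are not backslash-escaped.
_TOKEN_RE = re.compile(r'''(\s+|(?<!\\)".*?(?<!\\)"|(?<!\\)'.*?(?<!\\)')''')


def split_quoted(text, unescape=False):
    """Split on whitespace, keeping quoted chunks together (hypothetical helper)."""
    parts = []
    for piece in _TOKEN_RE.split(text):
        if not piece.strip():
            continue  # drop empty strings and pure-whitespace separators
        # Remove the surrounding quotes, as the original answer does.
        piece = piece.strip('"').strip("'")
        if unescape:
            # Python 3 stand-in for str.decode('string_escape'); assumes ASCII
            # input, since non-ASCII text would come out mangled.
            piece = codecs.decode(piece, 'unicode_escape')
        parts.append(piece)
    return parts


print(split_quoted('this is "a test"'))  # ['this', 'is', 'a test']
```

With unescape=True, a chunk such as " a \"test\"\'s substring" comes back as  a "test"'s substring (the leading space inside the quotes is preserved).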

2010-01-29 01:36:23

Other answers

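Since this question is tagged regex, I decided to try a regex approach. I first replace every space inside the quoted parts with \x00, then split on whitespace, and finally replace the \x00 in each part back with a space.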

Both versions do the same thing, but splitter is more readable than splitter2.

import re

s = 'this is "a test" some text "another test"'

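def splitter(s):
    # Replace every space inside a "..." chunk with the \x00 placeholder.
    def replacer(m):
        return m.group(0).replace(" ", "\x00")
    # Split on whitespace; the protected spaces no longer act as separators.
    parts = re.sub('".+?"', replacer, s).split()
    # Turn the placeholders back into spaces.
    parts = [p.replace("\x00", " ") for p in parts]
    return parts

def splitter2(s):
    # Same steps as splitter(), squeezed into a single expression.
    return [p.replace("\x00", " ") for p in re.sub('".+?"', lambda m: m.group(0).replace(" ", "\x00"), s).split()]

print(splitter2(s))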
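Both functions keep the double quotes around each quoted token, so splitter(s) yields '"a test"' rather than 'a test'. If you want the quotes removed as in the question, a small variant of the same placeholder idea could look like the sketch below; splitter_unquoted is a hypothetical name, and it assumes the input never contains the placeholder character itself.

```python
import re


def splitter_unquoted(s, placeholder="\x00"):
    """Hypothetical variant of splitter(): same placeholder trick, quotes removed."""
    # Hide the spaces inside "..." chunks behind the placeholder.
    hidden = re.sub(r'".+?"', lambda m: m.group(0).replace(" ", placeholder), s)
    # Split on whitespace, restore the spaces, then drop the surrounding quotes.
    return [p.replace(placeholder, " ").strip('"') for p in hidden.split()]


print(splitter_unquoted('this is "a test" some text "another test"'))
# ['this', 'is', 'a test', 'some', 'text', 'another test']
```

Like the originals, this only protects literal spaces; a tab inside a quoted chunk would still cause a split, so replacing with re.sub(r'\s', placeholder, ...) inside the lambda would be needed if that matters.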

2008-09-17 06:08:38

I used shlex.split to process 70 million lines of Squid logs, and it was far too slow, so I switched to re.

Try this if you run into performance problems with shlex.

import re

def line_split(line):
    return re.findall(r'[^"\s]\S*|".+?"', line)
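A rough way to check the speed claim on your own data is to time both tokenizers with timeit. The log line below is made up for illustration (it is not real Squid output), and timings vary by machine and Python version, so treat this as a sketch rather than a benchmark.

```python
import re
import shlex
import timeit


def line_split(line):
    # Same pattern as above: a bare token, or a "..." chunk (quotes kept).
    return re.findall(r'[^"\s]\S*|".+?"', line)


# Made-up log line, for illustration only.
sample = '10.0.0.1 GET /index.html 200 "Mozilla/5.0 (X11; Linux)"'

print(line_split(sample))
print(shlex.split(sample))

# Rough timing comparison over the same line.
for name, fn in (("re.findall", line_split), ("shlex.split", shlex.split)):
    seconds = timeit.timeit(lambda: fn(sample), number=10_000)
    print(f"{name}: {seconds:.3f}s for 10,000 calls")
```

Note that the two are not drop-in replacements: line_split keeps the double quotes around quoted fields and ignores single quotes entirely, whereas shlex.split removes the quotes and raises ValueError on an unbalanced quote (for example, a stray apostrophe in a field).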

2014-04-18 13:29:10

Since re seems to be the faster option, here is my solution, which uses a non-greedy quantifier and keeps the outer quotes:

re.findall(r'(?:".*?"|\S)+', s)

Result:

['this', 'is', '"a test"']

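It keeps constructs like aaa"bla blub"bbb together, because these tokens are not separated by whitespace. If the string contains escaped characters, you can match it like this: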

>>> a = "She said \"He said, \\\"My name is Mark.\\\"\""
>>> a
'She said "He said, \\"My name is Mark.\\""'
>>> for i in re.findall(r'(?:".*?[^\\]"|\S)+', a): print(i)
...
She
said
"He said, \"My name is Mark.\""

Note that this also matches the empty string "" through the \S part of the pattern.
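Both patterns can be checked in one short script. This sketch writes them as raw strings, which avoids the invalid-escape warning that a non-raw "\S" triggers in newer Python 3 versions; the names SIMPLE and ESCAPED are just labels chosen for this example.

```python
import re

# Keep-the-quotes tokenizer from the answer above.
SIMPLE = re.compile(r'(?:".*?"|\S)+')
# Variant that will not end a quoted chunk on a backslash-escaped quote.
ESCAPED = re.compile(r'(?:".*?[^\\]"|\S)+')

print(SIMPLE.findall('this is "a test"'))
# ['this', 'is', '"a test"']
print(SIMPLE.findall('aaa"bla blub"bbb next'))
# ['aaa"bla blub"bbb', 'next']
print(ESCAPED.findall('She said "He said, \\"My name is Mark.\\""'))
# ['She', 'said', '"He said, \\"My name is Mark.\\""']
print(ESCAPED.findall('empty "" token'))
# ['empty', '""', 'token']  ("" is picked up by the \S branch)
```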

2018-11-08 15:21:24

You want split from the built-in shlex module.

>>> import shlex
>>> shlex.split('this is "a test"')
['this', 'is', 'a test']

This should do exactly what you want.
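A short sketch of what POSIX-mode shlex.split (the default) handles beyond the example: single quotes, backslash escapes, and unbalanced quotes, for which it raises ValueError. The round trip uses shlex.join, which requires Python 3.8 or later.

```python
import shlex

# Both quote styles and backslash escapes are understood in POSIX mode.
tokens = shlex.split('this is "a test" \'single quoted\' escaped\\ space "say \\"hi\\""')
print(tokens)
# ['this', 'is', 'a test', 'single quoted', 'escaped space', 'say "hi"']

# shlex.join quotes each token again, so the list survives a round trip.
line = shlex.join(tokens)
print(line)
print(shlex.split(line) == tokens)  # True

# An unclosed quote is reported instead of being silently accepted.
try:
    shlex.split('it\'s "unbalanced')
except ValueError as exc:
    print("ValueError:", exc)
```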

If you want to preserve the quotation marks, you can pass the posix=False kwarg.

>>> shlex.split('this is "a test"', posix=False)
['this', 'is', '"a test"']

2008-09-17 04:27:32

Have a look at the shlex module, in particular shlex.split.

>>> import shlex
>>> shlex.split('This is "a test"')
['This', 'is', 'a test']

2008-09-17 04:27:59
