I have a string like this:

this is "a test"

I'm trying to write something in Python that splits it on spaces while ignoring spaces inside quotes. The result I want is:

['this', 'is', 'a test']

PS: I know you are going to ask "what happens if there are quotes within the quotes?" In my application, that will never happen.


Current answer

Try this:

  def adamsplit(s):
    result = []
    inquotes = False
    for substring in s.split('"'):
      if not inquotes:
        result.extend(substring.split())
      else:
        result.append(substring)
      inquotes = not inquotes
    return result

Some test strings:

'This is "a test"' -> ['This', 'is', 'a test']
'"This is \'a test\'"' -> ["This is 'a test'"]

Other answers

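In terms of performance, re appears to be the faster option. Here is my solution, which uses a non-greedy operator and preserves the outer quotes: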

import re
re.findall(r'(?:".*?"|\S)+', s)

Result:

['this', 'is', '"a test"']

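It keeps constructs like aaa"bla blub"bbb together as a single token, because their parts are not separated by whitespace. If the string contains escaped quotes, you can match it like this: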

>>> a = "She said \"He said, \\\"My name is Mark.\\\"\""
>>> a
'She said "He said, \\"My name is Mark.\\""'
>>> for i in re.findall(r'(?:".*?[^\\]"|\S)+', a): print(i)
...
She
said
"He said, \"My name is Mark.\""

Note that this also matches the empty string "" by way of the \S part of the pattern.
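
The two patterns above can also be folded into one that consumes a backslash together with the character it escapes, so an empty "" is matched as a quoted token rather than through \S. This is only a minimal sketch, assuming backslash is the sole escape character; TOKEN_RE and split_keep_quotes are names invented for this example:

```python
import re

# A token is a run of quoted chunks and/or bare characters that are
# neither whitespace nor a double quote. Inside quotes, "\\." consumes
# a backslash plus the character it escapes, so \" does not end the chunk.
TOKEN_RE = re.compile(r'(?:"(?:\\.|[^"\\])*"|[^\s"])+')


def split_keep_quotes(s):
    """Split on whitespace outside double quotes, keeping the quotes."""
    return TOKEN_RE.findall(s)


print(split_keep_quotes('this is "a test"'))
print(split_keep_quotes('aaa"bla blub"bbb ""'))
```

Because the pattern has no capturing groups, findall returns the whole match for each token.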

Depending on your use case, you may also want to check out the csv module:

import csv
lines = ['this is "a string"', 'and more "stuff"']
for row in csv.reader(lines, delimiter=" "):
    print(row)

Output:

['this', 'is', 'a string']
['and', 'more', 'stuff']
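
For a single string rather than a list of lines, the same idea can be wrapped in a small helper. This is a sketch only; csv_split is a hypothetical name, not part of the csv module. One thing to keep in mind is that csv treats every space as a delimiter, so consecutive spaces produce empty fields:

```python
import csv


def csv_split(s):
    """Split one string on single spaces, honoring double-quoted fields."""
    # csv.reader expects an iterable of lines, so wrap the string in a list
    # and take the first (and only) parsed row.
    return next(csv.reader([s], delimiter=" "))


print(csv_split('this is "a test"'))
print(csv_split('a  b'))  # two spaces: an empty field appears between a and b
```

If runs of spaces should count as one separator, the empty fields need to be filtered out afterwards, at the cost of also dropping any deliberately empty quoted field.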

As another option, try tssplit:

In [1]: from tssplit import tssplit
In [2]: tssplit('this is "a test"', quote='"', delimiter='')
Out[2]: ['this', 'is', 'a test']

Take a look at the shlex module, in particular shlex.split.

>>> import shlex
>>> shlex.split('This is "a test"')
['This', 'is', 'a test']
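
Two details of shlex.split may matter here: passing posix=False keeps the quotes in the output, and an unclosed quote raises ValueError. The sketch below illustrates both; safe_split is a hypothetical helper written for this example, not a shlex function:

```python
import shlex

s = 'this is "a test"'

# Default (POSIX) mode strips the quotes from the quoted token.
posix_tokens = shlex.split(s)

# Non-POSIX mode leaves the quotes in place.
raw_tokens = shlex.split(s, posix=False)


def safe_split(text):
    """Return the tokens, or None when a quote is left unclosed."""
    try:
        return shlex.split(text)
    except ValueError:
        return None


print(posix_tokens)
print(raw_tokens)
print(safe_split('this is "a test'))
```

shlex also treats single quotes as quoting characters, which goes beyond what the question asks for but is usually harmless.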

To preserve the quotes, use this function:

def getArgs(s):
    args = []
    cur = ''
    inQuotes = False
    for char in s.strip():
        if char == ' ' and not inQuotes:
            # An unquoted space ends the current token.
            args.append(cur)
            cur = ''
        else:
            if char == '"':
                # Entering or leaving a quoted section; the quote itself is kept.
                inQuotes = not inQuotes
            cur += char
    # Flush the last token.
    args.append(cur)
    return args