我如何修剪空白?

是否有一个Python函数将从字符串中删除空白(空格和制表符)?

所以给定的输入" \t example string\t "变成了"example string"

当前回答

还没有人发布这些正则表达式的解决方案。

匹配:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print m.group(1)
None

搜索(你必须处理“只有空格”输入大小写不同):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

如果使用re.sub，可能会删除内部空白，这可能是不可取的。

2013-02-12 02:22:02

其他回答

还没有人发布这些正则表达式的解决方案。

匹配:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print m.group(1)
None

搜索(你必须处理“只有空格”输入大小写不同):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

如果使用re.sub，可能会删除内部空白，这可能是不可取的。

2013-02-12 02:22:02

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

输出:

please_remove_all_whitespaces

将Le Droid的评论添加到答案中。用空格隔开:

    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

输出:

请删除所有额外的空格

2015-06-19 02:58:48

这将删除字符串开头和结尾的所有空格和换行符:

>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub("^\s+|\s+$", "", s)
>>> "some \n text"

2017-07-12 20:22:13

前导空格和尾随空格:

s = '   foo    \t   '
print s.strip() # prints "foo"

否则，正则表达式工作:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print pat.sub('', s) # prints "foobar"

2009-07-26 20:56:06

在以不同程度的理解看了相当多的解决方案之后，我想知道如果字符串是逗号分隔的该怎么办……

这个问题

在尝试处理一个csv的联系信息时，我需要解决这个问题:删除无关的空格和一些垃圾，但保留后面的逗号和内部空格。使用一个包含联系人注释的字段，我想删除垃圾，留下好的东西。修剪掉所有的标点符号和杂物，我不想失去复合标记之间的空白，因为我不想以后重新构建。

正则表达式和模式:[\s_]+?\ W +

该模式以[\s_]+?出现在从1到无限时间的非单词字符之前，使用这个:\W+(相当于[^a-zA-Z0-9_])。具体来说，它可以找到大量的空白:空字符(\0)、制表符(\t)、换行符(\n)、前馈(\f)、回车符(\r)。

我认为这样做有两个好处:

它不会删除你可能想要放在一起的完整单词/标记之间的空白; Python内置的字符串方法strip()不处理字符串内部，只处理左右两端，默认的arg是空字符(参见下面的示例:文本中有几个换行符，strip()不会删除它们，而regex模式会删除它们)。文本。带(' t \ r \ n \ ')

这超出了OPs的问题，但我认为在文本数据中有很多情况下我们可能会遇到奇怪的、病态的实例，就像我所做的那样(一些转义字符最终出现在一些文本中)。此外，在类似列表的字符串中，我们不希望消除分隔符，除非分隔符分隔了两个空白字符或一些非单词字符，如'- '或'-，，，，'。

注意:不是在谈论CSV本身的分隔符。仅适用于CSV中数据类似列表的实例，即子字符串组成的c.s.字符串。

Full disclosure: I've only been manipulating text for about a month, and regex only the last two weeks, so I'm sure there are some nuances I'm missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40 odd columns), as a final step after a pass for removal of extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don't want to add whitespace where there was none before.

一个例子:

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, formatted as is:\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

这个输出:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stipping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

所以strip一次删除一个空白。所以在OPs的情况下，strip()是可以的。但如果事情变得更复杂，regex和类似的模式可能对更一般的设置有一定的价值。

看看它的实际应用

2020-04-18 16:47:55

我如何修剪空白?

推荐文章

最新文章

标签