Is there a Python function that will trim whitespace (spaces and tabs) from a string?
So that the given input " \t example string\t " becomes "example string".
For leading and trailing whitespace:
s = ' foo \t '
print(s.strip())  # prints "foo"
Otherwise, a regular expression works:
import re
pat = re.compile(r'\s+')
s = ' \t foo \t bar \t '
print(pat.sub('', s))  # prints "foobar"
For whitespace on both sides, use str.strip:
s = " \t a string example\t "
s = s.strip()
For whitespace on the right side, use str.rstrip:
s = s.rstrip()
For whitespace on the left side, use str.lstrip:
s = s.lstrip()
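You can pass an argument to any of these functions to strip arbitrary characters, like this:
s = s.strip(' \t\n\r')
This strips any space, \t, \n, or \r characters from both sides of the string.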
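One detail that is easy to miss: the argument is treated as a set of characters, not as a prefix or suffix. A small sketch of that behaviour (the helper name trim is just for illustration, it is not part of the standard library):

```python
def trim(text, chars=None):
    """Strip `chars` from both ends of `text`; None means any whitespace."""
    return text.strip(chars)


print(repr(trim(" \t example string\t ")))   # 'example string'
# Each character in the argument is stripped individually, in any order.
print(repr(trim("abcHELLOcba", "abc")))      # 'HELLO'
# Characters in the middle are never touched.
print(repr(trim("..a.b..", ".")))            # 'a.b'
```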
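The examples above only remove characters from the left-hand and right-hand sides of the string. If you also want to remove characters from the middle of the string, try re.sub:
import re
print(re.sub(r'\s+', '', s))
That should print out: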
astringexample
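A caveat worth knowing when writing this kind of pattern: inside square brackets, + is a literal plus sign rather than a quantifier, so a character class such as [\s+] deletes plus signs as well as whitespace. A minimal sketch (the function name remove_all_whitespace is my own):

```python
import re


def remove_all_whitespace(text):
    """Delete every run of whitespace, including runs inside the string."""
    return re.sub(r"\s+", "", text)


print(remove_all_whitespace(" \t a string example\t "))  # astringexample

# Inside [...] the plus is literal, so it gets deleted too.
print(re.sub(r"[\s+]", "", "1 + 2"))  # 12
print(re.sub(r"\s+", "", "1 + 2"))    # 1+2
```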
# How to trim a multi-line string or a file
s = """ line one
\tline two\t
line three """
# Line 1 starts with a space, line 2 starts and ends with a tab, line 3 ends with a space.
s1 = s.splitlines()
print(s1)
[' line one', '\tline two\t', 'line three ']
print([i.strip() for i in s1])
['line one', 'line two', 'line three']
# More details:
# We could also have used a for loop from the beginning
# (process() stands for whatever you do with each line):
for line in s.splitlines():
    line = line.strip()
    process(line)
# We could also be reading a file line by line, e.g. my_file = open(filename), or with open(filename) as my_file:
for line in my_file:
    line = line.strip()
    process(line)
# Moot point: note that splitlines() removed the newline characters; we can keep them by passing True,
# although strip() will then remove them anyway.
s2 = s.splitlines(True)
print(s2)
[' line one\n', '\tline two\t\n', 'line three ']
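If you want the result back as a single string rather than a list, the pieces above can be wrapped up roughly like this. The function name strip_lines and the keep_blank flag are my own additions, not something from the standard library:

```python
def strip_lines(text, keep_blank=True):
    """Strip every line of a multi-line string and join the lines again."""
    stripped = [line.strip() for line in text.splitlines()]
    if not keep_blank:
        # Drop lines that became empty after stripping.
        stripped = [line for line in stripped if line]
    return "\n".join(stripped)


sample = " line one\n\tline two\t\nline three "
print(strip_lines(sample))
```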
In Python, the trim methods are named strip:
str.strip() # trim
str.lstrip() # left trim
str.rstrip() # right trim
Nobody has posted these regex solutions yet.
Matching:
>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')
>>> m=p.match(' \t blah ')
>>> m.group(1)
'blah'
>>> m=p.match(' \tbl ah \t ')
>>> m.group(1)
'bl ah'
>>> m=p.match(' \t ')
>>> print(m.group(1))
None
Searching (you have to handle the "only spaces" input case differently):
>>> p1=re.compile('\\S.*\\S')
>>> m=p1.search(' \tblah \t ')
>>> m.group()
'blah'
>>> m=p1.search(' \tbl ah \t ')
>>> m.group()
'bl ah'
>>> m=p1.search(' \t ')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
If you use re.sub, you may remove inner whitespace, which could be undesirable.
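To avoid the AttributeError on whitespace-only input, the search variant can be wrapped in a small helper that checks for None. This is only a sketch: the name regex_trim, the slightly different pattern \S(?:.*\S)? and the re.DOTALL flag are my own choices. The optional group lets a single non-whitespace character match, which \S.*\S cannot do because it needs at least two.

```python
import re

# First non-whitespace character, optionally followed by anything that
# ends in a non-whitespace character. DOTALL lets "." cross newlines.
_TRIM = re.compile(r"\S(?:.*\S)?", re.DOTALL)


def regex_trim(text):
    """Return text without leading/trailing whitespace, '' if nothing is left."""
    m = _TRIM.search(text)
    return m.group() if m else ""


print(repr(regex_trim(" \tbl ah \t ")))  # 'bl ah'
print(repr(regex_trim(" \t ")))          # ''
print(repr(regex_trim(" x ")))           # 'x'
```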
You can also use a very simple, basic function, str.replace(), which works with both spaces and tabs:
>>> whitespaces = " abcd ef gh ijkl "
>>> tabs = "\tabcde\tfgh\tijkl"
>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace("\t", ""))
abcdefghijkl
Simple and easy.
Try translate:
>>> import string
>>> print('\t\r\n hello \r\n world \t\r\n')
hello
world
>>> tr = str.maketrans(string.whitespace, ' '*len(string.whitespace))
>>> '\t\r\n hello \r\n world \t\r\n'.translate(tr)
'    hello    world    '
>>> '\t\r\n hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'
something = "\t please_ \t remove_ all_ \n\n\n\nwhitespaces\n\t "
something = "".join(something.split())
Output:
please_remove_all_whitespaces
Adding Le Droid's comment to the answer. To separate the words with a single space:
something = "\t please \t remove all extra \n\n\n\nwhitespaces\n\t "
something = " ".join(something.split())
Output:
please remove all extra whitespaces
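The reason this works is that split() with no argument splits on runs of any whitespace and drops empty strings, which split(" ") does not do. A quick sketch of the difference (the function name collapse_whitespace is only illustrative):

```python
def collapse_whitespace(text):
    """Trim both ends and squeeze every inner whitespace run to one space."""
    return " ".join(text.split())


print("  a \t b\n\nc  ".split())   # ['a', 'b', 'c']
# With an explicit separator, empty strings are kept, so nothing collapses.
print("a  b".split(" "))           # ['a', '', 'b']
print(collapse_whitespace("\t please \t remove all extra \n\n\n\nwhitespaces\n\t "))
```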
Generally, I use the following method:
>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [u"\u005Cn",u"\u005Cr",u"\u005Ct"]
>>> import re
>>> for i in charList:
...     myStr = re.sub(i, r"", myStr)
...
>>> myStr
'Hi Stack Over  flow!'
Note: this only removes "\n", "\r", and "\t". It does not remove extra spaces.
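The same result can probably be had in a single pass with one character class instead of a loop. A sketch, with a function name of my own choosing:

```python
import re


def drop_newlines_and_tabs(text):
    """Remove \\n, \\r and \\t characters; ordinary spaces are left alone."""
    return re.sub(r"[\n\r\t]", "", text)


print(repr(drop_newlines_and_tabs("Hi\n Stack Over \r flow!")))
# 'Hi Stack Over  flow!'
```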
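Whitespace includes spaces, tabs, and CRLF. So an elegant one-liner string function we can use is translate (the two-argument form below is the Python 2 str.translate signature):
' hello apple'.translate(None, ' \n\t\r')
Or, if you want to be thorough:
import string
' hello apple'.translate(None, string.whitespace)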
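On Python 3, str.translate takes a single table argument, so the call looks a bit different. A minimal sketch, assuming Python 3 (the function name delete_whitespace is mine):

```python
import string


def delete_whitespace(text):
    """Delete every character listed in string.whitespace."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans("", "", string.whitespace)
    return text.translate(table)


print(delete_whitespace(" hello  apple\n"))  # helloapple
```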
This will remove all whitespace and newlines from both the beginning and the end of a string:
>>> import re
>>> s = " \n\t \n some \n text \n "
>>> re.sub(r"^\s+|\s+$", "", s)
'some \n text'
(re.sub(' +', ' ', (my_str.replace('\n', ' ')))).strip()
This will remove all the unwanted spaces and newline characters. Hope this helps.
import re
my_str = ' a b \n c '
formatted_str = (re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()
This will result in:
' a b \n c ' will be changed to 'a b c'
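If you are using Python 3, pass sep="" as the last argument of your print call. That stops print from inserting a space between the arguments.
Example: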
txt="potatoes"
print("I love ",txt,"",sep="")
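This will print: I love potatoes
Instead of: I love  potatoes 
In your case, since you are dealing with \t, set sep accordingly (sep="\t" puts a tab between the arguments).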
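To see exactly what sep changes, it helps to capture what print writes. Note that sep only controls what goes between the arguments; it does not trim whitespace that is already inside the strings. A sketch, where the helper render is hypothetical and only exists to capture the output:

```python
import io


def render(*args, **kwargs):
    """Return what print() would write for the given arguments."""
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()


txt = "potatoes"
print(repr(render("I love ", txt, "")))          # 'I love  potatoes \n'
print(repr(render("I love ", txt, "", sep="")))  # 'I love potatoes\n'
print(repr(render("I love", txt, sep="\t")))     # 'I love\tpotatoes\n'
```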
If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:
some_string = " Hello, world!\n "
new_string = some_string.strip()
# new_string is now "Hello, world!"
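This works a lot like Qt's QString::trimmed() method, in that it removes leading and trailing whitespace while leaving internal whitespace alone.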
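But if you'd like something like Qt's QString::simplified() method, which not only removes leading and trailing whitespace but also "squishes" all consecutive internal whitespace down to one space character, you can use a combination of .split() and " ".join, like this: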
some_string = "\t Hello, \n\t world!\n "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"
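In this last example, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed off.
Having looked at quite a few solutions here with varying degrees of understanding, I wondered what to do if the string was comma separated...
The problem
While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage and leave the good stuff. After trimming away all the punctuation and chaff, I didn't want to lose the whitespace between compound tokens, because I didn't want to rebuild it later.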
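Regex and pattern: [\s_]+?\W+
The pattern looks for any whitespace character or underscore ('_'), one or more times and lazily (as few characters as possible), with [\s_]+?, followed by one or more non-word characters, with \W+ (equivalent to [^a-zA-Z0-9_]). In practice it finds swaths of whitespace such as tabs (\t), newlines (\n), form feeds (\f) and carriage returns (\r), and, because \W matches any non-word character, strays such as the null character (\0) that follow them.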
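Here is a short sketch of what the pattern does on a small input, which is easier to follow than the large example. The constant and function names are mine. Notice that a comma preceded by whitespace is swallowed along with it, while a comma that directly follows a word survives:

```python
import re

TRIM_WS_PUNCT = re.compile(r"[\s_]+?\W+")


def clean(text):
    """Replace whitespace/underscore plus the following non-word run with one space."""
    return TRIM_WS_PUNCT.sub(" ", text)


print(clean("alpha , beta,  , gamma\n\n, delta"))
# alpha beta, gamma delta

# Single spaces between words, hyphens and inner underscores are untouched.
print(clean("hello-world pitch_at_palace"))
# hello-world pitch_at_palace
```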
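I see two advantages to this:
1. It doesn't remove the whitespace between complete words/tokens that you might want to keep together.
2. Python's built-in string method strip() doesn't deal with the inside of the string, only the left and right ends, and its default argument strips whitespace characters (see the example below: there are several newlines in the text, and strip() does not remove them while the regex pattern does): text.strip(' \n\t\r')
This goes beyond the OP's question, but I think there are plenty of cases where we might run into odd, pathological instances in text data, as I did (somehow a few escape characters ended up in some of the text). Moreover, in list-like strings, we don't want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word characters, like '-,' or '-, ,,,'.
NB: I'm not talking about the delimiter of the CSV itself, only about instances within the CSV where the data is list-like, i.e. a comma-separated string of substrings.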
Full disclosure: I've only been manipulating text for about a month, and regex only the last two weeks, so I'm sure there are some nuances I'm missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40 odd columns), as a final step after a pass for removal of extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don't want to add whitespace where there was none before.
An example:
import re
text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , , dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109 \n\n\n\nklkjsdf\""
print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r') # strip out whitespace?
print()
print(f"Here is the text, after stripping with 'strip':\n{clean_text}\n")
print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)
This outputs:
Here is the text as formatted:
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , , dd invites,subscribed, ,, , , ff dd
invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s,
i69rpofhfsp9t7c practice 20ignition - 20june
.2134.pdf 2109
klkjsdf"
using regex to trim both the whitespaces and the non-word characters that follow them.
"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"
Very nice.
What about 'strip()'?
Here is the text, formatted as is:
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , , dd invites,subscribed, ,, , , ff dd
invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s,
i69rpofhfsp9t7c practice 20ignition - 20june
.2134.pdf 2109
klkjsdf"
Here is the text, after stripping with 'strip':
"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , , dd invites,subscribed, ,, , , ff dd
invites, subscribed, , , , , alumni spring 2012 deck: https: www.dropbox.com s,
i69rpofhfsp9t7c practice 20ignition - 20june
.2134.pdf 2109
klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'
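So strip() only removes whitespace from the two ends of the string. In the OP's case, strip() is fine. But if things get more complex, a regex with a similar pattern may be of some value in more general settings.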