Stripping everything but alphanumeric chars from a string in Python

What is the best way to strip all non-alphanumeric characters from a string, using Python?

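The solutions presented in the PHP variant of this question will probably work with some minor adjustments, but they don't seem very "pythonic" to me.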

For the record, I don't just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.

Current answer

Python 3

This uses the same approach as @John Machin's answer, updated for Python 3:

Two things change: the character set is larger, and translate() works slightly differently.

Python 3 strings are Unicode, and source code is now assumed to be encoded in UTF-8 (source: PEP 3120).

This means the string containing all the characters you want to delete gets much larger:

import sys
del_chars = ''.join(c for c in map(chr, range(sys.maxunicode + 1)) if not c.isalnum())

The translate() method now takes a translation table, which we can create with str.maketrans():

    
del_map = str.maketrans('', '', del_chars)
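Here is a small sketch of what such a table looks like. It uses a short, hand-picked deletion set, and those punctuation characters are an illustrative assumption that is not part of the answer. When str.maketrans() receives three arguments, it maps every character of the third argument to None, and translate() deletes any character that maps to None.

```python
# Third argument: characters to delete (an illustrative subset only)
table = str.maketrans('', '', '.,!')

# Each key is a code point (an int); each value is None, meaning "delete"
print(table)

print("Hi, there!".translate(table))  # Hi there
```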

Now, as before, for any string s you want to "scrunch":

    
scrunched = s.translate(del_map)
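The three steps above can be combined into a minimal sketch. The names scrunch, _DEL_CHARS and _DEL_MAP are hypothetical. Using sys.maxunicode + 1 as the range bound is an assumption made so that every code point up to U+10FFFF is covered. Because str.isalnum() is Unicode-aware, accented letters are kept.

```python
import sys

# Every code point from U+0000 to sys.maxunicode that str.isalnum() rejects.
# Building this takes a moment and a fair amount of memory, so do it once.
_DEL_CHARS = ''.join(
    c for c in map(chr, range(sys.maxunicode + 1)) if not c.isalnum()
)
_DEL_MAP = str.maketrans('', '', _DEL_CHARS)


def scrunch(s: str) -> str:
    """Return s with every non-alphanumeric character removed."""
    return s.translate(_DEL_MAP)


print(scrunch("Hello, World! (42)"))  # HelloWorld42
print(scrunch("naïve café"))          # naïvecafé
```

The table is expensive to build but cheap to reuse. It therefore suits code that strips many strings, not code that strips a single string once.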

Using the last timing example from @John Machin, we can see that it still beats re by about an order of magnitude:

    
> python -mtimeit -s"d=''.join(c for c in map(chr,range(1114111)) if not c.isalnum());m=str.maketrans('','',d);s='foo-'*25" "s.translate(m)"
    
1000000 loops, best of 5: 255 nsec per loop
    
> python -mtimeit -s"import re;s='foo-'*25;r=re.compile(r'[\W_]+')" "r.sub('',s)"
    
50000 loops, best of 5: 4.8 usec per loop
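You can run the same comparison from inside Python with the timeit module. Doing so also lets you check that both approaches return the same string before you time them. This is only a sketch. The table is built as in the answer above, but it covers every code point up to sys.maxunicode. Absolute numbers vary by machine and Python version, so none are claimed here.

```python
import re
import sys
import timeit

# Deletion table covering every code point up to sys.maxunicode
del_chars = ''.join(c for c in map(chr, range(sys.maxunicode + 1)) if not c.isalnum())
del_map = str.maketrans('', '', del_chars)

# Regex alternative: [\W_]+ matches runs of non-alphanumeric characters
pattern = re.compile(r'[\W_]+')

s = 'foo-' * 25
N = 10_000

# Make sure both approaches produce the same result before timing them
assert s.translate(del_map) == pattern.sub('', s)

# Both callables are lambdas, so both timings include the same call overhead
for label, func in [('str.translate', lambda: s.translate(del_map)),
                    ('compiled re.sub', lambda: pattern.sub('', s))]:
    total = timeit.timeit(func, number=N)
    print(f'{label}: {total / N * 1e9:.0f} ns per call')
```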

2021-12-10 20:24:34

Other answers

Regex to the rescue:

import re
re.sub(r'\W+', '', your_string)

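As per the Python docs, '\W' matches any non-word character. With ASCII matching (the re.ASCII flag) it is equivalent to [^a-zA-Z0-9_], so it removes everything except digits, letters and _. Note that underscores are therefore kept; use r'[\W_]+' to remove them as well.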
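The following quick check demonstrates the point about _ and one more Python 3 detail. For str patterns, \W is Unicode-aware by default and only equals [^a-zA-Z0-9_] when re.ASCII is passed. The sample string is an arbitrary assumption.

```python
import re

text = "foo_bar-baz! naïve"

# \W+ removes non-word characters, but _ counts as a word character
print(re.sub(r'\W+', '', text))                     # foo_barbaznaïve

# Adding _ to the character class removes underscores too
print(re.sub(r'[\W_]+', '', text))                  # foobarbaznaïve

# With re.ASCII, non-ASCII letters such as ï count as non-word characters
print(re.sub(r'[\W_]+', '', text, flags=re.ASCII))  # foobarbaznave
```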

2009-08-14 08:57:37

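I timed some functions just out of curiosity. In these tests I'm removing non-alphanumeric characters from the string string.printable (part of the built-in string module). Using a compiled '[\W_]+' with pattern.sub('', str) was found to be fastest.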

$ python -m timeit -s \
     "import string" \
     "''.join(ch for ch in string.printable if ch.isalnum())" 
10000 loops, best of 3: 57.6 usec per loop

$ python -m timeit -s \
    "import string" \
    "filter(str.isalnum, string.printable)"                 
10000 loops, best of 3: 37.9 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]+', '', string.printable)"                
100000 loops, best of 3: 15 usec per loop

$ python -m timeit -s \
    "import re, string; pattern = re.compile('[\W_]+')" \
    "pattern.sub('', string.printable)" 
100000 loops, best of 3: 11.2 usec per loop
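These timings appear to come from Python 2. In Python 2, filter() on a str returns a str. In Python 3 it returns a lazy iterator, so timing a bare filter(...) call would measure almost nothing. The sketch below ports the same five variants to Python 3 and writes the patterns as raw strings. It checks that all five agree on string.printable, and each function can then be passed to timeit. The dictionary labels are arbitrary.

```python
import re
import string

pattern = re.compile(r'[\W_]+')

# Python 3 ports of the five variants timed above
variants = {
    'join + isalnum': lambda s: ''.join(ch for ch in s if ch.isalnum()),
    'filter': lambda s: ''.join(filter(str.isalnum, s)),  # must be joined in Python 3
    're.sub, one char': lambda s: re.sub(r'[\W_]', '', s),
    're.sub, runs': lambda s: re.sub(r'[\W_]+', '', s),
    'compiled pattern': lambda s: pattern.sub('', s),
}

results = {name: fn(string.printable) for name, fn in variants.items()}
for name, result in results.items():
    print(f'{name}: {result}')
```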

2009-08-14 10:03:32

For a simple one-liner (Python 3):

''.join(filter(lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', the_string_you_want_stripped))

For Python < 3.0:

filter(lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', the_string_you_want_stripped)

Note: you can add other characters to the list of allowed characters if needed (e.g. '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ._').
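This variant of the whitelist idea builds the allowed set from the string module constants, so you don't type it by hand, and it keeps uppercase letters. The function name keep_allowed is hypothetical. Unlike str.isalnum(), this whitelist is ASCII-only, so accented letters such as é are dropped.

```python
import string

# Allowed characters: ASCII letters (both cases) and digits.
# A frozenset gives fast membership tests instead of scanning a string each time.
ALLOWED = frozenset(string.ascii_letters + string.digits)


def keep_allowed(s: str, allowed=ALLOWED) -> str:
    """Return only the characters of s that are in the allowed set."""
    return ''.join(filter(allowed.__contains__, s))


print(keep_allowed("naïve Café-42"))                 # naveCaf42
# Extra characters can be allowed by extending the set
print(keep_allowed("a.b_c d", ALLOWED | set('._')))  # a.b_cd
```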

2021-07-08 23:01:41

Here's a simple solution, since all the answers here are complicated:

filtered = ''
for c in unfiltered:
    if c.isalnum():
        filtered += c

print(filtered)
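You can also write the same loop as a single ''.join() over a generator expression, which is the usual idiom for building a string piece by piece. Repeated += may copy the string on each iteration. CPython sometimes optimizes this in place, but that behavior is an implementation detail. The function name strip_non_alnum is hypothetical.

```python
def strip_non_alnum(unfiltered: str) -> str:
    """Keep only the characters for which str.isalnum() is True."""
    # Collect the kept characters and join them once at the end
    return ''.join(c for c in unfiltered if c.isalnum())


print(strip_non_alnum("Hello, World! (42)"))  # HelloWorld42
```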

2022-01-07 11:44:16

for char in my_string:
    if not char.isalnum():
        my_string = my_string.replace(char,"")
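This works for two reasons. First, str.replace() removes every occurrence of char. Second, the loop keeps iterating over the original string even after my_string is rebound. A small refinement is to loop over the distinct characters only, so that each non-alphanumeric character is replaced exactly once. The sketch below does this, and its function name is hypothetical.

```python
def strip_by_replace(my_string: str) -> str:
    """Remove non-alphanumeric characters with str.replace(), once per distinct character."""
    # replace() removes every occurrence, so each distinct character only needs
    # one call; iterating over a set skips repeats such as multiple spaces.
    for char in set(my_string):
        if not char.isalnum():
            my_string = my_string.replace(char, "")
    return my_string


print(strip_by_replace("a, b, c!"))  # abc
```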

2018-10-27 06:36:59
