在Python中从字符串中剥离除字母数字字符以外的所有内容

使用Python从字符串中剥离所有非字母数字字符的最佳方法是什么?

在这个问题的PHP变体中提出的解决方案可能会进行一些小的调整，但对我来说似乎不太“python化”。

声明一下，我不只是想去掉句号和逗号(以及其他标点符号)，还想去掉引号、括号等。

当前回答

如果我理解正确，最简单的方法是使用正则表达式，因为它为您提供了很大的灵活性，但另一个简单的方法是使用循环以下是示例代码，我还计算了单词的出现并存储在字典中。

s = """An... essay is, generally, a piece of writing that gives the author's own 
argument — but the definition is vague, 
overlapping with those of a paper, an article, a pamphlet, and a short story. Essays 
have traditionally been 
sub-classified as formal and informal. Formal essays are characterized by "serious 
purpose, dignity, logical 
organization, length," whereas the informal essay is characterized by "the personal 
element (self-revelation, 
individual tastes and experiences, confidential manner), humor, graceful style, 
rambling structure, unconventionality 
or novelty of theme," etc.[1]"""

d = {}      # creating empty dic      
words = s.split() # spliting string and stroing in list
for word in words:
    new_word = ''
    for c in word:
        if c.isalnum(): # checking if indiviual chr is alphanumeric or not
            new_word = new_word + c
    print(new_word, end=' ')
    # if new_word not in d:
    #     d[new_word] = 1
    # else:
    #     d[new_word] = d[new_word] +1
print(d)

如果这个答案是有用的，请评价这个!

2020-04-11 16:36:44

其他回答

作为这里其他一些答案的衍生，我提供了一种非常简单而灵活的方法来定义您想要限制字符串内容的一组字符。在这种情况下，我允许字母数字加破折号和下划线。只需添加或删除字符从我的PERMITTED_CHARS适合您的用例。

PERMITTED_CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-" 
someString = "".join(c for c in someString if c in PERMITTED_CHARS)

2017-03-13 12:54:22

对于简单的一行代码(Python 3.0):

''.join(filter( lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyz', the_string_you_want_stripped ))

对于Python < 3.0:

filter( lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyz', the_string_you_want_stripped )

注意:如果需要，您可以将其他字符添加到允许字符列表中(例如:“0123456789 abcdefghijklmnopqrstuvwxyz。_”)。

2021-07-08 23:01:41

正则表达式的拯救:

import re
re.sub(r'\W+', '', your_string)

根据Python定义'\W == [^a-zA-Z0-9_]，它不包括所有数字、字母和_

2009-08-14 08:57:37

你可以试试:

print ''.join(ch for ch in some_string if ch.isalnum())

2009-08-14 09:02:28

我只是出于好奇计算了一些函数的时间。在这些测试中，我从字符串string中删除非字母数字字符。Printable(内置字符串模块的一部分)。使用编译的'[\W_]+'和模式。Sub ("， str)被发现是最快的。

$ python -m timeit -s \
     "import string" \
     "''.join(ch for ch in string.printable if ch.isalnum())" 
10000 loops, best of 3: 57.6 usec per loop

$ python -m timeit -s \
    "import string" \
    "filter(str.isalnum, string.printable)"                 
10000 loops, best of 3: 37.9 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]+', '', string.printable)"                
100000 loops, best of 3: 15 usec per loop

$ python -m timeit -s \
    "import re, string; pattern = re.compile('[\W_]+')" \
    "pattern.sub('', string.printable)" 
100000 loops, best of 3: 11.2 usec per loop

2009-08-14 10:03:32

在Python中从字符串中剥离除字母数字字符以外的所有内容

推荐文章

最新文章

标签