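Stripping everything but alphanumeric characters from a string in Python

What is the best way to strip all non-alphanumeric characters from a string using Python?

The solutions proposed in the PHP variant of this question would probably work with some minor adjustments, but they don't seem very "pythonic" to me.

For the record, I don't just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.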

Current answer

Regular expressions to the rescue:

import re
re.sub(r'\W+', '', your_string)

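Per the Python definition, \W == [^a-zA-Z0-9_], so the substitution removes everything except letters, digits and the underscore. Note that _ itself is kept. For str patterns in Python 3 this equivalence holds only with the re.ASCII flag; by default \W is Unicode-aware, so letters such as é are kept too.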
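To make the difference between the default and the ASCII-only behaviour concrete, here is a minimal sketch. The helper name strip_non_word and the sample string are made up for illustration; only re.sub and re.ASCII come from the standard library.

```python
import re


def strip_non_word(text, ascii_only=False):
    """Remove every run of non-word characters (hypothetical helper)."""
    # With re.ASCII, \W means [^a-zA-Z0-9_]; without it, str patterns
    # treat Unicode letters and digits as word characters as well.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r'\W+', '', text, flags=flags)


sample = "h\u00e9llo_w\u00f6rld, 42!"  # "héllo_wörld, 42!"
unicode_result = strip_non_word(sample)
ascii_result = strip_non_word(sample, ascii_only=True)
print(unicode_result)
print(ascii_result)
```

In both cases the underscore survives, so add it to the character class (for example r'[\W_]+') if it should go as well.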

2009-08-14 08:57:37

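Other answers

If you want to preserve letters like áéíóúãẽĩõũ, use this (note that the pattern also strips digits and underscores, so only letters remain):

import re
re.sub(r'[\W\d_]+', '', your_string)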
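If digits should survive as well, which is what the question asks for, one possible variant is to drop \d from the character class. The sketch below compares the two patterns; the function names and the sample string are assumptions for illustration only.

```python
import re


def keep_letters_and_digits(text):
    # [\W_] matches non-word characters plus the underscore, so letters
    # (including accented ones) and digits are left in place.
    return re.sub(r'[\W_]+', '', text)


def keep_letters_only(text):
    # Adding \d to the class removes digits as well.
    return re.sub(r'[\W\d_]+', '', text)


sample = "S\u00e3o_Paulo 2022!"  # "São_Paulo 2022!"
print(keep_letters_and_digits(sample))
print(keep_letters_only(sample))
```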

2022-06-02 14:32:29

I checked the results with perfplot (a project of mine) and found that for short strings,

"".join(filter(str.isalnum, s))

is the fastest. For long strings (200+ characters),

re.sub(r"[\W_]", "", s)

is the fastest.

Code to reproduce the plot:

import perfplot
import random
import re
import string

pattern = re.compile(r"[\W_]+")


def setup(n):
    return "".join(random.choices(string.ascii_letters + string.digits, k=n))


def string_alphanum(s):
    return "".join(ch for ch in s if ch.isalnum())


def filter_str(s):
    return "".join(filter(str.isalnum, s))


def re_sub1(s):
    return re.sub(r"[\W_]", "", s)


def re_sub2(s):
    return re.sub(r"[\W_]+", "", s)


def re_sub3(s):
    return pattern.sub("", s)


b = perfplot.bench(
    setup=setup,
    kernels=[string_alphanum, filter_str, re_sub1, re_sub2, re_sub3],
    n_range=[2**k for k in range(10)],
)
b.save("out.png")
b.show()
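The setup function in the benchmark generates only ASCII letters and digits, so none of the kernels ever removes a character there. As a rough sanity check that the approaches agree on inputs that do contain punctuation, something like the following could be run separately. It uses only the standard library, and the names via_filter, via_regex and samples are hypothetical.

```python
import re

PATTERN = re.compile(r"[\W_]+")


def via_filter(s):
    # Keep each character for which str.isalnum() is true.
    return "".join(filter(str.isalnum, s))


def via_regex(s):
    # Remove runs of non-word characters and underscores.
    return PATTERN.sub("", s)


samples = ["", "abc123", "a_b-c d!", "Kl13@$%[};'\""]
for s in samples:
    # Both approaches should produce the same cleaned string.
    assert via_filter(s) == via_regex(s), s
print("all samples agree")
```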

2022-03-03 16:07:21

>>> import re
>>> string = "Kl13@£$%[};'\""
>>> pattern = re.compile(r'\W')
>>> string = re.sub(pattern, '', string)
>>> print(string)
Kl13

2009-08-14 09:01:22

For a simple one-liner (Python 3.0):

''.join(filter( lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyz', the_string_you_want_stripped ))

For Python < 3.0:

filter( lambda x: x in '0123456789abcdefghijklmnopqrstuvwxyz', the_string_you_want_stripped )

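Note: you can add other characters to the allowed list if needed (e.g. '0123456789abcdefghijklmnopqrstuvwxyz._'). As written, the list contains only lowercase letters, so uppercase letters are stripped as well; add them if they should be kept.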
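A slightly more general take on the same whitelist idea, sketched under the assumption that plain ASCII letters (both cases) and digits are what should be kept. ALLOWED and keep_allowed are illustrative names; string.ascii_letters and string.digits are standard-library constants.

```python
import string

# Set of characters to keep; membership tests on a frozenset are cheap.
ALLOWED = frozenset(string.ascii_letters + string.digits)


def keep_allowed(text, allowed=ALLOWED):
    # Keep only the characters that appear in the allowed set.
    return "".join(ch for ch in text if ch in allowed)


result = keep_allowed("Hello, World! 123")
with_extras = keep_allowed("a.b_c-d", ALLOWED | {".", "_"})
print(result)
print(with_extras)
```

Extra characters can be allowed per call by passing a larger set, as in the second call.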

2021-07-08 23:01:41

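If I understood correctly, the easiest way is to use a regular expression, since it gives you a lot of flexibility, but another simple method is to use a for loop. Below is example code; I also count the occurrences of each word and store them in a dictionary.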

s = """An... essay is, generally, a piece of writing that gives the author's own 
argument — but the definition is vague, 
overlapping with those of a paper, an article, a pamphlet, and a short story. Essays 
have traditionally been 
sub-classified as formal and informal. Formal essays are characterized by "serious 
purpose, dignity, logical 
organization, length," whereas the informal essay is characterized by "the personal 
element (self-revelation, 
individual tastes and experiences, confidential manner), humor, graceful style, 
rambling structure, unconventionality 
or novelty of theme," etc.[1]"""

d = {}  # empty dictionary for the word counts
words = s.split()  # split the string on whitespace and store the pieces in a list
for word in words:
    new_word = ''
    for c in word:
        if c.isalnum():  # keep only alphanumeric characters
            new_word = new_word + c
    if not new_word:  # skip tokens that consisted only of punctuation
        continue
    print(new_word, end=' ')
    d[new_word] = d.get(new_word, 0) + 1  # count this word's occurrences
print(d)
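The same clean-then-count idea can be written more compactly with collections.Counter. This is only a sketch: the function name count_clean_words and the short sample sentence are assumptions, not part of the answer above.

```python
from collections import Counter


def count_clean_words(text):
    # Strip non-alphanumeric characters from each whitespace-separated token.
    cleaned = ("".join(c for c in word if c.isalnum()) for word in text.split())
    # Ignore tokens that were pure punctuation and are now empty.
    return Counter(w for w in cleaned if w)


counts = count_clean_words('An... essay is, "an essay" - is it?')
print(counts)
```

Counting is case-sensitive here, so "An" and "an" are separate keys; lowercase the text first if that is not wanted.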

If this answer is useful, please rate it!

2020-04-11 16:36:44
