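What is the best way to strip all non-alphanumeric characters from a string using Python?
The solutions presented in the PHP variant of this question would probably work with some minor adjustments, but they don't seem very "Pythonic" to me.
For the record, I don't just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.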
Current answer
sent = "".join(e for e in sent if e.isalnum())  # isalnum keeps letters and digits; isalpha would also drop digits
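For reference, here is a small sketch of how str.isalnum() and str.isalpha() differ when used as the filter. The helper names keep_alnum and keep_alpha and the sample string are made up for illustration; they are not part of the answer above.

```python
def keep_alnum(text):
    """Keep letters and digits (filter with str.isalnum)."""
    return "".join(ch for ch in text if ch.isalnum())


def keep_alpha(text):
    """Keep letters only (filter with str.isalpha); digits are dropped too."""
    return "".join(ch for ch in text if ch.isalpha())


sample = 'Hello, "World" (2024)!'
print(keep_alnum(sample))  # HelloWorld2024
print(keep_alpha(sample))  # HelloWorld
```

If digits should survive, isalnum is the one to use; isalpha is only appropriate when a letters-only result is wanted.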
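Other answers
Just out of curiosity, I timed some functions. In these tests I remove the non-alphanumeric characters from the string string.printable (part of the built-in string module). Using the compiled '[\W_]+' pattern with pattern.sub('', str) turned out to be the fastest.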
$ python -m timeit -s \
"import string" \
"''.join(ch for ch in string.printable if ch.isalnum())"
10000 loops, best of 3: 57.6 usec per loop
$ python -m timeit -s \
"import string" \
"filter(str.isalnum, string.printable)"
10000 loops, best of 3: 37.9 usec per loop
$ python -m timeit -s \
"import re, string" \
"re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop
$ python -m timeit -s \
"import re, string" \
"re.sub('[\W_]+', '', string.printable)"
100000 loops, best of 3: 15 usec per loop
$ python -m timeit -s \
"import re, string; pattern = re.compile('[\W_]+')" \
"pattern.sub('', string.printable)"
100000 loops, best of 3: 11.2 usec per loop
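To repeat a comparison like this on Python 3, something along the following lines should work. It is only a sketch: the function names and the repetition count are arbitrary choices, and the numbers it prints will depend on the machine and interpreter, so no timings are claimed here. Note that in Python 3 filter() returns a lazy iterator, so its result has to be joined back into a string.

```python
import re
import string
import timeit

PATTERN = re.compile(r"[\W_]+")  # precompiled: runs of non-word characters or underscores


def with_join(text):
    return "".join(ch for ch in text if ch.isalnum())


def with_filter(text):
    # filter() is lazy in Python 3, so join the surviving characters.
    return "".join(filter(str.isalnum, text))


def with_regex(text):
    return PATTERN.sub("", text)


candidates = {"join": with_join, "filter": with_filter, "regex": with_regex}

for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(string.printable), number=10_000)
    print(f"{name:>6}: {seconds:.4f} s for 10,000 calls")
```

All three helpers return the same string for string.printable, so the comparison is like for like.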
Here is a simple solution, since all the answers here are complicated:
filtered = ''
for c in unfiltered:
    if str.isalnum(c):
        filtered += c
print(filtered)
If you want to keep characters such as áéíóúãẽĩõũ, use this (note that the \d in the character class also removes digits):
import re
re.sub(r'[\W\d_]+', '', your_string)
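As a rough illustration of how this behaves on Python 3, where \W is Unicode-aware for str patterns by default: the sample text below is invented, and the accented letters are written as escapes so that they are unambiguous.

```python
import re

text = "Caf\u00e9_na\u00efve 123!"  # "Café_naïve 123!" with precomposed accented letters

# Default (Unicode) matching: accented letters count as word characters and are kept.
unicode_alnum = re.sub(r"[\W_]+", "", text)

# With re.ASCII, \W matches everything outside [a-zA-Z0-9_], so accented letters are removed.
ascii_alnum = re.sub(r"[\W_]+", "", text, flags=re.ASCII)

# Adding \d to the class strips digits as well, leaving letters only.
letters_only = re.sub(r"[\W\d_]+", "", text)

print(unicode_alnum)  # Cafénaïve123
print(ascii_alnum)    # Cafnave123
print(letters_only)   # Cafénaïve
```

Dropping the \d keeps digits, and passing flags=re.ASCII is one way to restrict the result to ASCII letters and digits.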
You can try:
print(''.join(ch for ch in some_string if ch.isalnum()))
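If I understand correctly, the simplest way is to use a regular expression, since it gives you a lot of flexibility, but another simple way is to use a loop. Below is sample code; I also count the occurrences of each word and store them in a dictionary.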
s = """An... essay is, generally, a piece of writing that gives the author's own
argument — but the definition is vague,
overlapping with those of a paper, an article, a pamphlet, and a short story. Essays
have traditionally been
sub-classified as formal and informal. Formal essays are characterized by "serious
purpose, dignity, logical
organization, length," whereas the informal essay is characterized by "the personal
element (self-revelation,
individual tastes and experiences, confidential manner), humor, graceful style,
rambling structure, unconventionality
or novelty of theme," etc.[1]"""
d = {}  # maps each cleaned word to its number of occurrences
words = s.split()  # split the string on whitespace into a list of words
for word in words:
    new_word = ''
    for c in word:
        if c.isalnum():  # keep the character only if it is alphanumeric
            new_word = new_word + c
    print(new_word, end=' ')
    if new_word:  # skip tokens that were pure punctuation, such as a lone dash
        d[new_word] = d.get(new_word, 0) + 1
print(d)
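The same idea can be written more compactly with the regular expression mentioned at the start of this answer plus collections.Counter. This is just one possible sketch: the function name count_words and the short sample string are made up for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Strip non-alphanumeric characters from each word, then count occurrences."""
    cleaned = (re.sub(r"[\W_]+", "", word) for word in text.split())
    # Tokens that were pure punctuation become empty strings and are skipped.
    return Counter(word for word in cleaned if word)


sample = 'An... essay is, generally, "an essay" (etc.)'
print(count_words(sample))
```

Counting is case-sensitive here, so "An" and "an" are separate keys; lowercasing each word first would merge them.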
If this answer was useful, please rate it!