似乎应该有一种比以下更简单的方法:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
有?
似乎应该有一种比以下更简单的方法:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
有?
当前回答
对于Python 3 str或Python 2 unicode值,str.translate()只使用字典;在该映射中查找代码点(整数),并删除映射到None的任何内容。
要删除(某些?)标点符号,请使用:
import string
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)
dict.fromkeys()类方法使创建映射变得简单,根据键的顺序将所有值设置为None。
要删除所有标点符号,而不仅仅是ASCII标点符号,您的表需要稍微大一点;参见J.F.Sebastian的答案(Python 3版本):
import unicodedata
import sys
remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
if unicodedata.category(chr(i)).startswith('P'))
其他回答
我还没有看到这个答案。只需使用正则表达式;它删除了除单词字符(\w)和数字字符(\d)之外的所有字符,后跟一个空白字符(\s):
import re
s = "string. With. Punctuation?" # Sample string
out = re.sub(ur'[^\w\d\s]+', '', s)
# FIRST METHOD
# Storing all punctuations in a variable
punctuation='!?,.:;"\')(_-'
newstring ='' # Creating empty string
word = raw_input("Enter string: ")
for i in word:
if(i not in punctuation):
newstring += i
print ("The string without punctuation is", newstring)
# SECOND METHOD
word = raw_input("Enter string: ")
punctuation = '!?,.:;"\')(_-'
newstring = word.translate(None, punctuation)
print ("The string without punctuation is",newstring)
# Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage
这里有一个使用RegEx的简单方法
import re
punct = re.compile(r'(\w+)')
sentence = 'This ! is : a # sample $ sentence.' # Text with punctuation
tokenized = [m.group() for m in punct.finditer(sentence)]
sentence = ' '.join(tokenized)
print(sentence)
'This is a sample sentence'
我喜欢使用这样的函数:
def scrub(abc):
while abc[-1] is in list(string.punctuation):
abc=abc[:-1]
while abc[0] is in list(string.punctuation):
abc=abc[1:]
return abc
import re
s = "string. With. Punctuation?" # Sample string
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)