It seems there should be a simpler way than the following:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
Is there?
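Current answer
Try this :) It uses the third-party regex module, which supports Unicode property classes such as \p{P}:
import regex
s = "string. With. Punctuation?"
out = regex.sub(r'\p{P}', '', s)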
Other answers
Take Unicode into account. This code was checked in Python 3.
from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))
Regular expressions are simple enough, if you know them.
import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
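One detail of the pattern above is worth checking: \w also matches the underscore, so underscores are kept. The following is a small sketch of that behavior, with a variant that removes underscores too. The sample string and variable names are made up for illustration.

```python
import re

s = "snake_case, string. With. Punctuation?"

# \w matches letters, digits, and the underscore, so '_' survives this pattern.
keeps_underscore = re.sub(r'[^\w\s]', '', s)

# Adding '_' as an alternative removes underscores as well.
drops_underscore = re.sub(r'[^\w\s]|_', '', s)

print(keeps_underscore)
print(drops_underscore)
```

Which variant is appropriate depends on whether underscores count as punctuation for your data.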
with open('one.txt', 'r') as myFile:
    str1 = myFile.read()
print(str1)
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
# Replace each listed punctuation mark with a space.
for i in punctuation:
    str1 = str1.replace(i, " ")
# Split on single spaces; consecutive spaces produce empty strings.
myList = str1.split(" ")
print(str1)
for i in myList:
    print(i)
    print("____________")
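The same idea (turn punctuation into spaces, then split into words) can be sketched without a file and without a hand-written list. This version swaps in string.punctuation and str.translate, which the answer above does not use, and the function name split_words is hypothetical.

```python
import string


def split_words(text):
    """Replace each ASCII punctuation mark with a space, then split on whitespace."""
    # Map every character in string.punctuation to a space.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    # split() with no argument collapses runs of whitespace and drops empty strings.
    return text.translate(table).split()


words = split_words("Hello, world! (How are you?)")
print(words)
```

Note that this also splits contractions such as "don't" into two pieces, because the apostrophe is treated as punctuation.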
string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:
# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with - «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print('stripped ' + s)  # works in both Python 2 and Python 3
You can also generalize this and strip other types of characters:
''.join(ch for ch in s if category(ch)[0] not in 'SP')
This also removes characters such as ~*+§$, which may or may not count as "punctuation" depending on your point of view.
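If the per-character category() lookup is a concern when many strings are processed, one possible approach in Python 3 is to build a translation table once and reuse it. This is a sketch rather than a benchmarked solution. The names PUNCT_TABLE and strip_punctuation are hypothetical, and building the table scans every code point, so it has a one-time cost.

```python
import sys
from unicodedata import category

# Build the table once: every code point whose category starts with 'P'
# maps to None, which str.translate treats as "delete this character".
PUNCT_TABLE = dict.fromkeys(
    i for i in range(sys.maxunicode + 1) if category(chr(i)).startswith('P')
)


def strip_punctuation(text):
    """Remove all Unicode punctuation characters from text."""
    return text.translate(PUNCT_TABLE)


print(strip_punctuation('¿Qué tal, «amigo»?'))
```

Symbols such as ~ fall in the 'S' categories, so this table leaves them alone; extend the condition to cover 'S' if those should be removed too.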
For convenience, I have summarized how to remove punctuation from a string in Python 2 and Python 3. See the other answers for detailed explanations.
Python 2
import string
s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation) # Output: string without punctuation
Python 3
import string
s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation)) # OR {key: None for key in string.punctuation}
new_s = s.translate(table) # Output: string without punctuation
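In Python 3, str.maketrans also accepts three arguments, where the third lists characters to delete. The sketch below wraps that form in a helper (the name remove_ascii_punctuation is hypothetical) and shows that non-ASCII punctuation is left untouched, since string.punctuation covers ASCII only.

```python
import string

# Third argument: characters that translate() should delete.
ASCII_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def remove_ascii_punctuation(text):
    """Remove ASCII punctuation; other punctuation is not affected."""
    return text.translate(ASCII_PUNCT_TABLE)


print(remove_ascii_punctuation("string. With. Punctuation?"))
print(remove_ascii_punctuation("«quoted» text, please"))
```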