似乎应该有一种比以下更简单的方法:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
有?
似乎应该有一种比以下更简单的方法:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
有?
当前回答
考虑unicode。代码已在python3中检查。
from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))
其他回答
在处理Unicode字符串时,我建议使用PyPi正则表达式模块,因为它同时支持Unicode属性类(如\p{X}/\p{X})和POSIX字符类(如[:name:])。
只需在终端中键入pipinstallregex(或pip3installregex)并按回车键即可安装软件包。
如果您需要删除任何类型的标点符号(即除字母、数字和空格之外的任何其他符号),您可以使用
regex.sub(r'[\p{P}\p{S}]', '', text) # to remove one by one
regex.sub(r'[\p{P}\p{S}]+', '', text) # to remove all consecutive punctuation/symbols with one go
regex.sub(r'[[:punct:]]+', '', text) # Same with a POSIX character class
在线观看Python演示:
import regex
text = 'भारत India <><>^$.,,! 002'
new_text = regex.sub(r'[\p{P}\p{S}\s]+', ' ', text).lower().strip()
# OR
# new_text = regex.sub(r'[[:punct:]\s]+', ' ', text).lower().strip()
print(new_text)
# => भारत india 002
在这里,我向字符类添加了空白模式
不一定更简单,但如果你更熟悉re家族的话,就另辟蹊径。
import re, string
s = "string. With. Punctuation?" # Sample string
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
# FIRST METHOD
# Storing all punctuations in a variable
punctuation='!?,.:;"\')(_-'
newstring ='' # Creating empty string
word = raw_input("Enter string: ")
for i in word:
if(i not in punctuation):
newstring += i
print ("The string without punctuation is", newstring)
# SECOND METHOD
word = raw_input("Enter string: ")
punctuation = '!?,.:;"\')(_-'
newstring = word.translate(None, punctuation)
print ("The string without punctuation is",newstring)
# Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage
我通常用这样的词:
>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
... s= s.replace(c,"")
...
>>> s
'string With Punctuation'
对于Python 3 str或Python 2 unicode值,str.translate()只使用字典;在该映射中查找代码点(整数),并删除映射到None的任何内容。
要删除(某些?)标点符号,请使用:
import string
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)
dict.fromkeys()类方法使创建映射变得简单,根据键的顺序将所有值设置为None。
要删除所有标点符号,而不仅仅是ASCII标点符号,您的表需要稍微大一点;参见J.F.Sebastian的答案(Python 3版本):
import unicodedata
import sys
remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
if unicodedata.category(chr(i)).startswith('P'))