It seems like there should be a simpler way than:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
Is there?
Current answer
Here is a simple approach using a regular expression:
import re
punct = re.compile(r'(\w+)')
sentence = 'This ! is : a # sample $ sentence.' # Text with punctuation
tokenized = [m.group() for m in punct.finditer(sentence)]
sentence = ' '.join(tokenized)
print(sentence)
'This is a sample sentence'
Other answers
I was looking for a really simple solution. Here is what I came up with:
import re
s = "string. With. Punctuation?"
s = re.sub(r'[\W\s]', ' ', s)
print(s)
'string  With  Punctuation '
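The substitution above swaps every punctuation mark for a space, so runs of spaces are left behind. If the goal is to drop the punctuation entirely, one possible variant is sketched below. The helper name `strip_punctuation` is made up for illustration, and note that `\w` treats the underscore as a word character, so underscores are kept.

```python
import re

def strip_punctuation(text):
    """Remove punctuation and normalize whitespace (illustrative helper)."""
    # [^\w\s] matches any character that is neither a word character
    # (letters, digits, underscore) nor whitespace.
    no_punct = re.sub(r'[^\w\s]', '', text)
    # Collapse any leftover runs of whitespace into single spaces.
    return ' '.join(no_punct.split())

print(strip_punctuation("string. With. Punctuation?"))
```

Deleting the match instead of replacing it with a space avoids the doubled and trailing spaces in the output above.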
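Here is a function I wrote. It is not very efficient, but it is simple, and you can add or remove any punctuation marks you like:
def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        # Rebuild the list once per punctuation mark, removing that mark from every word
        wordList = [word.replace(punc, '') for word in wordList]
    return wordList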
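Since the function above is described as not very efficient, here is a sketch of one way to make a single pass over each word instead of one pass per punctuation mark. The name `strip_punc_chars` and its `punc_chars` parameter are hypothetical. The default character set mirrors the list above and can be edited in the same way.

```python
def strip_punc_chars(words, punc_chars='.;:!?/\\,#@$&)("'):
    """Return a new list of words with the given characters removed."""
    # A set gives constant-time membership checks per character.
    punc_set = set(punc_chars)
    return [''.join(ch for ch in word if ch not in punc_set) for word in words]

print(strip_punc_chars(["Hello,", "world!", "(ok)"]))
```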
string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:
# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with - «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print('stripped', s)
You can also generalize and strip other types of characters:
''.join(ch for ch in s if category(ch)[0] not in 'SP')
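This will also strip characters like ~*+§$, which may or may not count as "punctuation" depending on your point of view.
Consider Unicode. This code has been checked in Python 3.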
from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))
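The per-character `category` lookup can be slow on large inputs. One commonly used workaround, sketched here under the assumption of Python 3, is to build a translation table of all Unicode punctuation code points once and reuse it. The names `PUNCT_TABLE` and `strip_unicode_punctuation` are made up for this example. Building the table scans the whole Unicode range, so it has a one-time startup cost.

```python
import sys
from unicodedata import category

# Map every code point whose category starts with 'P' to None,
# which tells str.translate to delete that character.
PUNCT_TABLE = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if category(chr(i)).startswith('P')
)

def strip_unicode_punctuation(text):
    """Delete all Unicode punctuation characters from text."""
    return text.translate(PUNCT_TABLE)

print(strip_unicode_punctuation('hi, how are you?'))
```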
import string
out = myString.translate(None, string.punctuation)  # Python 2 str only
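The `translate(None, ...)` form only works on Python 2 byte strings. A rough Python 3 counterpart, sketched here with a hypothetical helper name, builds a deletion table with `str.maketrans`, whose third argument lists the characters to remove. Like the original, it only covers ASCII punctuation.

```python
import string

# Third argument of str.maketrans: characters to delete.
_DELETE_PUNCT = str.maketrans('', '', string.punctuation)

def remove_ascii_punctuation(text):
    """Delete every character found in string.punctuation."""
    return text.translate(_DELETE_PUNCT)

print(remove_ascii_punctuation("string. With. Punctuation?"))
```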