Best way to strip punctuation from a string

It seems like there should be a simpler way than this:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?

Current answer

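string.punctuation misses a great deal of punctuation that is commonly used in the real world. How about a solution that also works for non-ASCII punctuation?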

import regex
s = u"string. With. Some・Really Weird、Non？ASCII。 「（Punctuation）」?"
remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()

Personally, I believe this is the best way to remove punctuation from a string in Python because:

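- It removes all Unicode punctuation.
- It is easy to modify: for example, you can remove \p{S} if you want to strip punctuation but keep symbols like $.
- You can be very specific about what to keep and what to remove: for example, \p{Pd} will only remove dashes.
- This regex also normalizes whitespace. It maps tabs, carriage returns, and other odd characters to nice single spaces.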

This uses Unicode character properties, which you can read more about on Wikipedia.
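
If the third-party regex module is not available, the same category-based idea can be approximated with the standard library. The following is a minimal Python 3 sketch, not the answer's own code: the helper name strip_categories and its prefixes parameter are made up for illustration, and it relies on unicodedata.category instead of \p{...} classes.

```python
import unicodedata

def strip_categories(text, prefixes=("C", "M", "P", "S", "Z")):
    """Replace characters whose Unicode category starts with one of
    `prefixes` (a tuple of strings) by a space, then collapse
    whitespace runs. Hypothetical helper, for illustration only.
    """
    replaced = "".join(
        " " if unicodedata.category(ch).startswith(prefixes) else ch
        for ch in text
    )
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return " ".join(replaced.split())

s = "string. With. Some・Really Weird、Non？ASCII。 「（Punctuation）」?"
everything = strip_categories(s)

# Passing only "Pd" removes dashes and keeps symbols such as $.
dashes_only = strip_categories("re-use \u2014 5$", prefixes=("Pd",))
```

Note that this sketch always normalizes whitespace through split(), even when the Z category is not listed in prefixes.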

2016-10-06 16:46:01

Other answers

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)

2017-02-02 21:48:39

For less strict cases, a one-liner may help:

''.join([c for c in s if c.isalnum() or c.isspace()])

2015-10-17 23:03:59

Here is a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

2016-03-21 02:46:47
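
On Python 3 the dictionary comprehension in the one-liner above is not strictly needed: str.maketrans also accepts three arguments, and every character in the third argument is mapped to None, which means it gets deleted. A minimal sketch, with variable names chosen only for illustration:

```python
import string

# Third argument: characters to delete. The first two (empty) strings
# mean that no character is mapped to a different character.
table = str.maketrans("", "", string.punctuation)

text = "l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/"
cleaned = text.translate(table)
print(cleaned)  # lots of punctuation
```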

Here is a solution without regex.

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then

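- Replaces punctuation with spaces
- Replaces multiple spaces between words with a single space
- Removes trailing spaces, if any, with strip()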
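
The code above is Python 2 (string.maketrans and the print statement). The same steps can be sketched for Python 3 as follows; this is an adaptation under that assumption, not the answer's original code.

```python
import string

input_text = "!where??and!!or$$then:)"

# Step 1: map every ASCII punctuation character to a space.
punctuation_replacer = str.maketrans(
    string.punctuation, " " * len(string.punctuation)
)
spaced = input_text.translate(punctuation_replacer)

# Steps 2 and 3: split() discards leading/trailing whitespace and
# collapses runs of spaces, so no separate strip() call is needed.
result = " ".join(spaced.split())
print(result)  # where and or then
```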

2016-11-30 10:29:40

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

You can also generalize this and strip other types of characters:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$, which may or may not count as "punctuation".
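
Since the per-character category lookup is what makes this approach slow, one possible mitigation is to build a translation table once and reuse it with str.translate. The following Python 3 sketch assumes that trade-off is acceptable (building the table scans every code point once). The names PUNCT_TABLE and strip_punctuation are illustrative, and whether this is actually faster for a given workload should be measured.

```python
import sys
import unicodedata

# Built once: maps every code point in a "P*" (punctuation) category
# to None, which str.translate treats as "delete this character".
PUNCT_TABLE = dict.fromkeys(
    cp
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
)

def strip_punctuation(text):
    """Remove all Unicode punctuation (hypothetical helper)."""
    return text.translate(PUNCT_TABLE)

s = "String \u2014 with -  \u00abpunctation \u00bb..."
stripped = strip_punctuation(s)
```

Symbols such as $ are in the S categories, so this table leaves them in place; adding "S" to the startswith check would be the counterpart of the 'SP' variant shown above.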

2011-09-01 09:29:45
