How can I strip all punctuation from a string in JavaScript using regex?

If I have a string containing any kind of non-alphanumeric characters:

"This., -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation"

How do I get a version of it without punctuation in JavaScript:

"This is an example of a string with punctuation"

Current answer

If you are using lodash:

_.words('This, is : my - test,line:').join(' ')

For the example from the question:

_.words('"This., -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation"').join(' ')

Other answers

It depends on what you want to return. I recently used this:

return text.match(/[a-z]/i);
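It may help to see what that call actually returns. Without the g flag, match() stops at the first match, so the pattern above yields only the first letter (or null when there is none). The sketch below contrasts it with a variant, /[a-z]+/gi, that is my own assumption about what is probably wanted when the goal is to keep the words; the variable names are illustrative only.

```javascript
const text = 'Hi, there!';

// Without the g flag, match() returns only the first match (or null).
const firstLetter = text.match(/[a-z]/i);
console.log(firstLetter && firstLetter[0]); // "H"

// Assumed variant: with + and the g flag, match() returns every run of ASCII letters.
const words = text.match(/[a-z]+/gi) || [];
console.log(words.join(' ')); // "Hi there"
```

Note that both patterns only recognise ASCII letters, so digits and accented characters are dropped as well.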

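In languages that support Unicode, the Unicode Punctuation character property is \p{P}, which can usually be abbreviated to \pP for readability and is sometimes written out as \p{Punctuation}.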

Are you using a Perl-compatible regular expression library?

If you want to keep only letters and spaces, you can do this:

str.replace(/[^a-zA-Z ]+/g, '').replace(/ {2,}/g, ' ')
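As a quick check against the string from the question, the two-step approach (drop everything that is not an ASCII letter or space, then collapse runs of spaces) can be wrapped in a small helper. This is only a sketch; the function name keepLettersAndSpaces is made up for illustration, and the second replace is written as a regex literal with the g flag so that every run of spaces is collapsed.

```javascript
// Hypothetical helper: keeps only ASCII letters and spaces.
function keepLettersAndSpaces(str) {
  return str
    .replace(/[^a-zA-Z ]+/g, '') // remove anything that is not a letter or a space
    .replace(/ {2,}/g, ' ');     // collapse two or more spaces into one
}

const input = 'This., -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation';
console.log(keepLettersAndSpaces(input));
// "This is an example of a string with punctuation"
```

Keep in mind that this also removes digits and any non-ASCII letters such as é or ñ.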

As of 2021, many modern browsers support Unicode property escapes in JavaScript's built-in RegExp. So you can now simply use \p{P} (with the u flag):

str.replace(/[\p{P}$+<=>^`|~]/gu, '')

If you want to strip all symbols (\p{S}) as well as punctuation, the regex can be simplified further:

str.replace(/[\p{P}\p{S}]/gu, '')
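The difference between the punctuation and symbol categories is easy to miss: characters such as +, = and $ are classified as symbols (\p{S}), not punctuation (\p{P}). A small sketch with a made-up sample string shows what each pattern removes:

```javascript
const sample = 'a+b=c, $5!';

// \p{P} alone removes "," and "!" but leaves the symbols "+", "=" and "$".
const punctuationOnly = sample.replace(/\p{P}/gu, '');
console.log(punctuationOnly); // "a+b=c $5"

// Adding \p{S} removes the symbols too.
const punctuationAndSymbols = sample.replace(/[\p{P}\p{S}]/gu, '');
console.log(punctuationAndSymbols); // "abc 5"
```

The u flag is required; without it, \p{...} is not interpreted as a Unicode property escape.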

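If you want to strip everything except letters (\p{L}), numbers (\p{N}) and separators (\p{Z}), you can use a negated character set like this (it also works for non-English alphanumeric characters):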

str.replace(/[^\p{L}\p{N}\p{Z}]/gu, '')

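The regex above works, but the more common use case is to use the regex whitespace class (\s) instead of the Unicode separator category, because the latter does not include tabs and line feeds. Try this: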

str.replace(/[^\p{L}\p{N}\s]/gu, '')

You may also want to use .replace(/ +/g, ' ') to collapse runs of consecutive spaces.
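Putting the last two steps together, a minimal sketch could look like the following. The function name stripPunctuation and the sample strings are my own; the patterns are the ones shown above, and the code assumes a runtime that supports Unicode property escapes.

```javascript
// Hypothetical helper: keeps letters, numbers and whitespace in any script,
// then collapses runs of spaces left behind by the removed characters.
function stripPunctuation(str) {
  return str
    .replace(/[^\p{L}\p{N}\s]/gu, '') // drop everything except letters, numbers, whitespace
    .replace(/ +/g, ' ');             // collapse consecutive spaces
}

console.log(stripPunctuation('This., -/ is #! an $ % ^ & * example ;: {} of a = -_ string with `~)() punctuation'));
// "This is an example of a string with punctuation"

console.log(stripPunctuation('¡Hola, señor! ¿Qué tal? 42%'));
// "Hola señor Qué tal 42"
```

Unlike the [^a-zA-Z ] approach, this keeps digits and accented or non-Latin letters.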

Feel free to play around with these! Refs: Unicode character property (Wikipedia), Unicode property escapes (MDN).
