如何从字符串中删除所有非字母的字符?

非字母数字呢?

这必须是一个自定义函数还是也有更通用的解决方案?


当前回答

虽然这篇文章有点老了,但我想说以下几点。 我有上述解决方案的问题是,它没有过滤出字符,如ç, ë, ï等。我调整了一个函数如下(我只使用80 varchar字符串来节省内存):

create FUNCTION dbo.udf_Cleanchars (@InputString varchar(80)) 
RETURNS varchar(80) 
AS 

BEGIN 
declare @return varchar(80) , @length int , @counter int , @cur_char char(1) 
SET @return = '' 
SET @length = 0 
SET @counter = 1 
SET @length = LEN(@InputString) 
IF @length > 0 
BEGIN WHILE @counter <= @length 

BEGIN SET @cur_char = SUBSTRING(@InputString, @counter, 1) IF ((ascii(@cur_char) in (32,44,46)) or (ascii(@cur_char) between 48 and 57) or (ascii(@cur_char) between 65 and 90) or (ascii(@cur_char) between 97 and 122))
BEGIN SET @return = @return + @cur_char END 
SET @counter = @counter + 1 
END END 

RETURN @return END

其他回答

试试这个函数:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

这样叫它:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

一旦您理解了代码,您就会发现更改它以删除其他字符也相对简单。您甚至可以使此动态到足以传入您的搜索模式。

我刚在Oracle 10g中找到了这个,如果你用的就是它的话。为了进行电话号码比较,我必须去掉所有的特殊字符。

regexp_replace(c.phone, '[^0-9]', '')

这是另一个递归CTE解决方案,基于@Gerhard Weiss的回答。您应该能够将整个代码块复制并粘贴到SSMS中,并在那里使用它。结果包括一些额外的列,以帮助我们理解发生了什么。我花了一段时间才理解了PATINDEX (RegEx)和递归CTE的全部原理。

DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace non-printable chars with 
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
    BEGIN
        RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
        RETURN
    END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'

--Let the magic begin:
;WITH recurse AS (
    select
    OriginalString,
    OriginalString as CleanString,
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
    substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
    ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
   UNION ALL
    select
    OriginalString,
    CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
    substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
    ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0

这种方式没有为我工作,因为我试图保持阿拉伯字母,我试图取代正则表达式,但它也不起作用。我写了另一个方法工作在ASCII级别,因为这是我唯一的选择,它工作。

 Create function [dbo].[RemoveNonAlphaCharacters] (@s varchar(4000)) returns varchar(4000)
   with schemabinding
begin
   if @s is null
      return null
   declare @s2 varchar(4000)
   set @s2 = ''
   declare @l int
   set @l = len(@s)
   declare @p int
   set @p = 1
   while @p <= @l begin
      declare @c int
      set @c = ascii(substring(@s, @p, 1))
      if @c between 48 and 57 or @c between 65 and 90 or @c between 97 and 122 or @c between 165 and 253 or @c between 32 and 33
         set @s2 = @s2 + char(@c)
      set @p = @p + 1
      end
   if len(@s2) = 0
      return null
   return @s2
   end

GO

看过所有给出的解决方案后,我认为必须有一个纯SQL方法,它不需要函数或CTE / XML查询,并且不涉及难以维护的嵌套REPLACE语句。以下是我的解决方案:

SELECT 
  x
  ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size 
    AS stripped_column
FROM (SELECT 
        column_to_strip AS x
        ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a 
      FROM my_table) a

这样做的好处是,有效字符包含在子查询中的一个字符串中,便于为不同的字符集重新配置。

缺点是您必须为每个字符添加一行SQL,直到您的列的大小。为了让这个任务更容易,我只是使用了下面的Powershell脚本,这个例子如果是VARCHAR(64):

1..64 | % {
  "    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe