如何从字符串中删除所有非字母的字符?

非字母数字呢?

这必须是一个自定义函数还是也有更通用的解决方案?


当前回答

这是另一个递归CTE解决方案,基于@Gerhard Weiss的回答。您应该能够将整个代码块复制并粘贴到SSMS中,并在那里使用它。结果包括一些额外的列,以帮助我们理解发生了什么。我花了一段时间才理解了PATINDEX (RegEx)和递归CTE的全部原理。

DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace non-printable chars with 
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
    BEGIN
        RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
        RETURN
    END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'

--Let the magic begin:
;WITH recurse AS (
    select
    OriginalString,
    OriginalString as CleanString,
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
    substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
    ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
   UNION ALL
    select
    OriginalString,
    CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
    substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
    ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0

其他回答

我刚在Oracle 10g中找到了这个,如果你用的就是它的话。为了进行电话号码比较,我必须去掉所有的特殊字符。

regexp_replace(c.phone, '[^0-9]', '')

这是另一个递归CTE解决方案,基于@Gerhard Weiss的回答。您应该能够将整个代码块复制并粘贴到SSMS中,并在那里使用它。结果包括一些额外的列,以帮助我们理解发生了什么。我花了一段时间才理解了PATINDEX (RegEx)和递归CTE的全部原理。

DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace non-printable chars with 
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
    BEGIN
        RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
        RETURN
    END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'

--Let the magic begin:
;WITH recurse AS (
    select
    OriginalString,
    OriginalString as CleanString,
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
    substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
    ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
   UNION ALL
    select
    OriginalString,
    CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
    substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
    ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0

试试这个函数:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

这样叫它:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

一旦您理解了代码,您就会发现更改它以删除其他字符也相对简单。您甚至可以使此动态到足以传入您的搜索模式。

DECLARE @vchVAlue NVARCHAR(255) = 'SWP, Lettering Position 1: 4 Ω, 2: 8 Ω, 3: 16 Ω, 4:  , 5:  , 6:  , Voltage Selector, Solder, 6, Step switch, : w/o fuseholder '


WHILE PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))) > 0
  BEGIN
    SELECT @vchVAlue = STUFF(@vchVAlue,PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))),1,' ')
  END 

SELECT @vchVAlue

看过所有给出的解决方案后,我认为必须有一个纯SQL方法,它不需要函数或CTE / XML查询,并且不涉及难以维护的嵌套REPLACE语句。以下是我的解决方案:

SELECT 
  x
  ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size 
    AS stripped_column
FROM (SELECT 
        column_to_strip AS x
        ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a 
      FROM my_table) a

这样做的好处是,有效字符包含在子查询中的一个字符串中,便于为不同的字符集重新配置。

缺点是您必须为每个字符添加一行SQL,直到您的列的大小。为了让这个任务更容易,我只是使用了下面的Powershell脚本,这个例子如果是VARCHAR(64):

1..64 | % {
  "    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe