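How do I remove all non-alphabetic characters from a string?

What about non-alphanumeric characters?

Does this have to be a custom function, or are there more general solutions?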


Current answer

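To strip non-alphanumeric characters instead, use this pattern in both places where PatIndex is called:

PatIndex('%[^A-Za-z0-9]%', @Temp)

Make that change in the custom function RemoveNonAlphaCharacters (given in another answer) and rename the function to RemoveNonAlphaNumericCharacters.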
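For reference, here is a minimal sketch of what the renamed function might look like once the pattern is swapped in. It assumes the same loop-and-STUFF structure as the RemoveNonAlphaCharacters function from the other answer. The version shown there keeps the pattern in a variable, so only one line actually changes.

```sql
-- Sketch only: RemoveNonAlphaCharacters with the alphanumeric pattern swapped in.
CREATE FUNCTION dbo.RemoveNonAlphaNumericCharacters (@Temp varchar(1000))
RETURNS varchar(1000)
AS
BEGIN
    -- Matches any single character that is NOT a letter or a digit
    DECLARE @KeepValues varchar(50) = '%[^A-Za-z0-9]%'

    -- Delete the first offending character until none remain
    WHILE PATINDEX(@KeepValues, @Temp) > 0
        SET @Temp = STUFF(@Temp, PATINDEX(@KeepValues, @Temp), 1, '')

    RETURN @Temp
END
```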

Other answers

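If, like me, you can't simply add functions to a production database but still want to do this kind of filtering, here is a pure SQL solution that uses PIVOT to stitch the filtered characters back together.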

Note: I hard-coded the pivot columns up to 40 characters. If you have longer strings to filter, you will have to add more.

SET CONCAT_NULL_YIELDS_NULL OFF;

with 
    ToBeScrubbed
as (
    select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
),

Scrubbed as (
    select 
        P.Number as ValueOrder,
        isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
        t.id
    from
        ToBeScrubbed t
        left join master..spt_values P
            on P.number between 1 and len(t.ColumnToScrub)
            and type ='P'
    where
        PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
)

SELECT
    id, 
    [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
    +  [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
    +  [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
    +  [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
FROM (
    select 
        *
    from 
        Scrubbed
    ) 
    src
    PIVOT (
        MAX(ScrubbedValue) FOR ValueOrder IN (
        [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
        [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
        [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
        [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
        )
    ) pvt

Try this function:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

Call it like this:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

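Once you understand the code, you will see that it is relatively simple to change it to remove other characters as well. You could even make it dynamic enough to accept the search pattern as a parameter.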
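As a rough sketch of the "pass in your search pattern" idea, the function could take the PATINDEX pattern as a second argument. The name dbo.RemoveCharactersMatching and the parameter @Pattern are made up for illustration. The pattern is assumed to match exactly one unwanted character, as '%[^a-z]%' does.

```sql
-- Hypothetical generalisation: the caller supplies the PATINDEX pattern.
CREATE FUNCTION dbo.RemoveCharactersMatching (@Temp varchar(1000), @Pattern varchar(50))
RETURNS varchar(1000)
AS
BEGIN
    -- @Pattern should match ONE character to delete, e.g. '%[^a-z]%'
    WHILE PATINDEX(@Pattern, @Temp) > 0
        SET @Temp = STUFF(@Temp, PATINDEX(@Pattern, @Temp), 1, '')

    RETURN @Temp
END
GO

-- Keep letters only
SELECT dbo.RemoveCharactersMatching('abc1234def5678ghi90jkl', '%[^a-z]%')
-- Keep letters and digits
SELECT dbo.RemoveCharactersMatching('abc-1234 def', '%[^a-z0-9]%')
```

Whether a range such as [a-z] also covers upper-case or accented letters depends on the collation in effect, so the results can differ between databases.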

Another possible option for SQL Server 2017+, without loops or recursion, is a string-based approach using TRANSLATE() and REPLACE().

T-SQL statement:

DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

SELECT 
   v.[Text], 
   REPLACE(
      TRANSLATE(
         v.[Text],
         REPLACE(TRANSLATE(v.[Text], @pattern, REPLICATE('a', LEN(@pattern))), 'a', ''),
         REPLICATE('0', LEN(REPLACE(TRANSLATE(v.[Text], @pattern, REPLICATE('a', LEN(@pattern))), 'a', '')))
      ),
      '0',
      ''
   ) AS AlphabeticCharacters
FROM (VALUES
   ('abc1234def5678ghi90jkl#@$&'),
   ('1234567890'),
   ('JAHDBESBN%*#*@*($E*sd55bn')
) v ([Text])

Or as a function:

CREATE FUNCTION dbo.RemoveNonAlphabeticCharacters (@Text varchar(1000)) 
RETURNS varchar(1000)
AS BEGIN

   DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
   SET @Text = REPLACE(
      TRANSLATE(
         @Text,
         REPLACE(TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))), 'a', ''),
         REPLICATE('0', LEN(REPLACE(TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))), 'a', '')))
      ),
      '0',
      ''
   )
   
   RETURN @Text
END

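Although this post is a bit old, I would like to add the following. The problem I had with the solutions above is that they do not filter out characters such as ç, ë, ï, etc. I adapted a function as follows (I only use an 80-character varchar to save memory):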

CREATE FUNCTION dbo.udf_Cleanchars (@InputString varchar(80))
RETURNS varchar(80)
AS
BEGIN
    DECLARE @return varchar(80), @length int, @counter int, @cur_char char(1)
    SET @return = ''
    SET @counter = 1
    SET @length = LEN(@InputString)

    IF @length > 0
    BEGIN
        WHILE @counter <= @length
        BEGIN
            SET @cur_char = SUBSTRING(@InputString, @counter, 1)
            -- Keep space (32), comma (44), period (46), digits, A-Z and a-z
            IF ((ASCII(@cur_char) IN (32, 44, 46))
                OR (ASCII(@cur_char) BETWEEN 48 AND 57)
                OR (ASCII(@cur_char) BETWEEN 65 AND 90)
                OR (ASCII(@cur_char) BETWEEN 97 AND 122))
            BEGIN
                SET @return = @return + @cur_char
            END
            SET @counter = @counter + 1
        END
    END

    RETURN @return
END
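The accented-character problem described above comes from how a range like [a-z] is evaluated under the collation in effect. A possible alternative to comparing ASCII codes, offered only as a sketch, is to apply a binary collation to the string being searched so that the range follows code-point order. The sample string and the choice of Latin1_General_BIN are assumptions for illustration.

```sql
-- Sketch: compare how PATINDEX treats accented letters under two collations.
DECLARE @s varchar(80) = 'Façade ë ï 123'

SELECT
    -- Under many default collations, ç, ë, ï sort inside a-z and are not flagged
    PATINDEX('%[^a-zA-Z]%', @s) AS FirstNonLetter_DefaultCollation,
    -- Under a binary collation the range is strictly A-Z and a-z by code point
    PATINDEX('%[^a-zA-Z]%', @s COLLATE Latin1_General_BIN) AS FirstNonLetter_BinaryCollation
```

If the two columns differ on your server, the same COLLATE clause could be added to the PATINDEX calls in the loop-based functions from the other answers.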

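This solution, inspired by Mr. Allen's solution, requires a Numbers table of integers (which you should have on hand anyway if you want to do serious query work with good performance). It does not need a CTE. You can change the NOT IN (...) expression to exclude specific characters, or change it to an IN (...) or LIKE expression to keep only certain characters.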

SELECT (
    SELECT  SUBSTRING([YourString], N, 1)
    FROM    dbo.Numbers
    WHERE   N > 0 AND N <= CONVERT(INT, LEN([YourString]))
        AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.')
    FOR XML PATH('')
) AS [YourStringTransformed]
FROM ...
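To illustrate the "keep only certain characters" variant, here is a self-contained sketch that replaces NOT IN (...) with a LIKE test. The sample row is invented. So that the example runs without a dbo.Numbers table, master..spt_values is used as a stand-in source of integers.

```sql
-- Sketch: keep only letters, using LIKE instead of NOT IN.
SELECT
    s.YourString,
    (
        SELECT SUBSTRING(s.YourString, n.N, 1)
        FROM (
            -- Stand-in for dbo.Numbers
            SELECT number AS N FROM master..spt_values WHERE type = 'P'
        ) AS n
        WHERE n.N BETWEEN 1 AND LEN(s.YourString)
          AND SUBSTRING(s.YourString, n.N, 1) LIKE '[a-zA-Z]'
        ORDER BY n.N            -- keep the characters in their original order
        FOR XML PATH('')
    ) AS YourStringTransformed
FROM (VALUES ('abc (123), def.')) AS s (YourString)   -- hypothetical sample data
```

Keep in mind that FOR XML PATH('') entity-encodes characters such as &, < and >, so a filter that lets those through would need extra handling. A letters-only filter like this one avoids the issue.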