如何从字符串中删除所有非字母的字符?

非字母数字呢?

这必须是一个自定义函数还是也有更通用的解决方案?


当前回答

下面是使用iTVF删除非字母字符的另一种方法。首先,需要一个基于模式的字符串分配器。以下是Dwain Camp文章中的一段:

-- PatternSplitCM will split a string based on a pattern of the form 
-- supported by LIKE and PATINDEX 
-- 
-- Created by: Chris Morris 12-Oct-2012 
CREATE FUNCTION [dbo].[PatternSplitCM]
(
       @List                VARCHAR(8000) = NULL
       ,@Pattern            VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING 
AS 

RETURN
    WITH numbers AS (
        SELECT TOP(ISNULL(DATALENGTH(@List), 0))
            n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
    )

    SELECT
        ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
        Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
        [Matched]
    FROM (
        SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
        FROM numbers
        CROSS APPLY (
            SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
        ) y
    ) d
    GROUP BY [Matched], Grouper

现在你有了一个基于模式的拆分器,你需要拆分匹配模式的字符串:

[a-z]

然后将它们连接起来以得到想要的结果:

SELECT *
FROM tbl t
CROSS APPLY(
    SELECT Item + ''
    FROM dbo.PatternSplitCM(t.str, '[a-z]')
    WHERE Matched = 1
    ORDER BY ItemNumber
    FOR XML PATH('')
) x (a)

样本

结果:

| Id |              str |              a |
|----|------------------|----------------|
|  1 |    test“te d'abc |     testtedabc |
|  2 |            anr¤a |           anra |
|  3 |  gs-re-C“te d'ab |     gsreCtedab |
|  4 |         M‚fe, DF |          MfeDF |
|  5 |           R™temd |          Rtemd |
|  6 |          ™jad”ji |          jadji |
|  7 |      Cje y ret¢n |       Cjeyretn |
|  8 |        J™kl™balu |        Jklbalu |
|  9 |       le“ne-iokd |       leneiokd |
| 10 |   liode-Pyr‚n‚ie |    liodePyrnie |
| 11 |         V„s G”ta |          VsGta |
| 12 |        Sƒo Paulo |        SoPaulo |
| 13 |  vAstra gAtaland | vAstragAtaland |
| 14 |  ¥uble / Bio-Bio |     ubleBioBio |
| 15 | U“pl™n/ds VAsb-y |    UplndsVAsby |

其他回答

我把它放在调用PatIndex的两个地方。

PatIndex('%[^A-Za-z0-9]%', @Temp)

为上面的自定义函数RemoveNonAlphaCharacters并重命名为RemoveNonAlphaNumericCharacters

从性能角度来看,我会使用内联函数:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_RemoveNumericCharsFromString]
(
@List NVARCHAR(4000)
)
RETURNS TABLE 
AS RETURN

    WITH GetNums AS (
       SELECT TOP(ISNULL(DATALENGTH(@List), 0))
        n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
          (VALUES (0),(0),(0),(0)) d (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
            )

    SELECT StrOut = ''+
        (SELECT Chr
         FROM GetNums
            CROSS APPLY (SELECT SUBSTRING(@List , n,1)) X(Chr)
         WHERE Chr LIKE '%[^0-9]%' 
         ORDER BY N
         FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)')


   /*How to Use
   SELECT StrOut FROM dbo.udf_RemoveNumericCharsFromString ('vv45--9gut')
   Result: vv--gut
   */

我刚在Oracle 10g中找到了这个,如果你用的就是它的话。为了进行电话号码比较,我必须去掉所有的特殊字符。

regexp_replace(c.phone, '[^0-9]', '')

首先创建一个函数

CREATE FUNCTION [dbo].[GetNumericonly]
(@strAlphaNumeric VARCHAR(256))
RETURNS VARCHAR(256)
AS
BEGIN
     DECLARE @intAlpha INT
     SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
BEGIN
     WHILE @intAlpha > 0
   BEGIN
          SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
          SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
   END
END
RETURN ISNULL(@strAlphaNumeric,0)
END

现在把这个函数叫做

select [dbo].[GetNumericonly]('Abhi12shek23jaiswal')

它的结果是

1223

如果您像我一样,不能仅向生产数据添加函数,但仍然想执行这种过滤,那么这里有一个纯SQL解决方案,使用PIVOT表将过滤后的部分重新组合在一起。

注意:我硬编码表高达40个字符,如果你有更长的字符串要过滤,你将不得不添加更多。

SET CONCAT_NULL_YIELDS_NULL OFF;

with 
    ToBeScrubbed
as (
    select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
),

Scrubbed as (
    select 
        P.Number as ValueOrder,
        isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
        t.id
    from
        ToBeScrubbed t
        left join master..spt_values P
            on P.number between 1 and len(t.ColumnToScrub)
            and type ='P'
    where
        PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
)

SELECT
    id, 
    [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
    +  [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
    +  [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
    +  [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
FROM (
    select 
        *
    from 
        Scrubbed
    ) 
    src
    PIVOT (
        MAX(ScrubbedValue) FOR ValueOrder IN (
        [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
        [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
        [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
        [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
        )
    ) pvt