如何从字符串中删除所有非字母的字符?
非字母数字呢?
这必须是一个自定义函数还是也有更通用的解决方案?
如何从字符串中删除所有非字母的字符?
非字母数字呢?
这必须是一个自定义函数还是也有更通用的解决方案?
当前回答
看过所有给出的解决方案后,我认为必须有一个纯SQL方法,它不需要函数或CTE / XML查询,并且不涉及难以维护的嵌套REPLACE语句。以下是我的解决方案:
SELECT
x
,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size
AS stripped_column
FROM (SELECT
column_to_strip AS x
,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a
FROM my_table) a
这样做的好处是,有效字符包含在子查询中的一个字符串中,便于为不同的字符集重新配置。
缺点是您必须为每个字符添加一行SQL,直到您的列的大小。为了让这个任务更容易,我只是使用了下面的Powershell脚本,这个例子如果是VARCHAR(64):
1..64 | % {
" + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe
其他回答
这是另一个递归CTE解决方案,基于@Gerhard Weiss的回答。您应该能够将整个代码块复制并粘贴到SSMS中,并在那里使用它。结果包括一些额外的列,以帮助我们理解发生了什么。我花了一段时间才理解了PATINDEX (RegEx)和递归CTE的全部原理。
DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%' --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%' --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%' --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#' --Change this to whatever you want to replace non-printable chars with
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
BEGIN
RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
RETURN
END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'
--Let the magic begin:
;WITH recurse AS (
select
OriginalString,
OriginalString as CleanString,
patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
from @temp
UNION ALL
select
OriginalString,
CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
from recurse
where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0
我把它放在调用PatIndex的两个地方。
PatIndex('%[^A-Za-z0-9]%', @Temp)
为上面的自定义函数RemoveNonAlphaCharacters并重命名为RemoveNonAlphaNumericCharacters
我知道SQL不擅长字符串操作,但我没想到它会这么难。下面是一个简单的函数,用于从字符串中剥离所有数字。当然还有更好的办法,但这只是个开始。
CREATE FUNCTION dbo.AlphaOnly (
@String varchar(100)
)
RETURNS varchar(100)
AS BEGIN
RETURN (
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
@String,
'9', ''),
'8', ''),
'7', ''),
'6', ''),
'5', ''),
'4', ''),
'3', ''),
'2', ''),
'1', ''),
'0', '')
)
END
GO
-- ==================
DECLARE @t TABLE (
ColID int,
ColString varchar(50)
)
INSERT INTO @t VALUES (1, 'abc1234567890')
SELECT ColID, ColString, dbo.AlphaOnly(ColString)
FROM @t
输出
ColID ColString
----- ------------- ---
1 abc1234567890 abc
第2轮-数据驱动黑名单
-- ============================================
-- Create a table of blacklist characters
-- ============================================
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
DROP TABLE dbo.CharacterBlacklist
GO
CREATE TABLE dbo.CharacterBlacklist (
CharID int IDENTITY,
DisallowedCharacter nchar(1) NOT NULL
)
GO
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
GO
-- ====================================
IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
DROP FUNCTION dbo.StripBlacklistCharacters
GO
CREATE FUNCTION dbo.StripBlacklistCharacters (
@String nvarchar(100)
)
RETURNS varchar(100)
AS BEGIN
DECLARE @blacklistCt int
DECLARE @ct int
DECLARE @c nchar(1)
SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist
SET @ct = 0
WHILE @ct < @blacklistCt BEGIN
SET @ct = @ct + 1
SELECT @String = REPLACE(@String, DisallowedCharacter, N'')
FROM dbo.CharacterBlacklist
WHERE CharID = @ct
END
RETURN (@String)
END
GO
-- ====================================
DECLARE @s nvarchar(24)
SET @s = N'abc1234def5678ghi90jkl'
SELECT
@s AS OriginalString,
dbo.StripBlacklistCharacters(@s) AS ResultString
输出
OriginalString ResultString
------------------------ ------------
abc1234def5678ghi90jkl abcdefghijkl
我对读者的挑战是:你能让这个过程更有效率吗?那么使用递归呢?
下面是使用iTVF删除非字母字符的另一种方法。首先,需要一个基于模式的字符串分配器。以下是Dwain Camp文章中的一段:
-- PatternSplitCM will split a string based on a pattern of the form
-- supported by LIKE and PATINDEX
--
-- Created by: Chris Morris 12-Oct-2012
CREATE FUNCTION [dbo].[PatternSplitCM]
(
@List VARCHAR(8000) = NULL
,@Pattern VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH numbers AS (
SELECT TOP(ISNULL(DATALENGTH(@List), 0))
n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
)
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
[Matched]
FROM (
SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
FROM numbers
CROSS APPLY (
SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
) y
) d
GROUP BY [Matched], Grouper
现在你有了一个基于模式的拆分器,你需要拆分匹配模式的字符串:
[a-z]
然后将它们连接起来以得到想要的结果:
SELECT *
FROM tbl t
CROSS APPLY(
SELECT Item + ''
FROM dbo.PatternSplitCM(t.str, '[a-z]')
WHERE Matched = 1
ORDER BY ItemNumber
FOR XML PATH('')
) x (a)
样本
结果:
| Id | str | a |
|----|------------------|----------------|
| 1 | test“te d'abc | testtedabc |
| 2 | anr¤a | anra |
| 3 | gs-re-C“te d'ab | gsreCtedab |
| 4 | M‚fe, DF | MfeDF |
| 5 | R™temd | Rtemd |
| 6 | ™jad”ji | jadji |
| 7 | Cje y ret¢n | Cjeyretn |
| 8 | J™kl™balu | Jklbalu |
| 9 | le“ne-iokd | leneiokd |
| 10 | liode-Pyr‚n‚ie | liodePyrnie |
| 11 | V„s G”ta | VsGta |
| 12 | Sƒo Paulo | SoPaulo |
| 13 | vAstra gAtaland | vAstragAtaland |
| 14 | ¥uble / Bio-Bio | ubleBioBio |
| 15 | U“pl™n/ds VAsb-y | UplndsVAsby |
Here's a solution that doesn't require creating a function or listing all instances of characters to replace. It uses a recursive WITH statement in combination with a PATINDEX to find unwanted chars. It will replace all unwanted chars in a column - up to 100 unique bad characters contained in any given string. (E.G. "ABC123DEF234" would contain 4 bad characters 1, 2, 3 and 4) The 100 limit is the maximum number of recursions allowed in a WITH statement, but this doesn't impose a limit on the number of rows to process, which is only limited by the memory available. If you don't want DISTINCT results, you can remove the two options from the code.
-- Create some test data:
SELECT * INTO #testData
FROM (VALUES ('ABC DEF,K.l(p)'),('123H,J,234'),('ABCD EFG')) as t(TXT)
-- Actual query:
-- Remove non-alpha chars: '%[^A-Z]%'
-- Remove non-alphanumeric chars: '%[^A-Z0-9]%'
DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';
WITH recurMain as (
SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
FROM #testData
UNION ALL
SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
FROM (
SELECT
CASE WHEN BadCharIndex > 0
THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '')
ELSE TXT
END AS TXT
FROM recurMain
WHERE BadCharIndex > 0
) badCharFinder
)
SELECT DISTINCT TXT
FROM recurMain
WHERE BadCharIndex = 0;