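How do I remove all non-alphabetic characters from a string?
What about non-alphanumeric characters?
Does this have to be a custom function, or are there more general solutions?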
Current answer
Here's a solution that doesn't require creating a function or listing every character to replace. It uses a recursive WITH statement (a recursive CTE) together with PATINDEX to find unwanted characters. It removes all unwanted characters in a column, up to 100 unique bad characters in any given string (e.g. "ABC123DEF234" contains 4 bad characters: 1, 2, 3 and 4). The limit of 100 is the default maximum number of recursions allowed in a WITH statement, and it can be changed with the MAXRECURSION query hint. It does not limit the number of rows processed, which is bounded only by the available memory. If you don't want DISTINCT results, remove the two DISTINCT keywords from the code.
-- Create some test data:
SELECT * INTO #testData
FROM (VALUES ('ABC DEF,K.l(p)'),('123H,J,234'),('ABCD EFG')) AS t(TXT);
-- Actual query:
-- Remove non-alpha chars: '%[^A-Z]%'
-- Remove non-alphanumeric chars: '%[^A-Z0-9]%'
DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';
WITH recurMain AS (
    -- Anchor: every distinct string plus the position of its first bad character (0 = none)
    SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM #testData
    UNION ALL
    -- Recursive step: remove every occurrence of that bad character, then look for the next one
    SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM (
        SELECT
            CASE WHEN BadCharIndex > 0
                THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '')
                ELSE TXT
            END AS TXT
        FROM recurMain
        WHERE BadCharIndex > 0
    ) badCharFinder
)
-- Keep only the fully cleaned strings
SELECT DISTINCT TXT
FROM recurMain
WHERE BadCharIndex = 0;
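The 100-recursion ceiling mentioned above is only the default. As a minimal sketch, assuming the same recursive pattern, the query below raises it with the MAXRECURSION query hint. The @samples table variable, its values, and the limit of 500 are illustrative assumptions, not part of the original answer.

```sql
-- Hypothetical sample data; the table variable and its values are illustrative only.
DECLARE @samples TABLE (TXT VARCHAR(250));
INSERT INTO @samples (TXT) VALUES ('ABC DEF,K.l(p)'), ('123H,J,234');

DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';

WITH recurMain AS (
    -- Anchor: each string and the position of its first bad character (0 = none)
    SELECT CAST(TXT AS VARCHAR(250)) AS TXT,
           PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM @samples
    UNION ALL
    -- Recursive step: strip every occurrence of that character and search again
    SELECT CAST(REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '') AS VARCHAR(250)),
           PATINDEX(@BadCharacterPattern, REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), ''))
    FROM recurMain
    WHERE BadCharIndex > 0
)
SELECT TXT
FROM recurMain
WHERE BadCharIndex = 0
OPTION (MAXRECURSION 500);  -- assumed limit; valid values are 0 (no limit) to 32767
```

MAXRECURSION 0 removes the limit entirely, so a bounded value is usually the safer choice while experimenting.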
Other answers
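This is another recursive CTE solution, based on @Gerhard Weiss's answer. You should be able to copy and paste the whole code block into SSMS and experiment with it there. The results include a few extra columns to help show what is going on. It took me a while to understand everything that is happening with both PATINDEX (and its regex-like patterns) and recursive CTEs.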
DECLARE @DefineBadCharPattern varchar(30)
--Pick ONE of the patterns below; each SET overrides the previous one, so as written the last SET wins.
SET @DefineBadCharPattern = '%[^A-z]%' --Means anything NOT between A and z characters (according to ascii char value) is "bad". Note: in ASCII the range A-z also contains [ \ ] ^ _ and `
SET @DefineBadCharPattern = '%[^a-z0-9]%' --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^ -~]%' --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed according to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#' --Change this to whatever you want to replace non-printable chars with
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
BEGIN
    RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
    RETURN
END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'
--Let the magic begin:
;WITH recurse AS (
    select
        OriginalString,
        OriginalString as CleanString,
        patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
        substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
        ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
    UNION ALL
    select
        OriginalString,
        CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
        patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
        substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
        ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0
I put this in both places where PatIndex is called:
PatIndex('%[^A-Za-z0-9]%', @Temp)
in the custom function RemoveNonAlphaCharacters given in another answer, and renamed the function to RemoveNonAlphaNumericCharacters.
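The body of RemoveNonAlphaCharacters is not reproduced in this thread, so the following is only an assumed reconstruction of what such a function might look like once the alphanumeric pattern is in place: a WHILE loop that uses PATINDEX to find the next unwanted character and STUFF to delete it. The parameter size of VARCHAR(1000) is an assumption.

```sql
-- Sketch only: an assumed reconstruction, not the original author's code.
CREATE FUNCTION dbo.RemoveNonAlphaNumericCharacters (@Temp VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
    -- First PATINDEX call: loop while any character outside A-Z, a-z, 0-9 remains.
    WHILE PATINDEX('%[^A-Za-z0-9]%', @Temp) > 0
        -- Second PATINDEX call: locate that character and delete it with STUFF.
        SET @Temp = STUFF(@Temp, PATINDEX('%[^A-Za-z0-9]%', @Temp), 1, '');
    RETURN @Temp;
END;
GO

-- Usage with a hypothetical input string:
SELECT dbo.RemoveNonAlphaNumericCharacters('ABC (123), def.') AS Cleaned;  -- ABC123def
```

Because the pattern appears twice, both occurrences have to be changed together when switching between alpha-only and alphanumeric filtering.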
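This solution, inspired by Mr. Allen's solution, requires a Numbers table of integers (which you should have on hand anyway if you want to do serious query work with good performance). It does not require a CTE. You can change the NOT IN (...) expression to exclude specific characters, or change it to an IN (...) or LIKE expression to keep only certain characters.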
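SELECT (
    SELECT SUBSTRING([YourString], N, 1)
    FROM dbo.Numbers
    WHERE N > 0 AND N <= CONVERT(INT, LEN([YourString]))
        AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.')
    ORDER BY N -- without this, the order of the concatenated characters is not guaranteed
    FOR XML PATH('') -- note: FOR XML entitizes &, < and > if they survive the filter
) AS [YourStringTransformed]
FROM ...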
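As a minimal sketch of the LIKE variant mentioned above, the query below keeps only letters and digits instead of excluding a fixed list. It is written to be self-contained, so it makes two assumptions that are not in the original answer: the input comes from a hypothetical @YourString variable, and master..spt_values (type 'P', numbers 0 to 2047) stands in for dbo.Numbers. It also adds TYPE and .value() so that characters such as & are not returned as XML entities.

```sql
-- Hypothetical input; replace with your own column and table.
DECLARE @YourString VARCHAR(100) = 'ABC (123), def.';

SELECT (
    SELECT SUBSTRING(@YourString, N, 1)
    FROM (SELECT number AS N FROM master..spt_values WHERE type = 'P') AS Numbers
    WHERE N BETWEEN 1 AND LEN(@YourString)
      AND SUBSTRING(@YourString, N, 1) LIKE '[A-Za-z0-9]'  -- keep only these characters
    ORDER BY N                                             -- preserve the original order
    FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(100)') AS YourStringTransformed;     -- ABC123def
```

A whitelist like this tends to be easier to maintain than a blacklist, because unexpected characters are dropped by default.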
-- Scalar function: upper-cases the input, maps accented vowels, Ç and Ñ to their plain letters
-- (character codes as in a Windows-1252 style code page) and drops a hard-coded list of special characters.
-- ASCII() only returns 0-255, so the values above 255 in the lists below never match.
CREATE FUNCTION remove_spc_char(@str VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @resp VARCHAR(MAX) = '';
    DECLARE @str_val VARCHAR(MAX) = UPPER(@str);
    DECLARE @i INTEGER = 1;
    DECLARE @v_asc INTEGER;
    WHILE @i <= (LEN(@str_val))
    BEGIN
        SET @v_asc = (ASCII(SUBSTRING(@str_val, @i, 1)))
        BEGIN
            -- Variants of A
            IF @v_asc in (192,193,194,195,196,65)
            begin
                SET @v_asc = 65;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Variants of E
            IF @v_asc in (200,201,202,203,233,69)
            begin
                SET @v_asc = 69;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Variants of I
            IF @v_asc in (204,205,206,207,296,73)
            begin
                SET @v_asc = 73;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Variants of O
            IF @v_asc in (210,211,212,213,214,79)
            begin
                SET @v_asc = 79;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Variants of U
            IF @v_asc in (217,218,219,220,85)
            begin
                SET @v_asc = 85;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Variants of C
            IF @v_asc in (199,231,67)
            begin
                SET @v_asc = 67;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Variants of N
            IF @v_asc in (209,78)
            begin
                SET @v_asc = 78;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
            -- Special characters to drop
            IF @v_asc in (924,181,358,216,222,330,272,208,198,42,37,38,34,36,35,
                64,33,39,41,40,43,61,95,45,62,60,63,47,176,183,124,166,174,359,248,254,
                180,170,186,126,312,331,273,172,178,179,163,162,123,91,93,125,92,167,240,
                223,230,171,187,169,185,168)
            begin
                SET @resp = concat(@resp, '');
            end -- no semicolon here: "END; ELSE" is a syntax error in T-SQL
            ELSE
            begin
                -- Everything else is kept, unless it was already appended by one of the letter branches above
                if @v_asc not in (65,67,69,73,78,79,85)
                begin
                    SET @resp = concat(@resp, CHAR(@v_asc));
                end;
            end;
        END;
        SET @i = @i + 1
    END;
    RETURN @resp;
END;
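If, like me, you can't just add functions to your production database but still want to perform this kind of filtering, here is a pure SQL solution that uses a PIVOT table to put the filtered pieces back together.
Note: I hard-coded the table up to 40 characters. If you have longer strings to filter, you will have to add more columns.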
SET CONCAT_NULL_YIELDS_NULL OFF;
with
ToBeScrubbed
as (
    select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
),
Scrubbed as (
    -- One row per character that passes the filter, keyed by its position in the string
    select
        P.Number as ValueOrder,
        isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
        t.id
    from
        ToBeScrubbed t
        left join master..spt_values P
            on P.number between 1 and len(t.ColumnToScrub)
            and type ='P'
    where
        PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
)
-- Pivot the positions into columns and concatenate them back into one string
SELECT
    id,
    [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
    + [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
    + [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
    + [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
FROM (
    select
        *
    from
        Scrubbed
)
src
PIVOT (
    MAX(ScrubbedValue) FOR ValueOrder IN (
        [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
        [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
        [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
        [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
    )
) pvt
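The 40-column limit comes from reassembling the characters with PIVOT. As a hedged alternative sketch (a different technique from the answer above, and one that assumes SQL Server 2017 or later), STRING_AGG with WITHIN GROUP can concatenate the surviving characters in order without hard-coding a column per position. The sample row is the one used above, and the string length is then bounded by the numbers available in master..spt_values rather than by a column list.

```sql
-- Assumes SQL Server 2017+ for STRING_AGG; still no function has to be created.
WITH ToBeScrubbed AS (
    SELECT 1 AS id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' AS ColumnToScrub
)
SELECT
    t.id,
    -- Concatenate the kept characters in their original order
    STRING_AGG(SUBSTRING(t.ColumnToScrub, P.number, 1), '')
        WITHIN GROUP (ORDER BY P.number) AS ScrubbedData
FROM ToBeScrubbed AS t
JOIN master..spt_values AS P
    ON P.type = 'P'
   AND P.number BETWEEN 1 AND LEN(t.ColumnToScrub)
-- Keep letters only; whether upper case matches [a-z] depends on the collation in effect
WHERE SUBSTRING(t.ColumnToScrub, P.number, 1) LIKE '[a-z]'
GROUP BY t.id;
```

Unlike the PIVOT version, a row whose string contains no matching characters at all is dropped by the inner join, so a LEFT JOIN with the filter moved into the ON clause may be preferable if such rows must be kept.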