如何从字符串中删除所有非字母的字符?
非字母数字呢?
这必须是一个自定义函数还是也有更通用的解决方案?
如何从字符串中删除所有非字母的字符?
非字母数字呢?
这必须是一个自定义函数还是也有更通用的解决方案?
当前回答
我知道SQL不擅长字符串操作,但我没想到它会这么难。下面是一个简单的函数,用于从字符串中剥离所有数字。当然还有更好的办法,但这只是个开始。
CREATE FUNCTION dbo.AlphaOnly (
@String varchar(100)
)
RETURNS varchar(100)
AS BEGIN
RETURN (
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
@String,
'9', ''),
'8', ''),
'7', ''),
'6', ''),
'5', ''),
'4', ''),
'3', ''),
'2', ''),
'1', ''),
'0', '')
)
END
GO
-- ==================
DECLARE @t TABLE (
ColID int,
ColString varchar(50)
)
INSERT INTO @t VALUES (1, 'abc1234567890')
SELECT ColID, ColString, dbo.AlphaOnly(ColString)
FROM @t
输出
ColID ColString
----- ------------- ---
1 abc1234567890 abc
第2轮-数据驱动黑名单
-- ============================================
-- Create a table of blacklist characters
-- ============================================
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
DROP TABLE dbo.CharacterBlacklist
GO
CREATE TABLE dbo.CharacterBlacklist (
CharID int IDENTITY,
DisallowedCharacter nchar(1) NOT NULL
)
GO
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
GO
-- ====================================
IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
DROP FUNCTION dbo.StripBlacklistCharacters
GO
CREATE FUNCTION dbo.StripBlacklistCharacters (
@String nvarchar(100)
)
RETURNS varchar(100)
AS BEGIN
DECLARE @blacklistCt int
DECLARE @ct int
DECLARE @c nchar(1)
SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist
SET @ct = 0
WHILE @ct < @blacklistCt BEGIN
SET @ct = @ct + 1
SELECT @String = REPLACE(@String, DisallowedCharacter, N'')
FROM dbo.CharacterBlacklist
WHERE CharID = @ct
END
RETURN (@String)
END
GO
-- ====================================
DECLARE @s nvarchar(24)
SET @s = N'abc1234def5678ghi90jkl'
SELECT
@s AS OriginalString,
dbo.StripBlacklistCharacters(@s) AS ResultString
输出
OriginalString ResultString
------------------------ ------------
abc1234def5678ghi90jkl abcdefghijkl
我对读者的挑战是:你能让这个过程更有效率吗?那么使用递归呢?
其他回答
DECLARE @vchVAlue NVARCHAR(255) = 'SWP, Lettering Position 1: 4 Ω, 2: 8 Ω, 3: 16 Ω, 4: , 5: , 6: , Voltage Selector, Solder, 6, Step switch, : w/o fuseholder '
WHILE PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))) > 0
BEGIN
SELECT @vchVAlue = STUFF(@vchVAlue,PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))),1,' ')
END
SELECT @vchVAlue
看过所有给出的解决方案后,我认为必须有一个纯SQL方法,它不需要函数或CTE / XML查询,并且不涉及难以维护的嵌套REPLACE语句。以下是我的解决方案:
SELECT
x
,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
+ CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size
AS stripped_column
FROM (SELECT
column_to_strip AS x
,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a
FROM my_table) a
这样做的好处是,有效字符包含在子查询中的一个字符串中,便于为不同的字符集重新配置。
缺点是您必须为每个字符添加一行SQL,直到您的列的大小。为了让这个任务更容易,我只是使用了下面的Powershell脚本,这个例子如果是VARCHAR(64):
1..64 | % {
" + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe
我把它放在调用PatIndex的两个地方。
PatIndex('%[^A-Za-z0-9]%', @Temp)
为上面的自定义函数RemoveNonAlphaCharacters并重命名为RemoveNonAlphaNumericCharacters
SQL Server >= 2017…
declare @text varchar(max)
-- create some sample text
select
@text=
'
Lorem @ipsum *&dolor-= sit?! amet, {consectetur } adipiscing\ elit. Vivamus commodo justo metus, sed facilisis ante
congue eget. Proin ac bibendum sem/.
'
-- the characters to be removed
declare @unwanted varchar(max)='''.,!?/<>"[]{}|`~@#$%^&*()-+=/\:;'+char(13)+char(10)
-- interim replaced with
declare @replace_with char(1)=' '
-- call the translate function that will change unwanted characters to spaces
-- in this sample
declare @translated varchar(max)
select @translated=TRANSLATE(@text,@unwanted,REPLICATE(@replace_with,len(@unwanted)))
-- In this case, I want to preserve one space
select string_agg(trim(value),' ')
from STRING_SPLIT(@translated,' ')
where trim(value)<>''
-- Result
'Lorem ipsum dolor sit amet consectetur adipiscing elit Vivamus commodo justo metus sed facilisis ante congue eget Proin ac bibendum sem'
虽然这篇文章有点老了,但我想说以下几点。 我有上述解决方案的问题是,它没有过滤出字符,如ç, ë, ï等。我调整了一个函数如下(我只使用80 varchar字符串来节省内存):
create FUNCTION dbo.udf_Cleanchars (@InputString varchar(80))
RETURNS varchar(80)
AS
BEGIN
declare @return varchar(80) , @length int , @counter int , @cur_char char(1)
SET @return = ''
SET @length = 0
SET @counter = 1
SET @length = LEN(@InputString)
IF @length > 0
BEGIN WHILE @counter <= @length
BEGIN SET @cur_char = SUBSTRING(@InputString, @counter, 1) IF ((ascii(@cur_char) in (32,44,46)) or (ascii(@cur_char) between 48 and 57) or (ascii(@cur_char) between 65 and 90) or (ascii(@cur_char) between 97 and 122))
BEGIN SET @return = @return + @cur_char END
SET @counter = @counter + 1
END END
RETURN @return END