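I want to remove all special characters from a string. The allowed characters are A-Z (uppercase or lowercase), digits (0-9), underscore (_), and the dot (.).
I have the following. It works, but I suspect (I know!) that it is not very efficient: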
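// Requires: using System.Text;
public static string RemoveSpecialCharacters(string str)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.Length; i++)
    {
        // Letters are tested as two ranges: a single 'A'..'z' range would
        // also admit [ \ ] ^ and `, which sit between 'Z' and 'a' in ASCII.
        if ((str[i] >= '0' && str[i] <= '9')
            || (str[i] >= 'A' && str[i] <= 'Z')
            || (str[i] >= 'a' && str[i] <= 'z')
            || str[i] == '.' || str[i] == '_')
        {
            sb.Append(str[i]);
        }
    }
    return sb.ToString();
}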
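What is the most efficient way to do this? What would a regular expression look like, and how does it compare with normal string manipulation?
The strings to be cleaned are rather short, usually between 10 and 30 characters long.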
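For the regular-expression half of the question, a minimal sketch might look like the following. The class name `RegexSanitizer` and the field name `Disallowed` are made up for illustration; the pattern simply negates the allowed set described above. No timing claim is made here: whether this beats a hand-written loop on 10 to 30 character inputs would have to be measured.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class RegexSanitizer
{
    // Matches one or more characters that are NOT letters, digits, '_' or '.'.
    // The Regex is built once and reused across calls.
    private static readonly Regex Disallowed =
        new Regex("[^a-zA-Z0-9_.]+", RegexOptions.Compiled);

    public static string RemoveSpecialCharacters(string str)
    {
        // Replace every disallowed run with the empty string.
        return Disallowed.Replace(str, string.Empty);
    }
}
```

Calling `RegexSanitizer.RemoveSpecialCharacters("He!!o_W0rld.txt?")` should leave only the allowed characters. Keeping the `Regex` in a static field avoids re-parsing the pattern on every call.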
I had to do something similar for work, but in my case I had to filter out everything that is not a letter, digit, or whitespace (you could easily modify it to suit your needs).
The filtering is done client-side in JavaScript, but for security reasons I also filter server-side. Since I expect most of the strings to be clean, I want to avoid copying the string unless I really need to. This led me to the implementation below, which should perform well for both clean and dirty strings.
public static string EnsureOnlyLetterDigitOrWhiteSpace(string input)
{
    // Stays null until the first invalid character is found,
    // so a clean string is returned without any copying.
    StringBuilder cleanedInput = null;
    for (var i = 0; i < input.Length; ++i)
    {
        var currentChar = input[i];
        var charIsValid = char.IsLetterOrDigit(currentChar) || char.IsWhiteSpace(currentChar);
        if (charIsValid)
        {
            if (cleanedInput != null)
                cleanedInput.Append(currentChar);
        }
        else
        {
            if (cleanedInput != null) continue;
            // First invalid character: start the builder and copy the valid prefix.
            cleanedInput = new StringBuilder(input.Length);
            if (i > 0)
                cleanedInput.Append(input, 0, i);
        }
    }
    return cleanedInput == null ? input : cleanedInput.ToString();
}
If you are working with a dynamic list of characters, LINQ may offer a much faster and more elegant solution:
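// Requires: using System.Linq;
public static string RemoveSpecialCharacters(string value, char[] specialCharacters)
{
    // Caution: Except is a set difference, so besides dropping the special
    // characters it also collapses repeated characters ("hello" becomes "helo").
    return new String(value.Except(specialCharacters).ToArray());
}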
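Because `Enumerable.Except` performs a set difference, it returns each remaining character only once. If repeated characters must survive, a `Where` filter is one possible alternative. The sketch below is an illustration rather than the code measured in this answer; the class name `LinqSanitizer` and the local `blocked` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class LinqSanitizer
{
    public static string RemoveSpecialCharacters(string value, char[] specialCharacters)
    {
        // A HashSet gives constant-time membership checks for the blocked characters.
        var blocked = new HashSet<char>(specialCharacters);

        // Where keeps every character that is not blocked, in order,
        // including repeated characters.
        return new string(value.Where(c => !blocked.Contains(c)).ToArray());
    }
}
```

With this version, `LinqSanitizer.RemoveSpecialCharacters("hello, world!", new[] { ',', '!' })` keeps both `l` characters and both `o` characters.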
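I compared this approach against two of the earlier "fast" approaches (release build):
Char array solution by LukeH - 427 ms
StringBuilder solution - 429 ms
LINQ (this answer) - 98 ms
Note that the algorithm was slightly modified: the characters are passed in as an array rather than hard-coded, which could have a slight impact (i.e. the other solutions need an inner for loop to check the character array).
If I switch to a hard-coded solution using a LINQ where clause, the results are:
Char array solution - 7 ms
StringBuilder solution - 22 ms
LINQ - 60 ms
If you plan to write a more generic solution rather than hard-coding the character list, it may be worth looking at LINQ or a modified approach. LINQ definitely gives you concise, highly readable code, even more so than Regex.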
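The "char array solution" in the timings is not shown in this thread. As a rough idea of what a hard-coded char array approach could look like, here is a sketch; it is an assumption for illustration, not necessarily the code that was benchmarked, and the class name `CharArraySanitizer` is hypothetical.

```csharp
// Hypothetical helper class, named for illustration only.
public static class CharArraySanitizer
{
    public static string RemoveSpecialCharacters(string str)
    {
        // The result can never be longer than the input, so one buffer suffices.
        char[] buffer = new char[str.Length];
        int count = 0;

        foreach (char c in str)
        {
            bool allowed = (c >= '0' && c <= '9')
                || (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || c == '.' || c == '_';

            if (allowed)
            {
                buffer[count++] = c; // keep the character and advance the write index
            }
        }

        // Build the string from only the filled part of the buffer.
        return new string(buffer, 0, count);
    }
}
```

The idea is to write the kept characters into a preallocated array and create the string once at the end, instead of appending to a `StringBuilder`.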
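If you are worried about speed, you can use pointers to edit the existing string. You pin the string and get a pointer to it, then run a for loop over each character, overwriting every invalid character with a replacement character. This is very efficient and does not allocate any new string memory. Keep in mind that .NET strings are meant to be immutable, so the change is visible to every piece of code that holds a reference to the same string, including interned literals. To use pointers you also need to compile your module with the unsafe option and add the "unsafe" modifier to the method header.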
static void Main(string[] args)
{
    string str = "string!$%with^&*invalid!!characters";
    Console.WriteLine( str ); //print original string
    FixMyString( str, ' ' );
    Console.WriteLine( str ); //print string again to verify that it has been modified
    Console.ReadLine(); //pause to leave command prompt open
}

public static unsafe void FixMyString( string str, char replacement_char )
{
    fixed (char* p_str = str)
    {
        char* c = p_str; //temp pointer, since p_str is read-only
        for (int i = 0; i < str.Length; i++, c++) //loop through each character in string, advancing the character pointer as well
            if (!IsValidChar(*c)) //check whether the current character is invalid
                (*c) = replacement_char; //overwrite character in existing string with replacement character
    }
}

public static bool IsValidChar( char c )
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c == '.' || c == '_');
    //return char.IsLetterOrDigit( c ) || c == '.' || c == '_'; //this may work as well
}
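If mutating the original string is not acceptable, the same replace-in-place idea can be sketched without `unsafe` by working on a copy. This version does allocate (one char array and one new string), so it gives up the zero-allocation property described above. The names `SafeReplacer` and `ReplaceInvalidChars` are hypothetical; `IsValidChar` repeats the check from the code above.

```csharp
// Hypothetical helper class, named for illustration only.
public static class SafeReplacer
{
    public static string ReplaceInvalidChars(string str, char replacement)
    {
        // Work on a copy so the caller's string is left untouched.
        char[] chars = str.ToCharArray();

        for (int i = 0; i < chars.Length; i++)
        {
            if (!IsValidChar(chars[i]))
            {
                chars[i] = replacement; // overwrite the invalid character in the copy
            }
        }

        return new string(chars);
    }

    public static bool IsValidChar(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z') || c == '.' || c == '_';
    }
}
```

As with the pointer version, invalid characters are replaced rather than removed, so the result has the same length as the input.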