我想从字符串中删除所有特殊字符。允许输入A-Z(大写或小写)、数字(0-9)、下划线(_)或点符号(.)。

我有以下,它是有效的,但我怀疑(我知道!)它不是很有效:

    public static string RemoveSpecialCharacters(string str)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.Length; i++)
        {
            if ((str[i] >= '0' && str[i] <= '9')
                || (str[i] >= 'A' && str[i] <= 'z'
                    || (str[i] == '.' || str[i] == '_')))
                {
                    sb.Append(str[i]);
                }
        }

        return sb.ToString();
    }

最有效的方法是什么?正则表达式是什么样子的,它与普通字符串操作相比如何?

要清洗的字符串相当短,长度通常在10到30个字符之间。


当前回答

Use:

s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());

bool my_predicate(char c)
{
 return !(isalpha(c) || c=='_' || c==' '); // depending on you definition of special characters
}

你会得到一个干净的字符串s。

Erase()将去掉所有特殊字符,并且可以使用my_predicate()函数进行高度自定义。

其他回答

如果担心速度问题,可以使用指针编辑现有字符串。您可以固定字符串并获取指向它的指针,然后在每个字符上运行for循环,用替换字符覆盖每个无效字符。这将是非常有效的,并且不需要分配任何新的字符串内存。为了使用指针,你还需要用不安全选项来编译你的模块,并在方法头中添加“不安全”修饰符。

static void Main(string[] args)
{
    string str = "string!$%with^&*invalid!!characters";
    Console.WriteLine( str ); //print original string
    FixMyString( str, ' ' );
    Console.WriteLine( str ); //print string again to verify that it has been modified
    Console.ReadLine(); //pause to leave command prompt open
}


public static unsafe void FixMyString( string str, char replacement_char )
{
    fixed (char* p_str = str)
    {
        char* c = p_str; //temp pointer, since p_str is read-only
        for (int i = 0; i < str.Length; i++, c++) //loop through each character in string, advancing the character pointer as well
            if (!IsValidChar(*c)) //check whether the current character is invalid
                (*c) = replacement_char; //overwrite character in existing string with replacement character
    }
}

public static bool IsValidChar( char c )
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c == '.' || c == '_');
    //return char.IsLetterOrDigit( c ) || c == '.' || c == '_'; //this may work as well
}

简单的LINQ方法

string text = "123a22 ";
var newText = String.Join(string.Empty, text.Where(x => x != 'a'));

我觉得不错。我要做的唯一改进是用字符串的长度初始化StringBuilder。

StringBuilder sb = new StringBuilder(str.Length);
public static string RemoveSpecialCharacters(string str)
{
    char[] buffer = new char[str.Length];
    int idx = 0;

    foreach (char c in str)
    {
        if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z') || (c == '.') || (c == '_'))
        {
            buffer[idx] = c;
            idx++;
        }
    }

    return new string(buffer, 0, idx);
}

这里有很多建议的解决方案,有些比其他的更有效,但可能不是很好读。这里有一个可能不是最有效的,但在大多数情况下肯定是可用的,并且非常简洁易读,利用Linq:

string stringToclean = "This is a test.  Do not try this at home; you might get hurt. Don't believe it?";

var validPunctuation = new HashSet<char>(". -");

var cleanedVersion = new String(stringToclean.Where(x => (x >= 'A' && x <= 'Z') || (x >= 'a' && x <= 'z') || validPunctuation.Contains(x)).ToArray());

var cleanedLowercaseVersion = new String(stringToclean.ToLower().Where(x => (x >= 'a' && x <= 'z') || validPunctuation.Contains(x)).ToArray());