我正在做一些事情,我意识到我想要在一个字符串中找到多少个/s,然后我突然想到,有几种方法可以做到这一点,但不能决定哪种是最好的(或最简单的)。

目前我想说的是:

string source = "/once/upon/a/time/";
int count = source.Length - source.Replace("/", "").Length;

但我一点都不喜欢,有人愿意吗?

我并不想为此挖掘出正则表达式,对吧?

我知道我的字符串将包含我要搜索的项,所以你可以假设…

当然对于长度为> 1的字符串,

string haystack = "/once/upon/a/time";
string needle = "/";
int needleCount = ( haystack.Length - haystack.Replace(needle,"").Length ) / needle.Length;

string source = "/once/upon/a/time/";
int count = 0;
foreach (char c in source) 
  if (c == '/') count++;

必须比source.Replace()本身更快。


LINQ适用于所有的集合,因为字符串只是字符的集合,那么下面这个漂亮的小语句怎么样:

var count = source.Count(c => c == '/');

确保你使用了system。linq;在代码文件的顶部,因为. count是来自该名称空间的扩展方法。


如果你使用的是。net 3.5,你可以用LINQ在一行代码中完成:

int count = source.Count(f => f == '/');

如果你不想使用LINQ,你可以用:

int count = source.Split('/').Length - 1;

您可能会惊讶地发现,您原来的技术似乎比这两种方法都快30% !我刚刚用“/once/upon/a/time/”做了一个快速的基准测试,结果如下:

你的原稿= 12s 源。计数= 19秒 源。分裂= 17秒 Foreach(来自bobwienholt的答案)= 10s

(迭代次数为50,000,000次,因此在现实世界中您不太可能注意到太多差异。)


这两个都只适用于单字符搜索词…

countOccurences("the", "the answer is the answer");

int countOccurences(string needle, string haystack)
{
    return (haystack.Length - haystack.Replace(needle,"").Length) / needle.Length;
}

也许更长的针头会更好…

但肯定有更优雅的方式。:)


如果你想搜索整个字符串,而不仅仅是字符:

src.Select((c, i) => src.Substring(i))
    .Count(sub => sub.StartsWith(target))

读为“对于字符串中的每个字符,将该字符开始的字符串的其余部分作为子字符串;如果它以目标字符串开头,就数一数。”


编辑:

source.Split('/').Length-1

int count = new Regex(Regex.Escape(needle)).Matches(haystack).Count;

string s = "65 fght 6565 4665 hjk";
int count = 0;
foreach (Match m in Regex.Matches(s, "65"))
  count++;

private int CountWords(string text, string word) {
    int count = (text.Length - text.Replace(word, "").Length) / word.Length;
    return count;
}

因为最初的解决方案,是最快的字符,我想它也将是字符串。这是我的贡献。

上下文:我在日志文件中寻找像“失败”和“成功”这样的词。

克 我


string source = "/once/upon/a/time/";
int count = 0;
int n = 0;

while ((n = source.IndexOf('/', n)) != -1)
{
   n++;
   count++;
}

在我的电脑上,这比5000万次迭代的每个字符解决方案快2秒左右。

2013年修订:

将字符串更改为char[]并遍历该字符串。将5000万次迭代的总时间进一步缩短一到两秒!

char[] testchars = source.ToCharArray();
foreach (char c in testchars)
{
     if (c == '/')
         count++;
}

这个更快:

char[] testchars = source.ToCharArray();
int length = testchars.Length;
for (int n = 0; n < length; n++)
{
    if (testchars[n] == '/')
        count++;
}

为了更好地衡量,从数组的末尾迭代到0似乎是最快的,大约5%。

int length = testchars.Length;
for (int n = length-1; n >= 0; n--)
{
    if (testchars[n] == '/')
        count++;
}

我想知道为什么这可能是谷歌周围(我记得一些关于反向迭代更快),并遇到了这个SO问题,烦人地使用字符串char[]技术。不过,我认为在这种情况下,反转技巧是新的。

在c#中迭代字符串中单个字符的最快方法是什么?


str="aaabbbbjjja";
int count = 0;
int size = str.Length;

string[] strarray = new string[size];
for (int i = 0; i < str.Length; i++)
{
    strarray[i] = str.Substring(i, 1);
}
Array.Sort(strarray);
str = "";
for (int i = 0; i < strarray.Length - 1; i++)
{

    if (strarray[i] == strarray[i + 1])
    {

        count++;
    }
    else
    {
        count++;
        str = str + strarray[i] + count;
        count = 0;
    }

}
count++;
str = str + strarray[strarray.Length - 1] + count;

这是用来计算字符出现次数的。本例的输出为"a4b4j3"


字符串出现的泛型函数:

public int getNumberOfOccurencies(String inputString, String checkString)
{
    if (checkString.Length > inputString.Length || checkString.Equals("")) { return 0; }
    int lengthDifference = inputString.Length - checkString.Length;
    int occurencies = 0;
    for (int i = 0; i < lengthDifference; i++) {
        if (inputString.Substring(i, checkString.Length).Equals(checkString)) { occurencies++; i += checkString.Length - 1; } }
    return occurencies;
}

string Name = "Very good nice one is very good but is very good nice one this is called the term";
bool valid=true;
int count = 0;
int k=0;
int m = 0;
while (valid)
{
    k = Name.Substring(m,Name.Length-m).IndexOf("good");
    if (k != -1)
    {
        count++;
        m = m + k + 4;
    }
    else
        valid = false;
}
Console.WriteLine(count + " Times accures");

我做了一些研究,发现理查德·沃森的解决方案在大多数情况下是最快的。这是文章中每个解决方案的结果表(除了那些使用Regex的,因为它在解析“test{test”这样的字符串时抛出异常)

    Name      | Short/char |  Long/char | Short/short| Long/short |  Long/long |
    Inspite   |         134|        1853|          95|        1146|         671|
    LukeH_1   |         346|        4490|         N/A|         N/A|         N/A|
    LukeH_2   |         152|        1569|         197|        2425|        2171|
Bobwienholt   |         230|        3269|         N/A|         N/A|         N/A|
Richard Watson|          33|         298|         146|         737|         543|
StefanosKargas|         N/A|         N/A|         681|       11884|       12486|

可以看到,在短字符串(10-50个字符)中查找短子字符串(1-5个字符)的出现次数时,首选原算法。

同样,对于多字符子字符串,您应该使用以下代码(基于Richard Watson的解决方案)

int count = 0, n = 0;

if(substring != "")
{
    while ((n = source.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
    {
        n += substring.Length;
        ++count;
    }
}

string source = "/once/upon/a/time/";
int count = 0, n = 0;
while ((n = source.IndexOf('/', n) + 1) != 0) count++;

这是Richard Watson的答案的一个变体,char在字符串中出现的次数越多,效率就会提高一点,代码也会更少!

虽然我必须说,在没有广泛测试每个场景的情况下,我确实看到了使用以下方法的显著速度提升:

int count = 0;
for (int n = 0; n < source.Length; n++) if (source[n] == '/') count++;

public static int GetNumSubstringOccurrences(string text, string search)
{
    int num = 0;
    int pos = 0;

    if (!string.IsNullOrEmpty(text) && !string.IsNullOrEmpty(search))
    {
        while ((pos = text.IndexOf(search, pos)) > -1)
        {
            num ++;
            pos += search.Length;
        }
    }
    return num;
}

            var conditionalStatement = conditionSetting.Value;

            //order of replace matters, remove == before =, incase of ===
            conditionalStatement = conditionalStatement.Replace("==", "~").Replace("!=", "~").Replace('=', '~').Replace('!', '~').Replace('>', '~').Replace('<', '~').Replace(">=", "~").Replace("<=", "~");

            var listOfValidConditions = new List<string>() { "!=", "==", ">", "<", ">=", "<=" };

            if (conditionalStatement.Count(x => x == '~') != 1)
            {
                result.InvalidFieldList.Add(new KeyFieldData(batch.DECurrentField, "The IsDoubleKeyCondition does not contain a supported conditional statement. Contact System Administrator."));
                result.Status = ValidatorStatus.Fail;
                return result;
            }

需要做一些类似于从字符串测试条件语句的事情。

用单个字符替换我正在寻找的内容,并计算单个字符的实例数。

显然,在发生这种情况之前,您需要检查您正在使用的单个字符是否存在于字符串中,以避免错误计数。


我认为最简单的方法是使用正则表达式。通过这种方式,你可以获得与使用myVar.Split('x')相同的分割计数,但在多个字符设置中。

string myVar = "do this to count the number of words in my wording so that I can word it up!";
int count = Regex.Split(myVar, "word").Length;

Regex.Matches(input,  Regex.Escape("stringToMatch")).Count

对于任何想要使用String扩展方法的人,

以下是我使用的基于张贴的最好的答案:

public static class StringExtension
{    
    /// <summary> Returns the number of occurences of a string within a string, optional comparison allows case and culture control. </summary>
    public static int Occurrences(this System.String input, string value, StringComparison stringComparisonType = StringComparison.Ordinal)
    {
        if (String.IsNullOrEmpty(value)) return 0;

        int count    = 0;
        int position = 0;

        while ((position = input.IndexOf(value, position, stringComparisonType)) != -1)
        {
            position += value.Length;
            count    += 1;
        }

        return count;
    }

    /// <summary> Returns the number of occurences of a single character within a string. </summary>
    public static int Occurrences(this System.String input, char value)
    {
        int count = 0;
        foreach (char c in input) if (c == value) count += 1;
        return count;
    }
}

string s = "HOWLYH THIS ACTUALLY WORKSH WOWH";
int count = 0;
for (int i = 0; i < s.Length; i++)
   if (s[i] == 'H') count++;

它只是检查字符串中的每个字符,如果该字符是您正在搜索的字符,则添加一个来计数。


如果你看看这个网页,有15种不同的方法进行了基准测试,包括使用并行循环。

最快的方法似乎是使用单线程for循环(如果您的。net版本< 4.0)或并行。for循环(如果使用。net > 4.0进行数千次检查)。

假设“ss”是你的搜索字符串,“ch”是你的字符数组(如果你有一个以上的字符你正在寻找),下面是代码的基本要点,有最快的运行时间单线程:

for (int x = 0; x < ss.Length; x++)
{
    for (int y = 0; y < ch.Length; y++)
    {
        for (int a = 0; a < ss[x].Length; a++ )
        {
        if (ss[x][a] == ch[y])
            //it's found. DO what you need to here.
        }
    }
}

还提供了基准测试源代码,以便您可以运行自己的测试。


字符串中的字符串:

在“..”中找到“etc”。JD JD JD JD等等。JDJDJDJDJDJDJDJD等等。”

var strOrigin = " .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc.";
var searchStr = "etc";
int count = (strOrigin.Length - strOrigin.Replace(searchStr, "").Length)/searchStr.Length.

在丢弃这个不健全/笨拙之前检查性能…


我想我会把我的扩展方法扔到戒指(更多信息见评论)。我没有做过任何正式的基准测试,但我认为在大多数情况下必须非常快。

EDIT: OK - so this SO question got me to wondering how the performance of our current implementation would stack up against some of the solutions presented here. I decided to do a little bench marking and found that our solution was very much in line with the performance of the solution provided by Richard Watson up until you are doing aggressive searching with large strings (100 Kb +), large substrings (32 Kb +) and many embedded repetitions (10K +). At that point our solution was around 2X to 4X slower. Given this and the fact that we really like the solution presented by Richard Watson, we have refactored our solution accordingly. I just wanted to make this available for anyone that might benefit from it.

我们最初的解决方案:

    /// <summary>
    /// Counts the number of occurrences of the specified substring within
    /// the current string.
    /// </summary>
    /// <param name="s">The current string.</param>
    /// <param name="substring">The substring we are searching for.</param>
    /// <param name="aggressiveSearch">Indicates whether or not the algorithm 
    /// should be aggressive in its search behavior (see Remarks). Default 
    /// behavior is non-aggressive.</param>
    /// <remarks>This algorithm has two search modes - aggressive and 
    /// non-aggressive. When in aggressive search mode (aggressiveSearch = 
    /// true), the algorithm will try to match at every possible starting 
    /// character index within the string. When false, all subsequent 
    /// character indexes within a substring match will not be evaluated. 
    /// For example, if the string was 'abbbc' and we were searching for 
    /// the substring 'bb', then aggressive search would find 2 matches 
    /// with starting indexes of 1 and 2. Non aggressive search would find 
    /// just 1 match with starting index at 1. After the match was made, 
    /// the non aggressive search would attempt to make it's next match 
    /// starting at index 3 instead of 2.</remarks>
    /// <returns>The count of occurrences of the substring within the string.</returns>
    public static int CountOccurrences(this string s, string substring, 
        bool aggressiveSearch = false)
    {
        // if s or substring is null or empty, substring cannot be found in s
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
            return 0;

        // if the length of substring is greater than the length of s,
        // substring cannot be found in s
        if (substring.Length > s.Length)
            return 0;

        var sChars = s.ToCharArray();
        var substringChars = substring.ToCharArray();
        var count = 0;
        var sCharsIndex = 0;

        // substring cannot start in s beyond following index
        var lastStartIndex = sChars.Length - substringChars.Length;

        while (sCharsIndex <= lastStartIndex)
        {
            if (sChars[sCharsIndex] == substringChars[0])
            {
                // potential match checking
                var match = true;
                var offset = 1;
                while (offset < substringChars.Length)
                {
                    if (sChars[sCharsIndex + offset] != substringChars[offset])
                    {
                        match = false;
                        break;
                    }
                    offset++;
                }
                if (match)
                {
                    count++;
                    // if aggressive, just advance to next char in s, otherwise, 
                    // skip past the match just found in s
                    sCharsIndex += aggressiveSearch ? 1 : substringChars.Length;
                }
                else
                {
                    // no match found, just move to next char in s
                    sCharsIndex++;
                }
            }
            else
            {
                // no match at current index, move along
                sCharsIndex++;
            }
        }

        return count;
    }

这是我们修改后的解决方案:

    /// <summary>
    /// Counts the number of occurrences of the specified substring within
    /// the current string.
    /// </summary>
    /// <param name="s">The current string.</param>
    /// <param name="substring">The substring we are searching for.</param>
    /// <param name="aggressiveSearch">Indicates whether or not the algorithm 
    /// should be aggressive in its search behavior (see Remarks). Default 
    /// behavior is non-aggressive.</param>
    /// <remarks>This algorithm has two search modes - aggressive and 
    /// non-aggressive. When in aggressive search mode (aggressiveSearch = 
    /// true), the algorithm will try to match at every possible starting 
    /// character index within the string. When false, all subsequent 
    /// character indexes within a substring match will not be evaluated. 
    /// For example, if the string was 'abbbc' and we were searching for 
    /// the substring 'bb', then aggressive search would find 2 matches 
    /// with starting indexes of 1 and 2. Non aggressive search would find 
    /// just 1 match with starting index at 1. After the match was made, 
    /// the non aggressive search would attempt to make it's next match 
    /// starting at index 3 instead of 2.</remarks>
    /// <returns>The count of occurrences of the substring within the string.</returns>
    public static int CountOccurrences(this string s, string substring, 
        bool aggressiveSearch = false)
    {
        // if s or substring is null or empty, substring cannot be found in s
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
            return 0;

        // if the length of substring is greater than the length of s,
        // substring cannot be found in s
        if (substring.Length > s.Length)
            return 0;

        int count = 0, n = 0;
        while ((n = s.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
        {
            if (aggressiveSearch)
                n++;
            else
                n += substring.Length;
            count++;
        }

        return count;
    }

string search = "/string";
var occurrences = (regex.Match(search, @"\/")).Count;

每次程序找到“/s”(区分大小写)时,都会进行计数 出现的次数将存储在变量“occurrences”中。


我最初的想法是这样的:

public static int CountOccurrences(string original, string substring)
{
    if (string.IsNullOrEmpty(substring))
        return 0;
    if (substring.Length == 1)
        return CountOccurrences(original, substring[0]);
    if (string.IsNullOrEmpty(original) ||
        substring.Length > original.Length)
        return 0;
    int substringCount = 0;
    for (int charIndex = 0; charIndex < original.Length; charIndex++)
    {
        for (int subCharIndex = 0, secondaryCharIndex = charIndex; subCharIndex < substring.Length && secondaryCharIndex < original.Length; subCharIndex++, secondaryCharIndex++)
        {
            if (substring[subCharIndex] != original[secondaryCharIndex])
                goto continueOuter;
        }
        if (charIndex + substring.Length > original.Length)
            break;
        charIndex += substring.Length - 1;
        substringCount++;
    continueOuter:
        ;
    }
    return substringCount;
}

public static int CountOccurrences(string original, char @char)
{
    if (string.IsNullOrEmpty(original))
        return 0;
    int substringCount = 0;
    for (int charIndex = 0; charIndex < original.Length; charIndex++)
        if (@char == original[charIndex])
            substringCount++;
    return substringCount;
}

使用替换和除法的大海捞针方法产生21秒以上,而这需要大约15.2秒。

在添加位后进行编辑,这将添加子字符串。长度- 1到charIndex(就像它应该的那样),它在11.6秒。

编辑2:我使用了一个有26个双字符字符串的字符串,这里是更新到相同示例文本的时间:

大海捞针(OP版本):7.8秒

建议的机制:4.6秒。

编辑3:添加单个字符的大小写,它变成了1.2秒。

编辑4:作为上下文:使用了5000万次迭代。


在c#中,一个很好的字符串子字符串计数器是这样的:

public static int CCount(String haystack, String needle)
{
    return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1;
}

对于字符串分隔符的情况(而不是字符情况,如主题所述): 字符串源= "@@ once@@@upon@@ a@@@time@@@"; Int count = source。Split(new[] {"@@@"}, StringSplitOptions.RemoveEmptyEntries)。长度- 1; 海报的原始源值("/once/upon/a/time/")自然分隔符是一个字符'/',并且响应确实解释了source. split (char[])选项…


我觉得我们缺少某些类型的子字符串计数,比如不安全的逐字节比较。我把原始海报的方法和我能想到的任何方法结合在一起。

这些是我做的字符串扩展。

namespace Example
{
    using System;
    using System.Text;

    public static class StringExtensions
    {
        public static int CountSubstr(this string str, string substr)
        {
            return (str.Length - str.Replace(substr, "").Length) / substr.Length;
        }

        public static int CountSubstr(this string str, char substr)
        {
            return (str.Length - str.Replace(substr.ToString(), "").Length);
        }

        public static int CountSubstr2(this string str, string substr)
        {
            int substrlen = substr.Length;
            int lastIndex = str.IndexOf(substr, 0, StringComparison.Ordinal);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + substrlen, StringComparison.Ordinal);
            }

            return count;
        }

        public static int CountSubstr2(this string str, char substr)
        {
            int lastIndex = str.IndexOf(substr, 0);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + 1);
            }

            return count;
        }

        public static int CountChar(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            for (int i = 0; i < length; ++i)
                if (str[i] == substr)
                    ++count;

            return count;
        }

        public static int CountChar2(this string str, char substr)
        {
            int count = 0;
            foreach (var c in str)
                if (c == substr)
                    ++count;

            return count;
        }

        public static unsafe int CountChar3(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = 0; i < length; ++i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountChar4(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = length - 1; i >= 0; --i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountSubstr3(this string str, string substr)
        {
            int length = str.Length;
            int substrlen = substr.Length;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = 0;

                    for (int i = 0; i < length; ++i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            ++n;
                            if (n == substrlen)
                            {
                                ++count;
                                n = 0;
                            }
                        }
                        else
                            n = 0;
                    }
                }
            }

            return count;
        }

        public static int CountSubstr3(this string str, char substr)
        {
            return CountSubstr3(str, substr.ToString());
        }

        public static unsafe int CountSubstr4(this string str, string substr)
        {
            int length = str.Length;
            int substrLastIndex = substr.Length - 1;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = substrLastIndex;

                    for (int i = length - 1; i >= 0; --i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            if (--n == -1)
                            {
                                ++count;
                                n = substrLastIndex;
                            }
                        }
                        else
                            n = substrLastIndex;
                    }
                }
            }

            return count;
        }

        public static int CountSubstr4(this string str, char substr)
        {
            return CountSubstr4(str, substr.ToString());
        }
    }
}

接下来是测试代码…

static void Main()
{
    const char matchA = '_';
    const string matchB = "and";
    const string matchC = "muchlongerword";
    const string testStrA = "_and_d_e_banna_i_o___pfasd__and_d_e_banna_i_o___pfasd_";
    const string testStrB = "and sdf and ans andeians andano ip and and sdf and ans andeians andano ip and";
    const string testStrC =
        "muchlongerword amuchlongerworsdfmuchlongerwordsdf jmuchlongerworijv muchlongerword sdmuchlongerword dsmuchlongerword";
    const int testSize = 1000000;
    Console.WriteLine(testStrA.CountSubstr('_'));
    Console.WriteLine(testStrA.CountSubstr2('_'));
    Console.WriteLine(testStrA.CountSubstr3('_'));
    Console.WriteLine(testStrA.CountSubstr4('_'));
    Console.WriteLine(testStrA.CountChar('_'));
    Console.WriteLine(testStrA.CountChar2('_'));
    Console.WriteLine(testStrA.CountChar3('_'));
    Console.WriteLine(testStrA.CountChar4('_'));
    Console.WriteLine(testStrB.CountSubstr("and"));
    Console.WriteLine(testStrB.CountSubstr2("and"));
    Console.WriteLine(testStrB.CountSubstr3("and"));
    Console.WriteLine(testStrB.CountSubstr4("and"));
    Console.WriteLine(testStrC.CountSubstr("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr2("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr3("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr4("muchlongerword"));
    var timer = new Stopwatch();
    timer.Start();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr(matchA);
    timer.Stop();
    Console.WriteLine("CS1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr(matchB);
    timer.Stop();
    Console.WriteLine("CS1 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr(matchC);
    timer.Stop();
    Console.WriteLine("CS1 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr2(matchA);
    timer.Stop();
    Console.WriteLine("CS2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr2(matchB);
    timer.Stop();
    Console.WriteLine("CS2 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr2(matchC);
    timer.Stop();
    Console.WriteLine("CS2 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr3(matchA);
    timer.Stop();
    Console.WriteLine("CS3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr3(matchB);
    timer.Stop();
    Console.WriteLine("CS3 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr3(matchC);
    timer.Stop();
    Console.WriteLine("CS3 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr4(matchA);
    timer.Stop();
    Console.WriteLine("CS4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr4(matchB);
    timer.Stop();
    Console.WriteLine("CS4 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr4(matchC);
    timer.Stop();
    Console.WriteLine("CS4 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar(matchA);
    timer.Stop();
    Console.WriteLine("CC1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar2(matchA);
    timer.Stop();
    Console.WriteLine("CC2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar3(matchA);
    timer.Stop();
    Console.WriteLine("CC3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar4(matchA);
    timer.Stop();
    Console.WriteLine("CC4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");
}

结果:CSX与CountSubstrX对应,CCX与CountCharX对应。“chr”搜索字符串中的“_”,“and”搜索字符串中的“and”,“mlw”搜索字符串中的“muchlongerword”

CS1 chr: 824.123ms
CS1 and: 586.1893ms
CS1 mlw: 486.5414ms
CS2 chr: 127.8941ms
CS2 and: 806.3918ms
CS2 mlw: 497.318ms
CS3 chr: 201.8896ms
CS3 and: 124.0675ms
CS3 mlw: 212.8341ms
CS4 chr: 81.5183ms
CS4 and: 92.0615ms
CS4 mlw: 116.2197ms
CC1 chr: 66.4078ms
CC2 chr: 64.0161ms
CC3 chr: 65.9013ms
CC4 chr: 65.8206ms

最后,我有了一个包含360万个字符的文件。“derp adfderdserp dfaerpderp deasderp”重复了10万次。我用上述方法在文件中搜索“derp”100次,得到这些结果。

CS1Derp: 1501.3444ms
CS2Derp: 1585.797ms
CS3Derp: 376.0937ms
CS4Derp: 271.1663ms

所以我的第四种方法肯定是赢家,但是,实际上,如果一个360万个字符的文件100次只需要1586ms作为最坏的情况,那么所有这些都是可以忽略不计的。

顺便说一下,我还用100次CountSubstr和CountChar方法在360万个字符的文件中扫描了'd'字符。结果……

CS1  d : 2606.9513ms
CS2  d : 339.7942ms
CS3  d : 960.281ms
CS4  d : 233.3442ms
CC1  d : 302.4122ms
CC2  d : 280.7719ms
CC3  d : 299.1125ms
CC4  d : 292.9365ms

原来海报的方法是非常糟糕的单个字符针在一个大草堆根据这一点。

注:所有值更新为发布版本输出。在我第一次发布这篇文章时,我不小心忘记了建立发布模式。我的一些声明已经被修改了。


**计数字符或字符串**

 string st = "asdfasdfasdfsadfasdf/asdfasdfas/dfsdfsdafsdfsd/fsadfasdf/dff";
        int count = 0;
        int location = 0;
       
        while (st.IndexOf("/", location + 1) > 0)
        {
                count++;
                location = st.IndexOf("/", location + 1);
        }
        MessageBox.Show(count.ToString());

从。Net 5 (Net core 2.1+和NetStandard 2.1)开始,我们有了一个新的迭代速度之王。

“跨度<T>”https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0

String有一个内置成员,返回Span<Char>

int count = 0;
foreach( var c in source.AsSpan())
{
    if (c == '/')
        count++;
}

我的测试显示比直刺快62%我还比较了Span<T>[I]上的for()循环,以及这里发布的其他一些内容。注意,String上的反向for()迭代现在似乎比直接foreach运行得慢。

Starting test, 10000000 iterations
(base) foreach =   673 ms

fastest to slowest
foreach Span =   252 ms   62.6%
  Span [i--] =   282 ms   58.1%
  Span [i++] =   402 ms   40.3%
   for [i++] =   454 ms   32.5%
   for [i--] =   867 ms  -28.8%
     Replace =  1905 ms -183.1%
       Split =  2109 ms -213.4%
  Linq.Count =  3797 ms -464.2%

更新:2021年12月,Visual Studio 2022, .NET 5和6

.NET 5
Starting test, 100000000 iterations set
(base) foreach =  7658 ms
fastest to slowest
  foreach Span =   3710 ms     51.6%
    Span [i--] =   3745 ms     51.1%
    Span [i++] =   3932 ms     48.7%
     for [i++] =   4593 ms     40.0%
     for [i--] =   7042 ms      8.0%
(base) foreach =   7658 ms      0.0%
       Replace =  18641 ms   -143.4%
         Split =  21469 ms   -180.3%
          Linq =  39726 ms   -418.8%
Regex Compiled = 128422 ms -1,577.0%
         Regex = 179603 ms -2,245.3%
         
         
.NET 6
Starting test, 100000000 iterations set
(base) foreach =  7343 ms
fastest to slowest
  foreach Span =   2918 ms     60.3%
     for [i++] =   2945 ms     59.9%
    Span [i++] =   3105 ms     57.7%
    Span [i--] =   5076 ms     30.9%
(base) foreach =   7343 ms      0.0%
     for [i--] =   8645 ms    -17.7%
       Replace =  18307 ms   -149.3%
         Split =  21440 ms   -192.0%
          Linq =  39354 ms   -435.9%
Regex Compiled = 114178 ms -1,454.9%
         Regex = 186493 ms -2,439.7%

我添加了更多的循环,并加入了RegEx,这样我们就可以看到在大量迭代中使用它是一场灾难。 我认为for(++)循环比较可能已经在。net 6中进行了优化,以便在内部使用Span -因为它与foreach Span的速度几乎相同。

代码链接


查找字符计数与查找字符串计数有很大不同。另外,这也取决于你是否想要检查不止一个。如果你想检查各种不同的字符计数,像这样的东西可以工作:

var charCounts =
   haystack
   .GroupBy(c => c)
   .ToDictionary(g => g.Key, g => g.Count());

var needleCount = charCounts.ContainsKey(needle) ? charCounts[needle] : 0;

注1:分组到字典中非常有用,因此为它编写GroupToDictionary扩展方法非常有意义。

注意2:拥有自己的字典实现也很有用,它允许默认值,然后您可以自动为不存在的键获取0。


Split (may)胜过IndexOf(用于字符串)。

上面的基准测试似乎表明Richard Watson是最快的字符串,这是错误的(可能差异来自我们的测试数据,但由于下面的原因,它看起来很奇怪)。

如果我们更深入地研究这些方法在.NET中的实现(对于Luke H, Richard Watson方法),

IndexOf取决于区域性,它将尝试检索/创建ReadOnlySpan,检查是否必须忽略大小写等。最后执行不安全/本机调用。 Split能够处理多个分隔符,并有一些StringSplitOptions 并且必须创建字符串[]数组并用分割结果填充它(所以做一些子字符串)。根据字符串出现的数量,Split可能比IndexOf更快。

顺便说一下,我做了一个简化版本的IndexOf(它可以更快,如果我使用指针和不安全,但不勾选应该是ok的大多数),它至少快了4个数量级。

基准测试(来源GitHub)

通过搜索一个常见的单词(the)或一个小句子 莎士比亚,理查三世。

Method Mean Error StdDev Ratio
Richard_LongInLong 67.721 us 1.0278 us 0.9614 us 1.00
Luke_LongInLong 1.960 us 0.0381 us 0.0637 us 0.03
Fab_LongInLong 1.198 us 0.0160 us 0.0142 us 0.02
-------------------- -----------: ----------: ----------: ------:
Richard_ShortInLong 104.771 us 2.8117 us 7.9304 us 1.00
Luke_ShortInLong 2.971 us 0.0594 us 0.0813 us 0.03
Fab_ShortInLong 2.206 us 0.0419 us 0.0411 us 0.02
--------------------- ----------: ---------: ---------: ------:
Richard_ShortInShort 115.53 ns 1.359 ns 1.135 ns 1.00
Luke_ShortInShort 52.46 ns 0.970 ns 0.908 ns 0.45
Fab_ShortInShort 28.47 ns 0.552 ns 0.542 ns 0.25
public int GetOccurrences(string input, string needle)
{
    int count = 0;
    unchecked
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(needle))
        {
            return 0;
        }

        for (var i = 0; i < input.Length - needle.Length + 1; i++)
        {
            var c = input[i];
            if (c == needle[0])
            {
                for (var index = 0; index < needle.Length; index++)
                {
                    c = input[i + index];
                    var n = needle[index];

                    if (c != n)
                    {
                        break;
                    }
                    else if (index == needle.Length - 1)
                    {
                        count++;
                    }
                }
            }
        }
    }

    return count;
}

从。net 7开始,我们就有了不需要分配(并且高度优化)的Regex api。计数尤其容易和有效。

    var input = "abcd abcabc ababc";
    var result = Regex.Count(input: input, pattern: "abc"); // 4

当匹配动态模式时,记得转义它们:

public static int CountOccurences(string input, string pattern)
{
    pattern = Regex.Escape(pattern); // Aww, no way to avoid heap allocations here

    var result = Regex.Count(input: input, pattern: pattern);
    return result;
}

而且,作为固定模式的额外奖励,. net 7引入了分析器,帮助将正则表达式字符串转换为源代码生成的代码。这不仅避免了regex的运行时编译开销,而且还提供了非常可读的代码,展示了它是如何实现的。事实上,该代码通常至少与手动编写的任何替代代码一样有效。

如果您的正则表达式调用是合格的,分析程序将给出提示。简单地选择“转换为'GeneratedRegexAttribute '”并享受结果:

[GeneratedRegex("abc")]
private static partial Regex MyRegex(); // Go To Definition to see the generated code