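How would you count occurrences of a string (actually a char) within a string?

I was working on something and realized I wanted to count how many /s I could find in a string. Then it struck me that there are several ways to do this, and I couldn't decide which is the best (or the easiest).

At the moment I'm going with something like: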

string source = "/once/upon/a/time/";
int count = source.Length - source.Replace("/", "").Length;

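But I don't like it at all. Any takers?

I don't really want to dig out a regular expression for this, do I?

I know my string is going to contain the term I'm searching for, so you can assume that...

Of course, for search strings where length > 1: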

string haystack = "/once/upon/a/time";
string needle = "/";
int needleCount = ( haystack.Length - haystack.Replace(needle,"").Length ) / needle.Length;
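
The formula above throws if needle is empty, because String.Replace rejects an empty search string. Below is a minimal sketch of the same idea wrapped in a guarded helper. The names ReplaceCounter and CountByReplace are hypothetical and not part of the original question.

```csharp
using System;

public static class ReplaceCounter
{
    // Counts non-overlapping occurrences of needle by comparing the
    // string length before and after removing every occurrence.
    public static int CountByReplace(string haystack, string needle)
    {
        // Guard: String.Replace throws ArgumentException for an empty needle.
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int removedChars = haystack.Length - haystack.Replace(needle, "").Length;
        return removedChars / needle.Length;
    }
}
```

With the sample above, ReplaceCounter.CountByReplace("/once/upon/a/time", "/") returns 4. Overlapping matches are not counted separately: "aa" is found once in "aaa".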

Current answer

As of .NET 5 (and .NET Core 2.1+ and .NET Standard 2.1), we have a new iteration speed king.

Span<T>: https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0

Calling AsSpan() on a string returns a ReadOnlySpan<char>:

int count = 0;
foreach( var c in source.AsSpan())
{
    if (c == '/')
        count++;
}

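My testing shows this to be about 62% faster than a plain foreach over the string. I also compared it with a for() loop indexing into the Span<T> (span[i]), as well as a few other approaches posted here. Note that a reverse for() iteration over a String now seems to run slower than a plain foreach.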

Starting test, 10000000 iterations
(base) foreach =   673 ms

fastest to slowest
foreach Span =   252 ms   62.6%
  Span [i--] =   282 ms   58.1%
  Span [i++] =   402 ms   40.3%
   for [i++] =   454 ms   32.5%
   for [i--] =   867 ms  -28.8%
     Replace =  1905 ms -183.1%
       Split =  2109 ms -213.4%
  Linq.Count =  3797 ms -464.2%
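
The Span [i++] and Span [i--] rows refer to indexed for loops over the span. The benchmark code itself is not reproduced here, so the following is only a sketch of what such loops might look like. SpanCounter, CountForward and CountBackward are hypothetical names.

```csharp
using System;

public static class SpanCounter
{
    // Forward indexed loop over the span (the "Span [i++]" shape).
    public static int CountForward(string source, char target)
    {
        ReadOnlySpan<char> span = source.AsSpan();
        int count = 0;
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] == target)
                count++;
        }
        return count;
    }

    // Reverse indexed loop over the span (the "Span [i--]" shape).
    public static int CountBackward(string source, char target)
    {
        ReadOnlySpan<char> span = source.AsSpan();
        int count = 0;
        for (int i = span.Length - 1; i >= 0; i--)
        {
            if (span[i] == target)
                count++;
        }
        return count;
    }
}
```

Both methods return 5 for "/once/upon/a/time/" and '/'. Only the iteration order differs, which is what the timings compare.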

Update: December 2021, Visual Studio 2022, .NET 5 and 6

.NET 5
Starting test, 100000000 iterations set
(base) foreach =  7658 ms
fastest to slowest
  foreach Span =   3710 ms     51.6%
    Span [i--] =   3745 ms     51.1%
    Span [i++] =   3932 ms     48.7%
     for [i++] =   4593 ms     40.0%
     for [i--] =   7042 ms      8.0%
(base) foreach =   7658 ms      0.0%
       Replace =  18641 ms   -143.4%
         Split =  21469 ms   -180.3%
          Linq =  39726 ms   -418.8%
Regex Compiled = 128422 ms -1,577.0%
         Regex = 179603 ms -2,245.3%
         
         
.NET 6
Starting test, 100000000 iterations set
(base) foreach =  7343 ms
fastest to slowest
  foreach Span =   2918 ms     60.3%
     for [i++] =   2945 ms     59.9%
    Span [i++] =   3105 ms     57.7%
    Span [i--] =   5076 ms     30.9%
(base) foreach =   7343 ms      0.0%
     for [i--] =   8645 ms    -17.7%
       Replace =  18307 ms   -149.3%
         Split =  21440 ms   -192.0%
          Linq =  39354 ms   -435.9%
Regex Compiled = 114178 ms -1,454.9%
         Regex = 186493 ms -2,439.7%

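I added more loops and threw in Regex so we can see what a disaster it is to use over a large number of iterations. I think the for [i++] loop may have been optimized in .NET 6 to use Span internally, since it runs at almost the same speed as the foreach over a Span.

Code link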
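
The author's benchmark code is not included in this text. As a rough illustration only, a timing harness for this kind of comparison could be sketched with Stopwatch as below. MiniBench and its members are hypothetical names, and the numbers it produces will differ by machine and runtime.

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // Baseline: plain foreach over the string.
    public static int CountForeach(string source)
    {
        int count = 0;
        foreach (char c in source)
        {
            if (c == '/') count++;
        }
        return count;
    }

    // Candidate: foreach over the ReadOnlySpan<char> returned by AsSpan().
    public static int CountForeachSpan(string source)
    {
        int count = 0;
        foreach (char c in source.AsSpan())
        {
            if (c == '/') count++;
        }
        return count;
    }

    // Runs the counter the given number of times and returns elapsed milliseconds.
    public static long TimeMs(Func<string, int> counter, string source, int iterations)
    {
        long sink = 0; // accumulate results so the calls are not trivially dead code
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += counter(source);
        }
        watch.Stop();
        if (sink < 0) throw new InvalidOperationException("unexpected negative total");
        return watch.ElapsedMilliseconds;
    }
}
```

A call such as MiniBench.TimeMs(MiniBench.CountForeachSpan, "/once/upon/a/time/", 10_000_000) gives one timing. A hand-rolled loop like this ignores JIT warm-up and measurement noise, so a dedicated tool such as BenchmarkDotNet is likely a better choice for numbers you intend to publish.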

2021-07-14 19:09:33

Other answers

private int CountWords(string text, string word) {
    int count = (text.Length - text.Replace(word, "").Length) / word.Length;
    return count;
}

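Since the original solution was the fastest for chars, I suppose it will be for strings as well. So here is my contribution.

For context: I was looking for words like "failed" and "succeeded" in a log file.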


2011-03-15 16:36:13

int count = new Regex(Regex.Escape(needle)).Matches(haystack).Count;

2010-12-10 15:54:43

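Split (may) win over IndexOf (for strings).

The benchmark above seems to indicate that Richard Watson's method is the fastest for strings, which is wrong (the difference may come from our test data, but it still looks strange for the reasons below).

If we look a little deeper into how these methods are implemented in .NET (for the Luke H and Richard Watson methods):

IndexOf depends on the culture. It will try to retrieve or create a ReadOnlySpan, check whether it has to ignore case, and so on, and finally makes an unsafe/native call.

Split can handle several separators and has some StringSplitOptions. It has to create the string[] array and fill it with the split results (so it does some substring work).

Depending on the number of occurrences, Split may be faster than IndexOf.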
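
To make the comparison concrete, here is a sketch of the two counting styles being discussed. It is not the exact code of either answer. OccurrenceCounters and its method names are hypothetical, and StringComparison.Ordinal is passed explicitly (an assumption on my part) to sidestep the culture-sensitive path described above.

```csharp
using System;

public static class OccurrenceCounters
{
    // IndexOf style: jump from match to match, skipping past each one found.
    public static int CountWithIndexOf(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int count = 0;
        int index = 0;
        while ((index = haystack.IndexOf(needle, index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index += needle.Length; // continue after this match (non-overlapping)
        }
        return count;
    }

    // Split style: n occurrences cut the string into n + 1 pieces.
    public static int CountWithSplit(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1;
    }
}
```

Both return 2 for "the cat and the dog" and "the". The Split version allocates an array plus one string per piece, which is the cost the paragraph above refers to.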

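By the way, I wrote a simplified version of IndexOf (it could be faster still with pointers and unsafe code, but unchecked should be fine for most cases) that is at least 4 times faster than Richard Watson's method in the benchmark below.

Benchmark (source on GitHub)

Done by searching for either a common word (the) or a short sentence in Shakespeare's Richard III.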

Method	Mean	Error	StdDev	Ratio
Richard_LongInLong	67.721 us	1.0278 us	0.9614 us	1.00
Luke_LongInLong	1.960 us	0.0381 us	0.0637 us	0.03
Fab_LongInLong	1.198 us	0.0160 us	0.0142 us	0.02
--------------------	-----------:	----------:	----------:	------:
Richard_ShortInLong	104.771 us	2.8117 us	7.9304 us	1.00
Luke_ShortInLong	2.971 us	0.0594 us	0.0813 us	0.03
Fab_ShortInLong	2.206 us	0.0419 us	0.0411 us	0.02
---------------------	----------:	---------:	---------:	------:
Richard_ShortInShort	115.53 ns	1.359 ns	1.135 ns	1.00
Luke_ShortInShort	52.46 ns	0.970 ns	0.908 ns	0.45
Fab_ShortInShort	28.47 ns	0.552 ns	0.542 ns	0.25

public int GetOccurrences(string input, string needle)
{
    int count = 0;
    unchecked
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(needle))
        {
            return 0;
        }

        for (var i = 0; i < input.Length - needle.Length + 1; i++)
        {
            var c = input[i];
            if (c == needle[0])
            {
                for (var index = 0; index < needle.Length; index++)
                {
                    c = input[i + index];
                    var n = needle[index];

                    if (c != n)
                    {
                        break;
                    }
                    else if (index == needle.Length - 1)
                    {
                        count++;
                    }
                }
            }
        }
    }

    return count;
}
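
One behavior worth knowing: because the outer loop advances one character at a time, GetOccurrences counts overlapping matches, unlike the Replace and Split approaches. The sketch below uses a compact loop that should apply the same matching rule (ordinal, every start position) to show the difference. OverlapDemo and CountOverlapping are hypothetical names.

```csharp
using System;

public static class OverlapDemo
{
    // Checks every start position, so overlapping matches are each counted.
    public static int CountOverlapping(string input, string needle)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(needle))
            return 0;

        int count = 0;
        for (int i = 0; i <= input.Length - needle.Length; i++)
        {
            // Ordinal comparison of needle against the substring starting at i.
            if (string.CompareOrdinal(input, i, needle, 0, needle.Length) == 0)
                count++;
        }
        return count;
    }
}
```

For "aaaa" and "aa" this returns 3 (matches at positions 0, 1 and 2), whereas the Replace-based formula gives 2. For a needle that cannot overlap itself, such as "/" or "the", the results agree.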

2022-11-23 11:32:43

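I think the easiest way to do this is to use a regular expression. This gives you the same kind of split-based count as myVar.Split('x'), but for multi-character patterns. Note that Split returns one more piece than there are matches, so subtract 1 to get the number of occurrences.

string myVar = "do this to count the number of words in my wording so that I can word it up!";
int count = Regex.Split(myVar, "word").Length - 1; // 4 pieces, so 3 occurrences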

2013-05-01 16:51:50

Regex.Matches(input,  Regex.Escape("stringToMatch")).Count
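
The Regex.Escape call matters whenever the search text contains regex metacharacters. A small sketch of the one-liner above as a helper follows. RegexCounter and CountLiteral are hypothetical names, and the empty-string guard is my addition, since an empty pattern matches at every position.

```csharp
using System.Text.RegularExpressions;

public static class RegexCounter
{
    // Counts non-overlapping occurrences of a literal string using Regex.
    public static int CountLiteral(string input, string literal)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(literal))
            return 0;

        // Escape so that characters like '.' or '*' are matched literally.
        return Regex.Matches(input, Regex.Escape(literal)).Count;
    }
}
```

RegexCounter.CountLiteral("a.b.c", ".") returns 2. Without the escape, Regex.Matches("a.b.c", ".").Count is 5, because an unescaped dot matches every character.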

2013-06-19 10:49:27
