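I was working on something when I realized I wanted to count how many /s there are in a string. It struck me that there are several ways to do it, but I couldn't decide which is the best (or easiest).

At the moment I'm going with something like: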

string source = "/once/upon/a/time/";
int count = source.Length - source.Replace("/", "").Length;

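But I don't like it at all. Any takers?

I don't really want to dig out a regular expression for this, do I?

I know my string will contain the term I'm searching for, so you can assume that…

Of course, for strings with length > 1,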

string haystack = "/once/upon/a/time";
string needle = "/";
int needleCount = ( haystack.Length - haystack.Replace(needle,"").Length ) / needle.Length;
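Wrapped as a reusable helper, the same Replace-based trick might look like the sketch below. The class and method names are illustrative, and the empty-needle guard is an addition of this sketch: string.Replace throws an ArgumentException when the old value is empty, and dividing by needle.Length would fail anyway.

```csharp
using System;

public static class ReplaceCounter
{
    // Counts non-overlapping occurrences of needle by measuring how much
    // shorter the string becomes once every occurrence is removed.
    public static int CountByReplace(string haystack, string needle)
    {
        if (haystack is null) throw new ArgumentNullException(nameof(haystack));
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("needle must be non-empty", nameof(needle));

        // string.Replace(string, string) performs an ordinal comparison.
        int removed = haystack.Length - haystack.Replace(needle, "").Length;
        return removed / needle.Length;
    }
}
```

For example, CountByReplace("/once/upon/a/time", "/") returns 4. The cost is one extra string allocation per call.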

Current answer

Edit:

source.Split('/').Length-1
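The Split one-liner works because n separators always produce n + 1 segments, including empty segments before a leading separator and after a trailing one. A small sketch (class and method names are illustrative):

```csharp
using System;

public static class SplitCounter
{
    // n occurrences of the separator split the string into n + 1 pieces,
    // so the separator count is the piece count minus one. Leading and
    // trailing separators yield empty pieces, which keeps the count correct.
    public static int CountBySplit(string source, char separator)
    {
        return source.Split(separator).Length - 1;
    }
}
```

CountBySplit("/once/upon/a/time/", '/') returns 5: the pieces are "", "once", "upon", "a", "time" and "". Note that this allocates an array plus a substring for every piece.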

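Other answers

Starting with .NET 5 (as well as .NET Core 2.1+ and .NET Standard 2.1), we have a new king of iteration speed:

Span<T>: https://learn.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0

A string can be viewed as a span through the built-in AsSpan() extension method, which returns a ReadOnlySpan<char>: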

int count = 0;
foreach( var c in source.AsSpan())
{
    if (c == '/')
        count++;
}

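My tests show this is 62% faster than a straight foreach over the string. I also compared a for() loop over Span<T>[i], as well as a few of the other approaches posted here. Note that a reverse for() iteration over a String now seems to run slower than a straight foreach.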

Starting test, 10000000 iterations
(base) foreach =   673 ms

fastest to slowest
foreach Span =   252 ms   62.6%
  Span [i--] =   282 ms   58.1%
  Span [i++] =   402 ms   40.3%
   for [i++] =   454 ms   32.5%
   for [i--] =   867 ms  -28.8%
     Replace =  1905 ms -183.1%
       Split =  2109 ms -213.4%
  Linq.Count =  3797 ms -464.2%
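For reference, here are sketches of the three span-based variants named in the table: foreach over the span, plus the forward and reverse indexed loops (Span [i++] and Span [i--]). The benchmark code itself is not shown here, so treat these as reconstructions; the class and method names are illustrative.

```csharp
using System;

public static class SpanCounter
{
    // "foreach Span": enumerate the characters through a ReadOnlySpan<char>.
    public static int CountForeach(string source, char target)
    {
        int count = 0;
        foreach (char c in source.AsSpan())
        {
            if (c == target)
                count++;
        }
        return count;
    }

    // "Span [i++]": forward indexed loop over the span.
    public static int CountForward(string source, char target)
    {
        ReadOnlySpan<char> span = source.AsSpan();
        int count = 0;
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] == target)
                count++;
        }
        return count;
    }

    // "Span [i--]": reverse indexed loop over the span.
    public static int CountBackward(string source, char target)
    {
        ReadOnlySpan<char> span = source.AsSpan();
        int count = 0;
        for (int i = span.Length - 1; i >= 0; i--)
        {
            if (span[i] == target)
                count++;
        }
        return count;
    }
}
```

All three return the same count and allocate nothing; they differ only in how the loop is written, which is what the timings compare.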

Update: December 2021, Visual Studio 2022, .NET 5 and 6

.NET 5
Starting test, 100000000 iterations set
(base) foreach =  7658 ms
fastest to slowest
  foreach Span =   3710 ms     51.6%
    Span [i--] =   3745 ms     51.1%
    Span [i++] =   3932 ms     48.7%
     for [i++] =   4593 ms     40.0%
     for [i--] =   7042 ms      8.0%
(base) foreach =   7658 ms      0.0%
       Replace =  18641 ms   -143.4%
         Split =  21469 ms   -180.3%
          Linq =  39726 ms   -418.8%
Regex Compiled = 128422 ms -1,577.0%
         Regex = 179603 ms -2,245.3%
         
         
.NET 6
Starting test, 100000000 iterations set
(base) foreach =  7343 ms
fastest to slowest
  foreach Span =   2918 ms     60.3%
     for [i++] =   2945 ms     59.9%
    Span [i++] =   3105 ms     57.7%
    Span [i--] =   5076 ms     30.9%
(base) foreach =   7343 ms      0.0%
     for [i--] =   8645 ms    -17.7%
       Replace =  18307 ms   -149.3%
         Split =  21440 ms   -192.0%
          Linq =  39354 ms   -435.9%
Regex Compiled = 114178 ms -1,454.9%
         Regex = 186493 ms -2,439.7%

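I added more loops and threw in RegEx, so we can see what a disaster it is to use it over a large number of iterations. I think the for(++) loop may have been optimized in .NET 6 to use Span internally, since it runs at nearly the same speed as the foreach over a Span.

Code link

I felt we were missing certain kinds of substring counting, such as unsafe, pointer-based char-by-char comparisons. I put together the original poster's method and every other method I could think of.

These are the string extensions I made.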

namespace Example
{
    using System;
    using System.Text;

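    // The unsafe methods below require compiling with /unsafe
    // (<AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file).
    public static class StringExtensions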
    {
        public static int CountSubstr(this string str, string substr)
        {
            return (str.Length - str.Replace(substr, "").Length) / substr.Length;
        }

        public static int CountSubstr(this string str, char substr)
        {
            return (str.Length - str.Replace(substr.ToString(), "").Length);
        }

        public static int CountSubstr2(this string str, string substr)
        {
            int substrlen = substr.Length;
            int lastIndex = str.IndexOf(substr, 0, StringComparison.Ordinal);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + substrlen, StringComparison.Ordinal);
            }

            return count;
        }

        public static int CountSubstr2(this string str, char substr)
        {
            int lastIndex = str.IndexOf(substr, 0);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + 1);
            }

            return count;
        }

        public static int CountChar(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            for (int i = 0; i < length; ++i)
                if (str[i] == substr)
                    ++count;

            return count;
        }

        public static int CountChar2(this string str, char substr)
        {
            int count = 0;
            foreach (var c in str)
                if (c == substr)
                    ++count;

            return count;
        }

        public static unsafe int CountChar3(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = 0; i < length; ++i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountChar4(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = length - 1; i >= 0; --i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountSubstr3(this string str, string substr)
        {
            int length = str.Length;
            int substrlen = substr.Length;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = 0;

                    for (int i = 0; i < length; ++i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            ++n;
                            if (n == substrlen)
                            {
                                ++count;
                                n = 0;
                            }
                        }
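                        else
                            // Caveat: this restarts without re-checking the current char
                            // against substr[0], so "ab" in "aab" is not counted.
                            n = 0;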
                    }
                }
            }

            return count;
        }

        public static int CountSubstr3(this string str, char substr)
        {
            return CountSubstr3(str, substr.ToString());
        }

        public static unsafe int CountSubstr4(this string str, string substr)
        {
            int length = str.Length;
            int substrLastIndex = substr.Length - 1;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = substrLastIndex;

                    for (int i = length - 1; i >= 0; --i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            if (--n == -1)
                            {
                                ++count;
                                n = substrLastIndex;
                            }
                        }
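                        else
                            // Caveat: this restarts without re-checking the current char
                            // against the last char of substr, so "ab" in "abb" is not counted.
                            n = substrLastIndex;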
                    }
                }
            }

            return count;
        }

        public static int CountSubstr4(this string str, char substr)
        {
            return CountSubstr4(str, substr.ToString());
        }
    }
}
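One caveat about CountSubstr3 and CountSubstr4: on a mismatch they reset the match position without re-examining the current character, so depending on the input they can undercount (CountSubstr3("aab", "ab") returns 0 instead of 1). If correctness matters more than the last few milliseconds, a search built on IndexOf avoids the problem. The sketch below slices a ReadOnlySpan<char> after each match; like CountSubstr2, it counts non-overlapping ordinal matches. The class and method names are illustrative.

```csharp
using System;

public static class SpanSubstringCounter
{
    // Counts non-overlapping, ordinal occurrences of needle in haystack.
    // Each IndexOf call searches the remainder just past the previous match,
    // so a failed partial match can never hide a later occurrence.
    public static int CountOrdinal(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("needle must be non-empty", nameof(needle));

        ReadOnlySpan<char> rest = haystack.AsSpan();
        ReadOnlySpan<char> target = needle.AsSpan();
        int count = 0;
        int index;
        while ((index = rest.IndexOf(target)) >= 0)
        {
            count++;
            rest = rest.Slice(index + target.Length);
        }
        return count;
    }
}
```

CountOrdinal("aab", "ab") returns 1, and CountOrdinal("aaaa", "aa") returns 2 because matches do not overlap.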

Next, the test code…

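using System;
using System.Diagnostics; // Stopwatch
using Example;            // StringExtensions

// Main must live inside a class.
static void Main()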
{
    const char matchA = '_';
    const string matchB = "and";
    const string matchC = "muchlongerword";
    const string testStrA = "_and_d_e_banna_i_o___pfasd__and_d_e_banna_i_o___pfasd_";
    const string testStrB = "and sdf and ans andeians andano ip and and sdf and ans andeians andano ip and";
    const string testStrC =
        "muchlongerword amuchlongerworsdfmuchlongerwordsdf jmuchlongerworijv muchlongerword sdmuchlongerword dsmuchlongerword";
    const int testSize = 1000000;
    Console.WriteLine(testStrA.CountSubstr('_'));
    Console.WriteLine(testStrA.CountSubstr2('_'));
    Console.WriteLine(testStrA.CountSubstr3('_'));
    Console.WriteLine(testStrA.CountSubstr4('_'));
    Console.WriteLine(testStrA.CountChar('_'));
    Console.WriteLine(testStrA.CountChar2('_'));
    Console.WriteLine(testStrA.CountChar3('_'));
    Console.WriteLine(testStrA.CountChar4('_'));
    Console.WriteLine(testStrB.CountSubstr("and"));
    Console.WriteLine(testStrB.CountSubstr2("and"));
    Console.WriteLine(testStrB.CountSubstr3("and"));
    Console.WriteLine(testStrB.CountSubstr4("and"));
    Console.WriteLine(testStrC.CountSubstr("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr2("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr3("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr4("muchlongerword"));
    var timer = new Stopwatch();
    timer.Start();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr(matchA);
    timer.Stop();
    Console.WriteLine("CS1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr(matchB);
    timer.Stop();
    Console.WriteLine("CS1 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr(matchC);
    timer.Stop();
    Console.WriteLine("CS1 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr2(matchA);
    timer.Stop();
    Console.WriteLine("CS2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr2(matchB);
    timer.Stop();
    Console.WriteLine("CS2 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr2(matchC);
    timer.Stop();
    Console.WriteLine("CS2 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr3(matchA);
    timer.Stop();
    Console.WriteLine("CS3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr3(matchB);
    timer.Stop();
    Console.WriteLine("CS3 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr3(matchC);
    timer.Stop();
    Console.WriteLine("CS3 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr4(matchA);
    timer.Stop();
    Console.WriteLine("CS4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr4(matchB);
    timer.Stop();
    Console.WriteLine("CS4 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr4(matchC);
    timer.Stop();
    Console.WriteLine("CS4 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar(matchA);
    timer.Stop();
    Console.WriteLine("CC1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar2(matchA);
    timer.Stop();
    Console.WriteLine("CC2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar3(matchA);
    timer.Stop();
    Console.WriteLine("CC3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar4(matchA);
    timer.Stop();
    Console.WriteLine("CC4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");
}
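The timing blocks in Main all repeat the same start/loop/stop/print pattern. A hypothetical helper like Measure below could shorten them. Invoking through a Func<int> delegate adds a small per-call overhead, so numbers taken this way would not be directly comparable with the results below.

```csharp
using System;
using System.Diagnostics;

public static class BenchHelper
{
    // Runs action `iterations` times, prints the elapsed time with a label,
    // and returns the last result so the caller can sanity-check it.
    public static int Measure(string label, int iterations, Func<int> action)
    {
        int last = 0;
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
            last = action();
        timer.Stop();
        Console.WriteLine(label + ": " + timer.Elapsed.TotalMilliseconds + "ms");
        return last;
    }
}
```

Usage inside Main would look like BenchHelper.Measure("CS1 chr", testSize, () => testStrA.CountSubstr(matchA));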

Results: CSX corresponds to CountSubstrX and CCX to CountCharX. "chr" searches a string for '_', "and" searches a string for "and", and "mlw" searches a string for "muchlongerword".

CS1 chr: 824.123ms
CS1 and: 586.1893ms
CS1 mlw: 486.5414ms
CS2 chr: 127.8941ms
CS2 and: 806.3918ms
CS2 mlw: 497.318ms
CS3 chr: 201.8896ms
CS3 and: 124.0675ms
CS3 mlw: 212.8341ms
CS4 chr: 81.5183ms
CS4 and: 92.0615ms
CS4 mlw: 116.2197ms
CC1 chr: 66.4078ms
CC2 chr: 64.0161ms
CC3 chr: 65.9013ms
CC4 chr: 65.8206ms

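Finally, I had a file with 3.6 million characters: "derp adfderdserp dfaerpderp deasderp" repeated 100,000 times. I searched it for "derp" 100 times with each of the methods above and got these results.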
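The 3.6-million-character haystack can also be rebuilt in memory instead of read from a file. The sketch below assumes the 36-character phrase is concatenated with no separator between repetitions, which matches the stated size (36 × 100,000 = 3,600,000). The class and method names are illustrative; CountOrdinal is a plain IndexOf loop, equivalent to CountSubstr2, for cross-checking the other methods.

```csharp
using System;
using System.Linq;

public static class DerpHaystack
{
    public const string Phrase = "derp adfderdserp dfaerpderp deasderp";

    // Concatenates the phrase `repetitions` times with nothing in between.
    public static string Build(int repetitions = 100_000)
    {
        return string.Concat(Enumerable.Repeat(Phrase, repetitions));
    }

    // Reference count of non-overlapping ordinal matches.
    public static int CountOrdinal(string haystack, string needle)
    {
        int count = 0;
        int index = haystack.IndexOf(needle, StringComparison.Ordinal);
        while (index != -1)
        {
            count++;
            index = haystack.IndexOf(needle, index + needle.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

Under that assumption every method should report 300,000 matches for "derp" (three per phrase) and 800,000 for 'd' (eight per phrase).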

CS1Derp: 1501.3444ms
CS2Derp: 1585.797ms
CS3Derp: 376.0937ms
CS4Derp: 271.1663ms

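So my fourth method is definitely the winner, but realistically, if 100 searches through a 3.6-million-character file take only 1586 ms in the worst case, all of these differences are negligible.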

By the way, I also scanned the 3.6-million-character file for the 'd' character 100 times with the CountSubstr and CountChar methods. Results…

CS1  d : 2606.9513ms
CS2  d : 339.7942ms
CS3  d : 960.281ms
CS4  d : 233.3442ms
CC1  d : 302.4122ms
CC2  d : 280.7719ms
CC3  d : 299.1125ms
CC4  d : 292.9365ms

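According to these numbers, the original poster's method performs very poorly for a single-character needle in a large haystack.

Note: all values have been updated to Release build output. When I first posted this, I accidentally forgot to build in Release mode, and some of my statements have been amended accordingly.

LINQ works on all collections, and since a string is just a collection of characters, how about this nice little one-liner: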

var count = source.Count(c => c == '/');

Make sure you have using System.Linq; at the top of your code file, since .Count is an extension method from that namespace.
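As a self-contained method, the LINQ version is a sketch like the following (class and method names are illustrative). It works because string implements IEnumerable<char>.

```csharp
using System.Linq;

public static class LinqCounter
{
    // Enumerable.Count with a predicate tallies the matching characters.
    public static int CountWithLinq(string source, char target)
    {
        return source.Count(c => c == target);
    }
}
```

It is the most readable option, but the benchmarks in this thread show it is considerably slower than a plain loop.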

Regex.Matches(input,  Regex.Escape("stringToMatch")).Count
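Regex.Escape matters whenever the search text contains regex metacharacters. A small sketch (class and method names are illustrative):

```csharp
using System.Text.RegularExpressions;

public static class RegexCounter
{
    // Escaping turns the search text into a literal pattern; without it,
    // a needle such as "." would match any character.
    public static int CountWithRegex(string input, string literal)
    {
        return Regex.Matches(input, Regex.Escape(literal)).Count;
    }
}
```

CountWithRegex("a.b.c", ".") returns 2, whereas the unescaped Regex.Matches("a.b.c", ".").Count returns 5.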

In C#, a nice substring counter looks like this:

public static int CCount(String haystack, String needle)
{
    return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1;
}
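Like the Replace approach, this counts non-overlapping occurrences, because Split consumes each separator it finds. The sketch below repeats CCount inside an illustrative class so the behavior can be checked.

```csharp
using System;

public static class SplitSubstringCounter
{
    // n non-overlapping matches split the string into n + 1 parts.
    public static int CCount(string haystack, string needle)
    {
        return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1;
    }
}
```

CCount("aaaa", "aa") returns 2, not 3: the parts are "", "" and "".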