我正在做一些事情,我意识到我想要在一个字符串中找到多少个/s,然后我突然想到,有几种方法可以做到这一点,但不能决定哪种是最好的(或最简单的)。

目前我想说的是:

string source = "/once/upon/a/time/";
int count = source.Length - source.Replace("/", "").Length;

但我一点都不喜欢,有人愿意吗?

我并不想为此挖掘出正则表达式,对吧?

我知道我的字符串将包含我要搜索的项,所以你可以假设…

当然对于长度为> 1的字符串,

string haystack = "/once/upon/a/time";
string needle = "/";
int needleCount = ( haystack.Length - haystack.Replace(needle,"").Length ) / needle.Length;

当前回答

我认为最简单的方法是使用正则表达式。通过这种方式,你可以获得与使用myVar.Split('x')相同的分割计数,但在多个字符设置中。

string myVar = "do this to count the number of words in my wording so that I can word it up!";
int count = Regex.Split(myVar, "word").Length;

其他回答

str="aaabbbbjjja";
int count = 0;
int size = str.Length;

string[] strarray = new string[size];
for (int i = 0; i < str.Length; i++)
{
    strarray[i] = str.Substring(i, 1);
}
Array.Sort(strarray);
str = "";
for (int i = 0; i < strarray.Length - 1; i++)
{

    if (strarray[i] == strarray[i + 1])
    {

        count++;
    }
    else
    {
        count++;
        str = str + strarray[i] + count;
        count = 0;
    }

}
count++;
str = str + strarray[strarray.Length - 1] + count;

这是用来计算字符出现次数的。本例的输出为"a4b4j3"

string Name = "Very good nice one is very good but is very good nice one this is called the term";
bool valid=true;
int count = 0;
int k=0;
int m = 0;
while (valid)
{
    k = Name.Substring(m,Name.Length-m).IndexOf("good");
    if (k != -1)
    {
        count++;
        m = m + k + 4;
    }
    else
        valid = false;
}
Console.WriteLine(count + " Times accures");

如果你想搜索整个字符串,而不仅仅是字符:

src.Select((c, i) => src.Substring(i))
    .Count(sub => sub.StartsWith(target))

读为“对于字符串中的每个字符,将该字符开始的字符串的其余部分作为子字符串;如果它以目标字符串开头,就数一数。”

从。net 7开始,我们就有了不需要分配(并且高度优化)的Regex api。计数尤其容易和有效。

    var input = "abcd abcabc ababc";
    var result = Regex.Count(input: input, pattern: "abc"); // 4

当匹配动态模式时,记得转义它们:

public static int CountOccurences(string input, string pattern)
{
    pattern = Regex.Escape(pattern); // Aww, no way to avoid heap allocations here

    var result = Regex.Count(input: input, pattern: pattern);
    return result;
}

而且,作为固定模式的额外奖励,. net 7引入了分析器,帮助将正则表达式字符串转换为源代码生成的代码。这不仅避免了regex的运行时编译开销,而且还提供了非常可读的代码,展示了它是如何实现的。事实上,该代码通常至少与手动编写的任何替代代码一样有效。

如果您的正则表达式调用是合格的,分析程序将给出提示。简单地选择“转换为'GeneratedRegexAttribute '”并享受结果:

[GeneratedRegex("abc")]
private static partial Regex MyRegex(); // Go To Definition to see the generated code

我觉得我们缺少某些类型的子字符串计数,比如不安全的逐字节比较。我把原始海报的方法和我能想到的任何方法结合在一起。

这些是我做的字符串扩展。

namespace Example
{
    using System;
    using System.Text;

    public static class StringExtensions
    {
        public static int CountSubstr(this string str, string substr)
        {
            return (str.Length - str.Replace(substr, "").Length) / substr.Length;
        }

        public static int CountSubstr(this string str, char substr)
        {
            return (str.Length - str.Replace(substr.ToString(), "").Length);
        }

        public static int CountSubstr2(this string str, string substr)
        {
            int substrlen = substr.Length;
            int lastIndex = str.IndexOf(substr, 0, StringComparison.Ordinal);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + substrlen, StringComparison.Ordinal);
            }

            return count;
        }

        public static int CountSubstr2(this string str, char substr)
        {
            int lastIndex = str.IndexOf(substr, 0);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + 1);
            }

            return count;
        }

        public static int CountChar(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            for (int i = 0; i < length; ++i)
                if (str[i] == substr)
                    ++count;

            return count;
        }

        public static int CountChar2(this string str, char substr)
        {
            int count = 0;
            foreach (var c in str)
                if (c == substr)
                    ++count;

            return count;
        }

        public static unsafe int CountChar3(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = 0; i < length; ++i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountChar4(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = length - 1; i >= 0; --i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountSubstr3(this string str, string substr)
        {
            int length = str.Length;
            int substrlen = substr.Length;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = 0;

                    for (int i = 0; i < length; ++i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            ++n;
                            if (n == substrlen)
                            {
                                ++count;
                                n = 0;
                            }
                        }
                        else
                            n = 0;
                    }
                }
            }

            return count;
        }

        public static int CountSubstr3(this string str, char substr)
        {
            return CountSubstr3(str, substr.ToString());
        }

        public static unsafe int CountSubstr4(this string str, string substr)
        {
            int length = str.Length;
            int substrLastIndex = substr.Length - 1;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = substrLastIndex;

                    for (int i = length - 1; i >= 0; --i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            if (--n == -1)
                            {
                                ++count;
                                n = substrLastIndex;
                            }
                        }
                        else
                            n = substrLastIndex;
                    }
                }
            }

            return count;
        }

        public static int CountSubstr4(this string str, char substr)
        {
            return CountSubstr4(str, substr.ToString());
        }
    }
}

接下来是测试代码…

static void Main()
{
    const char matchA = '_';
    const string matchB = "and";
    const string matchC = "muchlongerword";
    const string testStrA = "_and_d_e_banna_i_o___pfasd__and_d_e_banna_i_o___pfasd_";
    const string testStrB = "and sdf and ans andeians andano ip and and sdf and ans andeians andano ip and";
    const string testStrC =
        "muchlongerword amuchlongerworsdfmuchlongerwordsdf jmuchlongerworijv muchlongerword sdmuchlongerword dsmuchlongerword";
    const int testSize = 1000000;
    Console.WriteLine(testStrA.CountSubstr('_'));
    Console.WriteLine(testStrA.CountSubstr2('_'));
    Console.WriteLine(testStrA.CountSubstr3('_'));
    Console.WriteLine(testStrA.CountSubstr4('_'));
    Console.WriteLine(testStrA.CountChar('_'));
    Console.WriteLine(testStrA.CountChar2('_'));
    Console.WriteLine(testStrA.CountChar3('_'));
    Console.WriteLine(testStrA.CountChar4('_'));
    Console.WriteLine(testStrB.CountSubstr("and"));
    Console.WriteLine(testStrB.CountSubstr2("and"));
    Console.WriteLine(testStrB.CountSubstr3("and"));
    Console.WriteLine(testStrB.CountSubstr4("and"));
    Console.WriteLine(testStrC.CountSubstr("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr2("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr3("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr4("muchlongerword"));
    var timer = new Stopwatch();
    timer.Start();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr(matchA);
    timer.Stop();
    Console.WriteLine("CS1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr(matchB);
    timer.Stop();
    Console.WriteLine("CS1 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr(matchC);
    timer.Stop();
    Console.WriteLine("CS1 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr2(matchA);
    timer.Stop();
    Console.WriteLine("CS2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr2(matchB);
    timer.Stop();
    Console.WriteLine("CS2 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr2(matchC);
    timer.Stop();
    Console.WriteLine("CS2 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr3(matchA);
    timer.Stop();
    Console.WriteLine("CS3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr3(matchB);
    timer.Stop();
    Console.WriteLine("CS3 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr3(matchC);
    timer.Stop();
    Console.WriteLine("CS3 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr4(matchA);
    timer.Stop();
    Console.WriteLine("CS4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr4(matchB);
    timer.Stop();
    Console.WriteLine("CS4 and: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr4(matchC);
    timer.Stop();
    Console.WriteLine("CS4 mlw: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar(matchA);
    timer.Stop();
    Console.WriteLine("CC1 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar2(matchA);
    timer.Stop();
    Console.WriteLine("CC2 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar3(matchA);
    timer.Stop();
    Console.WriteLine("CC3 chr: " + timer.Elapsed.TotalMilliseconds + "ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar4(matchA);
    timer.Stop();
    Console.WriteLine("CC4 chr: " + timer.Elapsed.TotalMilliseconds + "ms");
}

结果:CSX与CountSubstrX对应,CCX与CountCharX对应。“chr”搜索字符串中的“_”,“and”搜索字符串中的“and”,“mlw”搜索字符串中的“muchlongerword”

CS1 chr: 824.123ms
CS1 and: 586.1893ms
CS1 mlw: 486.5414ms
CS2 chr: 127.8941ms
CS2 and: 806.3918ms
CS2 mlw: 497.318ms
CS3 chr: 201.8896ms
CS3 and: 124.0675ms
CS3 mlw: 212.8341ms
CS4 chr: 81.5183ms
CS4 and: 92.0615ms
CS4 mlw: 116.2197ms
CC1 chr: 66.4078ms
CC2 chr: 64.0161ms
CC3 chr: 65.9013ms
CC4 chr: 65.8206ms

最后,我有了一个包含360万个字符的文件。“derp adfderdserp dfaerpderp deasderp”重复了10万次。我用上述方法在文件中搜索“derp”100次,得到这些结果。

CS1Derp: 1501.3444ms
CS2Derp: 1585.797ms
CS3Derp: 376.0937ms
CS4Derp: 271.1663ms

所以我的第四种方法肯定是赢家,但是,实际上,如果一个360万个字符的文件100次只需要1586ms作为最坏的情况,那么所有这些都是可以忽略不计的。

顺便说一下,我还用100次CountSubstr和CountChar方法在360万个字符的文件中扫描了'd'字符。结果……

CS1  d : 2606.9513ms
CS2  d : 339.7942ms
CS3  d : 960.281ms
CS4  d : 233.3442ms
CC1  d : 302.4122ms
CC2  d : 280.7719ms
CC3  d : 299.1125ms
CC4  d : 292.9365ms

原来海报的方法是非常糟糕的单个字符针在一个大草堆根据这一点。

注:所有值更新为发布版本输出。在我第一次发布这篇文章时,我不小心忘记了建立发布模式。我的一些声明已经被修改了。