我怎样才能做得快呢?

当然我可以这样做:

static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    if (a1.Length != a2.Length)
        return false;

    for (int i=0; i<a1.Length; i++)
        if (a1[i]!=a2[i])
            return false;

    return true;
}

但我正在寻找一个BCL函数或一些高度优化的已证明的方法来做到这一点。

java.util.Arrays.equals((sbyte[])(Array)a1, (sbyte[])(Array)a2);

工作得很好,但这似乎不适用于x64。

注意我的快速回答。


当前回答

看看。net是如何处理字符串的。Equals,你可以看到它使用了一个叫做EqualsHelper的私有方法,它有一个“不安全”的指针实现。net Reflector是你的朋友,可以看到内部是如何完成的。

这可以用作字节数组比较的模板,我在博客文章中用c#快速字节数组比较中做了一个实现。我还做了一些基本的基准测试,看看什么时候安全的实现比不安全的实现更快。

也就是说,除非你真的需要杀手级的性能,否则我会选择简单的fr循环比较。

其他回答

我想到了许多显卡内置的块传输加速方法。但是这样你就必须按字节复制所有的数据,所以如果你不想在非托管和依赖硬件的代码中实现你的整个逻辑,这对你没有多大帮助……

Another way of optimization similar to the approach shown above would be to store as much of your data as possible in a long[] rather than a byte[] right from the start, for example if you are reading it sequentially from a binary file, or if you use a memory mapped file, read in data as long[] or single long values. Then, your comparison loop will only need 1/8th of the number of iterations it would have to do for a byte[] containing the same amount of data. It is a matter of when and how often you need to compare vs. when and how often you need to access the data in a byte-by-byte manner, e.g. to use it in an API call as a parameter in a method that expects a byte[]. In the end, you only can tell if you really know the use case...

Span<T>提供了一个极具竞争力的替代方案,而不必在您自己的应用程序的代码库中添加令人困惑和/或不可移植的错误:

// byte[] is implicitly convertible to ReadOnlySpan<byte>
static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2)
{
    return a1.SequenceEqual(a2);
}

. net 6.0.4的实现可以在这里找到。

我已经修改了@EliArbel的要点,将这个方法添加为SpansEqual,在其他人的基准测试中删除大多数不太有趣的性能,使用不同的数组大小运行它,输出图形,并将SpansEqual标记为基线,以便它报告不同的方法与SpansEqual相比如何。

以下数字来自结果,经过轻微编辑以删除“错误”一栏。

|        Method |  ByteCount |               Mean |          StdDev | Ratio | RatioSD |
|-------------- |----------- |-------------------:|----------------:|------:|--------:|
|    SpansEqual |         15 |           2.074 ns |       0.0233 ns |  1.00 |    0.00 |
|  LongPointers |         15 |           2.854 ns |       0.0632 ns |  1.38 |    0.03 |
|      Unrolled |         15 |          12.449 ns |       0.2487 ns |  6.00 |    0.13 |
| PInvokeMemcmp |         15 |           7.525 ns |       0.1057 ns |  3.63 |    0.06 |
|               |            |                    |                 |       |         |
|    SpansEqual |       1026 |          15.629 ns |       0.1712 ns |  1.00 |    0.00 |
|  LongPointers |       1026 |          46.487 ns |       0.2938 ns |  2.98 |    0.04 |
|      Unrolled |       1026 |          23.786 ns |       0.1044 ns |  1.52 |    0.02 |
| PInvokeMemcmp |       1026 |          28.299 ns |       0.2781 ns |  1.81 |    0.03 |
|               |            |                    |                 |       |         |
|    SpansEqual |    1048585 |      17,920.329 ns |     153.0750 ns |  1.00 |    0.00 |
|  LongPointers |    1048585 |      42,077.448 ns |     309.9067 ns |  2.35 |    0.02 |
|      Unrolled |    1048585 |      29,084.901 ns |     428.8496 ns |  1.62 |    0.03 |
| PInvokeMemcmp |    1048585 |      30,847.572 ns |     213.3162 ns |  1.72 |    0.02 |
|               |            |                    |                 |       |         |
|    SpansEqual | 2147483591 | 124,752,376.667 ns | 552,281.0202 ns |  1.00 |    0.00 |
|  LongPointers | 2147483591 | 139,477,269.231 ns | 331,458.5429 ns |  1.12 |    0.00 |
|      Unrolled | 2147483591 | 137,617,423.077 ns | 238,349.5093 ns |  1.10 |    0.00 |
| PInvokeMemcmp | 2147483591 | 138,373,253.846 ns | 288,447.8278 ns |  1.11 |    0.01 |

我很惊讶地看到SpansEqual没有在max-array-size方法中名列前茅,但差异是如此之小,我认为这不会有什么影响。在更新到。net 6.0.4和我的新硬件上运行后,SpansEqual现在在所有数组大小上都轻松优于其他所有数组。

我的系统信息:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
  DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT

我使用附带的。net 4.7发布版本做了一些测量,没有附带调试器。我认为人们一直在使用错误的度量,因为如果你关心这里的速度,你所关心的是计算两个字节数组是否相等需要多长时间。即以字节为单位的吞吐量。

StructuralComparison :              4.6 MiB/s
for                  :            274.5 MiB/s
ToUInt32             :            263.6 MiB/s
ToUInt64             :            474.9 MiB/s
memcmp               :           8500.8 MiB/s

正如你所看到的,没有比memcmp更好的方法了,而且它快了几个数量级。简单的for循环是次优选择。我仍然不明白为什么微软不能简单地包含一个缓冲区。比较方法。

[Program.cs]:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace memcmp
{
    class Program
    {
        static byte[] TestVector(int size)
        {
            var data = new byte[size];
            using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
            {
                rng.GetBytes(data);
            }
            return data;
        }

        static TimeSpan Measure(string testCase, TimeSpan offset, Action action, bool ignore = false)
        {
            var t = Stopwatch.StartNew();
            var n = 0L;
            while (t.Elapsed < TimeSpan.FromSeconds(10))
            {
                action();
                n++;
            }
            var elapsed = t.Elapsed - offset;
            if (!ignore)
            {
                Console.WriteLine($"{testCase,-16} : {n / elapsed.TotalSeconds,16:0.0} MiB/s");
            }
            return elapsed;
        }

        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        static extern int memcmp(byte[] b1, byte[] b2, long count);

        static void Main(string[] args)
        {
            // how quickly can we establish if two sequences of bytes are equal?

            // note that we are testing the speed of different comparsion methods

            var a = TestVector(1024 * 1024); // 1 MiB
            var b = (byte[])a.Clone();

            // was meant to offset the overhead of everything but copying but my attempt was a horrible mistake... should have reacted sooner due to the initially ridiculous throughput values...
            // Measure("offset", new TimeSpan(), () => { return; }, ignore: true);
            var offset = TimeZone.Zero

            Measure("StructuralComparison", offset, () =>
            {
                StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
            });

            Measure("for", offset, () =>
            {
                for (int i = 0; i < a.Length; i++)
                {
                    if (a[i] != b[i]) break;
                }
            });

            Measure("ToUInt32", offset, () =>
            {
                for (int i = 0; i < a.Length; i += 4)
                {
                    if (BitConverter.ToUInt32(a, i) != BitConverter.ToUInt32(b, i)) break;
                }
            });

            Measure("ToUInt64", offset, () =>
            {
                for (int i = 0; i < a.Length; i += 8)
                {
                    if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i)) break;
                }
            });

            Measure("memcmp", offset, () =>
            {
                memcmp(a, b, a.Length);
            });
        }
    }
}

这与其他方法类似,但这里的不同之处在于,不存在我可以一次检查的下一个最高字节数,例如,如果我有63个字节(在我的SIMD示例中),我可以检查前32个字节的相等性,然后是后32个字节,这比检查32个字节、16个字节、8个字节等等要快。您输入的第一个检查是比较所有字节所需要的唯一检查。

这确实在我的测试中名列前茅,但仅以微弱之差。

下面的代码正是我在airbreather/ArrayComparePerf.cs中测试它的方式。

public unsafe bool SIMDNoFallThrough()    #requires  System.Runtime.Intrinsics.X86
{
    if (a1 == null || a2 == null)
        return false;

    int length0 = a1.Length;

    if (length0 != a2.Length) return false;

    fixed (byte* b00 = a1, b01 = a2)
    {
        byte* b0 = b00, b1 = b01, last0 = b0 + length0, last1 = b1 + length0, last32 = last0 - 31;

        if (length0 > 31)
        {
            while (b0 < last32)
            {
                if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != -1)
                    return false;
                b0 += 32;
                b1 += 32;
            }
            return Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(last0 - 32), Avx.LoadVector256(last1 - 32))) == -1;
        }

        if (length0 > 15)
        {
            if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != 65535)
                return false;
            return Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(last0 - 16), Sse2.LoadVector128(last1 - 16))) == 65535;
        }

        if (length0 > 7)
        {
            if (*(ulong*)b0 != *(ulong*)b1)
                return false;
            return *(ulong*)(last0 - 8) == *(ulong*)(last1 - 8);
        }

        if (length0 > 3)
        {
            if (*(uint*)b0 != *(uint*)b1)
                return false;
            return *(uint*)(last0 - 4) == *(uint*)(last1 - 4);
        }

        if (length0 > 1)
        {
            if (*(ushort*)b0 != *(ushort*)b1)
                return false;
            return *(ushort*)(last0 - 2) == *(ushort*)(last1 - 2);
        }

        return *b0 == *b1;
    }
}

如果没有首选的SIMD,与现有的longpointer算法相同的方法:

public unsafe bool LongPointersNoFallThrough()
{
    if (a1 == null || a2 == null || a1.Length != a2.Length)
        return false;
    fixed (byte* p1 = a1, p2 = a2)
    {
        byte* x1 = p1, x2 = p2;
        int l = a1.Length;
        if ((l & 8) != 0)
        {
            for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8)
                if (*(long*)x1 != *(long*)x2) return false;
            return *(long*)(x1 + (l - 8)) == *(long*)(x2 + (l - 8));
        }
        if ((l & 4) != 0)
        {
            if (*(int*)x1 != *(int*)x2) return false; x1 += 4; x2 += 4;
            return *(int*)(x1 + (l - 4)) == *(int*)(x2 + (l - 4));
        }
        if ((l & 2) != 0)
        {
            if (*(short*)x1 != *(short*)x2) return false; x1 += 2; x2 += 2;
            return *(short*)(x1 + (l - 2)) == *(short*)(x2 + (l - 2));
        }
        return *x1 == *x2;
    }
}

简单的回答是:

    public bool Compare(byte[] b1, byte[] b2)
    {
        return Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2);
    }

通过这种方式,您可以使用优化的. net字符串比较来进行字节数组比较,而不需要编写不安全的代码。这是它如何在后台完成的:

private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    Contract.Requires(strA.Length == strB.Length);

    int length = strA.Length;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // Unroll the loop

        #if AMD64
            // For the AMD64 bit platform we unroll by 12 and
            // check three qwords at a time. This is less code
            // than the 32 bit case and is shorter
            // pathlength.

            while (length >= 12)
            {
                if (*(long*)a     != *(long*)b)     return false;
                if (*(long*)(a+4) != *(long*)(b+4)) return false;
                if (*(long*)(a+8) != *(long*)(b+8)) return false;
                a += 12; b += 12; length -= 12;
            }
       #else
           while (length >= 10)
           {
               if (*(int*)a != *(int*)b) return false;
               if (*(int*)(a+2) != *(int*)(b+2)) return false;
               if (*(int*)(a+4) != *(int*)(b+4)) return false;
               if (*(int*)(a+6) != *(int*)(b+6)) return false;
               if (*(int*)(a+8) != *(int*)(b+8)) return false;
               a += 10; b += 10; length -= 10;
           }
       #endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}