到目前为止,我已经避免了测试多线程代码的噩梦,因为它似乎是一个太大的雷区。我想知道人们是如何测试依赖于线程的代码以获得成功执行的,或者人们是如何测试那些仅在两个线程以给定方式交互时才会出现的问题的?
对于今天的程序员来说,这似乎是一个非常关键的问题,恕我直言,将我们的知识集中在这个问题上是很有用的。
到目前为止,我已经避免了测试多线程代码的噩梦,因为它似乎是一个太大的雷区。我想知道人们是如何测试依赖于线程的代码以获得成功执行的,或者人们是如何测试那些仅在两个线程以给定方式交互时才会出现的问题的?
对于今天的程序员来说,这似乎是一个非常关键的问题,恕我直言,将我们的知识集中在这个问题上是很有用的。
当前回答
假设在“多线程”代码下是指某些东西
有状态和可变的 由多个线程访问/修改 同时
换句话说,我们讨论的是测试自定义的有状态线程安全类/方法/单元——这应该是当今非常罕见的野兽。
因为这个野兽很罕见,首先我们需要确保有充分的理由来写它。
步骤1。考虑在相同的同步上下文中修改状态。
现在很容易编写可组合的并发和异步代码,其中IO或其他慢操作卸载到后台,但共享状态在一个同步上下文中更新和查询。例如,async/await任务和。net中的Rx等等——它们都是可测试的设计,“真正的”任务和调度程序可以被取代,以使测试具有确定性(但这超出了问题的范围)。
这听起来可能很有限,但这种方法效果惊人。以这种风格编写整个应用程序是可能的,而不需要使任何状态线程安全(我这样做)。
步骤2。如果在单个同步上下文上操作共享状态是绝对不可能的。
确保轮子没有被重新发明/肯定没有标准的替代方案可以适应这项工作。代码应该是非常内聚的,包含在一个单元中,例如,它很有可能是一些标准的线程安全数据结构的特殊情况,如哈希映射或集合或其他。
注意:如果代码很大/跨越多个类并且需要多线程状态操作,那么设计很有可能是不好的,请重新考虑第1步
步骤3。如果达到了这一步,那么我们需要测试我们自己的自定义有状态线程安全类/方法/单元。
我非常诚实:我从来没有为这样的代码编写过合适的测试。大多数情况下,我在第一步就成功了,有时在第二步。上次我不得不编写自定义线程安全代码是在很多年前,那是在我采用单元测试之前/可能我不需要用目前的知识来编写它。
如果我真的必须测试这样的代码(最终,真正的答案),那么我会尝试下面的一些事情
Non-deterministic stress testing. e.g. run 100 threads simultaneously and check that end result is consistent. This is more typical for higher level / integration testing of multiple users scenarios but also can be used at the unit level. Expose some test 'hooks' where test can inject some code to help make deterministic scenarios where one thread must perform operation before the other. As ugly as it is, I can't think of anything better. Delay-driven testing to make threads run and perform operations in particular order. Strictly speaking such tests are non-deterministic too (there's a chance of system freeze / stop-the-world GC collection which can distort otherwise orchestrated delays), also it is ugly but allows to avoid hooks.
其他回答
运行多个线程并不困难;这是小菜一碟。不幸的是,线程通常需要彼此通信;这就是困难所在。
最初发明的允许模块之间通信的机制是函数调用;当模块A想要与模块B通信时,它只调用模块B中的一个函数。不幸的是,这对线程不起作用,因为当你调用一个函数时,该函数仍然运行在当前线程中。
为了克服这个问题,人们决定采用一种更原始的通信机制:只声明一个特定的变量,并让两个线程都可以访问该变量。换句话说,允许线程共享数据。分享数据是人们自然而然想到的第一件事,这似乎是一个不错的选择,因为它看起来非常简单。我是说,能有多难,对吧?会出什么问题呢?
竞态条件。这就是可能、也将会出错的地方。
当人们意识到他们的软件由于竞争条件而遭受随机的、不可复制的灾难性失败时,他们开始发明复杂的机制,如锁和比较-交换,旨在防止此类事情的发生。这些机制属于广义的“同步”范畴。不幸的是,同步有两个问题:
这是很难做到的,所以很容易出现bug。 它是完全不可测试的,因为您无法测试竞态条件。
精明的读者可能会注意到“非常容易出现bug”和“完全不可测试”是一个致命的组合。
现在,在自动化软件测试的概念变得流行之前,我上面提到的机制已经被行业的大部分人发明和采用了;所以,没有人知道这个问题有多致命;他们只是认为这是一个很难的主题,需要高手程序员,每个人都能接受。
如今,无论我们做什么,我们都把测试放在第一位。所以,如果某些机制是不可测试的,那么使用该机制就是不可能的。因此,同步已经失宠;现在还在练的人已经很少了,而且练的人一天比一天少。
没有同步线程就不能共享数据;然而,最初的要求不是共享数据;它允许线程之间进行通信。除了共享数据之外,还存在其他更优雅的线程间通信机制。
其中一种机制是消息传递,也称为事件。
对于消息传递,整个软件系统中只有一个地方利用了同步,那就是我们用来存储消息的并发阻塞队列收集类。(我们的想法是,我们应该至少能把那一小部分做对。)
消息传递的优点是它不受竞态条件的影响,并且是完全可测试的。
近年来,在为几个项目编写线程处理代码时,我多次遇到过这个问题。我提供了一个迟来的答案,因为大多数其他答案虽然提供了替代方案,但实际上并没有回答关于测试的问题。我的答案是针对多线程代码没有替代方案的情况;为了完整性,我将讨论代码设计问题,但也将讨论单元测试。
编写可测试的多线程代码
首先要做的是将生产线程处理代码与所有执行实际数据处理的代码分开。这样,数据处理就可以作为单线程代码进行测试,多线程代码所做的唯一事情就是协调线程。
The second thing to remember is that bugs in multithreaded code are probabilistic; the bugs that manifest themselves least frequently are the bugs that will sneak through into production, will be difficult to reproduce even in production, and will thus cause the biggest problems. For this reason, the standard coding approach of writing the code quickly and then debugging it until it works is a bad idea for multithreaded code; it will result in code where the easy bugs are fixed and the dangerous bugs are still there.
相反,在编写多线程代码时,必须抱着一种从一开始就避免编写错误的态度来编写代码。如果您已经正确地删除了数据处理代码,线程处理代码应该足够小——最好只有几行,最坏也就几十行——这样您就有机会在不编写错误的情况下编写它,当然也不会编写很多错误,如果您了解线程,请慢慢来,并且小心。
为多线程代码编写单元测试
一旦尽可能仔细地编写了多线程代码,仍然值得为该代码编写测试。测试的主要目的与其说是测试高度依赖于时间的竞争条件错误(不可能重复测试这种竞争条件),不如说是测试防止这种错误的锁定策略是否允许多个线程按预期进行交互。
To properly test correct locking behavior, a test must start multiple threads. To make the test repeatable, we want the interactions between the threads to happen in a predictable order. We don't want to externally synchronize the threads in the test, because that will mask bugs that could happen in production where the threads are not externally synchronized. That leaves the use of timing delays for thread synchronization, which is the technique that I have used successfully whenever I've had to write tests of multithreaded code.
If the delays are too short, then the test becomes fragile, because minor timing differences - say between different machines on which the tests may be run - may cause the timing to be off and the test to fail. What I've typically done is start with delays that cause test failures, increase the delays so that the test passes reliably on my development machine, and then double the delays beyond that so the test has a good chance of passing on other machines. This does mean that the test will take a macroscopic amount of time, though in my experience, careful test design can limit that time to no more than a dozen seconds. Since you shouldn't have very many places requiring thread coordination code in your application, that should be acceptable for your test suite.
Finally, keep track of the number of bugs caught by your test. If your test has 80% code coverage, it can be expected to catch about 80% of your bugs. If your test is well designed but finds no bugs, there's a reasonable chance that you don't have additional bugs that will only show up in production. If the test catches one or two bugs, you might still get lucky. Beyond that, and you may want to consider a careful review of or even a complete rewrite of your thread handling code, since it is likely that code still contains hidden bugs that will be very difficult to find until the code is in production, and very difficult to fix then.
对于Java,请参阅JCIP的第12章。有一些具体的例子,可以编写确定性的多线程单元测试,以至少测试并发代码的正确性和不变量。
用单元测试“证明”线程安全要危险得多。我相信在各种平台/配置上进行自动化集成测试会更好。
For J2E code, I've used SilkPerformer, LoadRunner and JMeter for concurrency testing of threads. They all do the same thing. Basically, they give you a relatively simple interface for administrating their version of the proxy server, required, in order to analyze the TCP/IP data stream, and simulate multiple users making simultaneous requests to your app server. The proxy server can give you the ability to do things like analyze the requests made, by presenting the whole page and URL sent to the server, as well as the response from the server, after processing the request.
您可以在不安全的http模式下找到一些错误,在这种模式下,您至少可以分析正在发送的表单数据,并为每个用户系统地更改表单数据。但真正的测试是在https(安全套接字层)中运行。然后,您还必须有系统地修改会话和cookie数据,这可能有点复杂。
在测试并发性时,我发现的最好的错误是,当我发现开发人员在登录时依赖Java垃圾收集来关闭登录时建立的到LDAP服务器的连接请求。这导致用户暴露在其他用户的会话中,当试图分析服务器瘫痪时发生了什么,几乎每隔几秒钟就能完成一次事务时,结果非常令人困惑。
In the end, you or someone will probably have to buckle down and analyze the code for blunders like the one I just mentioned. And an open discussion across departments, like the one that occurred, when we unfolded the problem described above, are most useful. But these tools are the best solution to testing multi-threaded code. JMeter is open source. SilkPerformer and LoadRunner are proprietary. If you really want to know whether your app is thread safe, that's how the big boys do it. I've done this for very large companies professionally, so I'm not guessing. I'm speaking from personal experience.
提醒一句:理解这些工具确实需要一些时间。这不是简单地安装软件并启动GUI的问题,除非您已经接触过多线程编程。我试图确定需要理解的3个关键领域(表单、会话和cookie数据),希望至少从理解这些主题开始,可以帮助您集中精力快速获得结果,而不必通读整个文档。
确实很难!在我的(c++)单元测试中,我按照使用的并发模式将其分解为几个类别:
Unit tests for classes that operate in a single thread and aren't thread aware -- easy, test as usual. Unit tests for Monitor objects (those that execute synchronized methods in the callers' thread of control) that expose a synchronized public API -- instantiate multiple mock threads that exercise the API. Construct scenarios that exercise internal conditions of the passive object. Include one longer running test that basically beats the heck out of it from multiple threads for a long period of time. This is unscientific I know but it does build confidence. Unit tests for Active objects (those that encapsulate their own thread or threads of control) -- similar to #2 above with variations depending on the class design. Public API may be blocking or non-blocking, callers may obtain futures, data may arrive at queues or need to be dequeued. There are many combinations possible here; white box away. Still requires multiple mock threads to make calls to the object under test.
题外话:
在我所做的内部开发人员培训中,我教授了并发的支柱和这两种模式,作为思考和分解并发问题的主要框架。显然还有更先进的概念,但我发现这组基础知识可以帮助工程师摆脱困境。正如上面所描述的,它还会导致代码更具单元可测试性。