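If you were to mandate a minimum percentage of code coverage for unit tests, perhaps even as a requirement for committing to a repository, what would it be?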

Please explain how you arrived at your answer (since if all you did was pick a number, then I could have done that all by myself ;)


85% would be a good starting place for check-in criteria.

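I'd probably choose a variety of higher bars for shipping criteria, depending on the criticality of the subsystems/components being tested.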


This depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.


This prose by Alberto Savoia answers precisely that question (in a very entertaining way!):

http://www.artima.com/forums/flat.jsp?forum=106&thread=204677

Testivus On Test Coverage

Early one morning, a programmer asked the great master: “I am ready to write some unit tests. What code coverage should I aim for?” The great master replied: “Don’t worry about coverage, just write some good tests.” The programmer smiled, bowed, and left.

...

Later that day, a second programmer asked the same question. The great master pointed at a pot of boiling water and said: “How many grains of rice should I put in that pot?” The programmer, looking puzzled, replied: “How can I possibly tell you? It depends on how many people you need to feed, how hungry they are, what other food you are serving, how much rice you have available, and so on.” “Exactly,” said the great master. The second programmer smiled, bowed, and left.

...

Toward the end of the day, a third programmer came and asked the same question about code coverage. “Eighty percent and no less!” replied the master in a stern voice, pounding his fist on the table. The third programmer smiled, bowed, and left.

...

After this last reply, a young apprentice approached the great master: “Great master, today I overheard you answer the same question about code coverage with three different answers. Why?” The great master stood up from his chair: “Come get some fresh tea with me and let’s talk about it.” After they filled their cups with smoking hot green tea, the great master began to answer: “The first programmer is new and just getting started with testing. Right now he has a lot of code and no tests. He has a long way to go; focusing on code coverage at this time would be depressing and quite useless. He’s better off just getting used to writing and running some tests. He can worry about coverage later.” “The second programmer, on the other hand, is quite experienced both at programming and testing.
When I replied by asking her how many grains of rice I should put in a pot, I helped her realize that the amount of testing necessary depends on a number of factors, and she knows those factors better than I do – it’s her code after all. There is no single, simple, answer, and she’s smart enough to handle the truth and work with that.” “I see,” said the young apprentice, “but if there is no single simple answer, then why did you answer the third programmer ‘Eighty percent and no less’?” The great master laughed so hard and loud that his belly, evidence that he drank more than just green tea, flopped up and down. “The third programmer wants only simple answers – even when there are no simple answers … and then does not follow them anyway.” The young apprentice and the grizzled great master finished drinking their tea in contemplative silence.


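In a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage and more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof), there should be several key points where APIs are exposed to other code.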

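Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match the docs, then you have a bug in your code, your documentation, or your test cases. All of these are worth investigating.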
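One way to keep the documentation and the test cases from drifting apart in Python is to write the documented examples into the docstring and run them with the standard-library `doctest` module. This is a minimal sketch: `parse_percentage` is a hypothetical API function invented for illustration, not part of any library.

```python
import doctest


def parse_percentage(text: str) -> float:
    """Convert a string such as "85%" into a fraction between 0 and 1.

    The examples below serve as both documentation and test cases:

    >>> parse_percentage("85%")
    0.85
    >>> parse_percentage("100%")
    1.0
    >>> parse_percentage("abc")
    Traceback (most recent call last):
        ...
    ValueError: not a percentage: 'abc'
    """
    # Hypothetical API contract: input must be written as "<number>%".
    if not text.endswith("%"):
        raise ValueError(f"not a percentage: {text!r}")
    return float(text[:-1]) / 100


if __name__ == "__main__":
    # Run the docstring examples of this one function; prints nothing on success.
    doctest.run_docstring_examples(parse_percentage, {"parse_percentage": parse_percentage})
```

If an example fails, that is exactly the mismatch described above: the code, the documentation, or the example itself is wrong.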

Good luck!


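I don't think there can be such a black-and-white rule. Code should be reviewed, with particular attention to the critical details. However, if it hasn't been tested, it has a bug!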


Short answer: 60-80%

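Long answer: I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project, you should have a pretty good base percentage for the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.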


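Depending on the criticality of the code, anywhere from 75% to 85% is a good rule of thumb. Shipping code should definitely be tested more thoroughly than in-house utilities and the like.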


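Until a few days ago, we targeted >80%, but after we started using a lot of generated code, we stopped caring about the percentage and instead let the reviewer decide on the coverage requirement.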


It depends on which phase of the application development lifecycle you are in.

If you've been in development for a while, already have a lot of implemented code, and are only now realizing that you need to think about code coverage, then you have to check your current coverage (if any) and use that baseline to set milestones for each sprint (or an average rise over a period of sprints). This means taking on code debt while continuing to deliver end-user value (at least in my experience, end users don't care one bit that you've increased test coverage if they don't see new features).
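As a rough illustration of turning a measured baseline into per-sprint milestones, the sketch below spreads the increase evenly across a number of sprints. The baseline, target, and sprint count are hypothetical inputs, and an even linear ramp is only one assumption; a team might just as well front-load or back-load the increase.

```python
def sprint_milestones(baseline: float, target: float, sprints: int) -> list[float]:
    """Return one coverage goal (in percent) per sprint, rising evenly
    from the measured baseline to the final target."""
    if sprints <= 0:
        raise ValueError("sprints must be positive")
    if target < baseline:
        raise ValueError("target should not be below the current baseline")
    step = (target - baseline) / sprints
    # Goal for sprint i (1-based) is baseline + i * step, rounded for readability.
    return [round(baseline + step * i, 1) for i in range(1, sprints + 1)]


# Hypothetical numbers: 40% measured today, aiming for 80% over four sprints.
print(sprint_milestones(40.0, 80.0, 4))  # [50.0, 60.0, 70.0, 80.0]
```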

Depending on your domain, it's not unreasonable to shoot for 95%, but I'd have to say that on average you're going to be looking at 85% to 90%.


Code coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all functionality).

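You could get 100% by hitting every line once. However, you could still miss testing a particular sequence (logical path) in which those lines are hit. Conversely, you could get less than 100% and still have tested all of your 80%/frequently used code paths. Having tests for every "throw ExceptionTypeX" or similar defensive-programming guard is a "nice to have", not a "must have".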
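To make the line-versus-path distinction concrete, here is a small hypothetical function with two independent `if` statements. A single test with both flags set executes every line, yet there are four distinct paths through the function; only the second (equally hypothetical) suite exercises all of them.

```python
from itertools import product


def final_price(price: int, is_member: bool, has_coupon: bool) -> int:
    """Hypothetical pricing rule with two independent decisions (2 x 2 = 4 paths)."""
    if is_member:
        price -= 10  # flat member discount
    if has_coupon:
        price -= 5  # flat coupon discount
    return price


# One call with both flags True executes every line: 100% line coverage.
line_covering_suite = [(100, True, True)]

# Covering every logical path needs all four flag combinations.
path_covering_suite = [(100, member, coupon)
                       for member, coupon in product([True, False], repeat=2)]

for args in path_covering_suite:
    print(args, "->", final_price(*args))
```

With n independent decisions the number of paths grows as 2^n, so full line coverage says little about how many paths were actually exercised.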

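So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code, you should get 90%+ coverage as a bonus. Use code coverage to highlight chunks of code you have missed (which shouldn't happen if you TDD, since you write code only to make a test pass; no code can exist without its partner test).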


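I think the best symptom of correct code coverage is that the number of concrete problems the unit tests help you fix reasonably corresponds to the size of the unit test code you have written.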


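If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.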

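This number should only include code written in the project (excluding frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of calls to outside code. Those calls should be mocked.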
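A minimal sketch of keeping the measured code to the project's own logic: the call to outside code is passed in as a dependency and replaced by a `unittest.mock.Mock` in the test. The `rate_source` object and its `get_rate` method are hypothetical stand-ins for whatever external client a project actually uses. Excluding framework or vendored files from the percentage is usually a tool setting instead (for example, the `omit` option in coverage.py's configuration).

```python
from unittest import mock


def convert(amount: float, currency: str, rate_source) -> float:
    """Project code worth measuring: it asks the (external) rate source
    for a rate and applies it."""
    rate = rate_source.get_rate(currency)  # hypothetical external call
    return round(amount * rate, 2)


def test_convert_applies_rate():
    # The external dependency is mocked, so no network or framework code runs.
    fake_source = mock.Mock()
    fake_source.get_rate.return_value = 1.5
    assert convert(10, "EUR", fake_source) == 15.0
    fake_source.get_rate.assert_called_once_with("EUR")


if __name__ == "__main__":
    test_convert_applies_rate()
    print("ok")
```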


I'd like to share another anecdote about test coverage.

We have a huge project in which, as I noted on Twitter, we have 700 unit tests but only 20% code coverage.

Scott Hanselman replied with words of wisdom:

Is it the RIGHT 20%? Is it the 20% that represents the code your users hit the most? You might add 50 more tests and only add 2%.

Again, this comes back to my answer about code coverage: how much rice should you put in the pot? It depends.


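Generally speaking, from the several engineering-excellence best-practice papers I have read, unit-testing 80% of new code is the point that yields the best return. Pushing coverage beyond that point finds fewer defects for the effort spent. This is a best practice used by many large companies.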

Unfortunately, most of these results are internal to the companies, so there is no public literature I can point you to.


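Take a look at Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you which complex code is currently untested.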
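As I understand it, the CRAP (Change Risk Anti-Patterns) score that Crap4j reports is CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), where comp(m) is the cyclomatic complexity of method m and cov(m) is its test coverage in percent. The sketch below only evaluates that formula; the method names, complexities, and coverages are made up for illustration.

```python
def crap_score(complexity: int, coverage_percent: float) -> float:
    """CRAP(m) = comp(m)**2 * (1 - cov(m)/100)**3 + comp(m)."""
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage_percent must be between 0 and 100")
    uncovered = 1 - coverage_percent / 100
    return complexity ** 2 * uncovered ** 3 + complexity


# Hypothetical methods: (name, cyclomatic complexity, coverage %)
methods = [("simple_getter", 1, 0.0), ("parser", 10, 50.0), ("big_switch", 10, 0.0)]
for name, comp, cov in methods:
    print(f"{name}: {crap_score(comp, cov)}")
```

An untested trivial method scores low, an untested complex one scores high, and full coverage brings the score down to the complexity itself. That is how the metric points you at complex code that lacks tests.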


My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.

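My current practice in Python is to divide my .py modules into two folders, app1/ and app2/. When running unit tests, I calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.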

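When/if I find that these numbers differ from the standard, I investigate and change the design of the code so that coverage conforms to the standard.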
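The "visually check" step could be automated along these lines. This is a sketch under assumptions: the input is a hypothetical mapping from file path to (covered lines, total lines) that you would build from your coverage tool's report, and paths are assumed to be relative to the project root with forward slashes.

```python
from pathlib import PurePosixPath


def check_folder_rules(file_coverage: dict[str, tuple[int, int]]) -> list[str]:
    """Return a list of violations: every file under app1/ must be fully
    covered, and no line under app2/ may be covered."""
    problems = []
    for path, (covered, total) in sorted(file_coverage.items()):
        top_folder = PurePosixPath(path).parts[0]
        if top_folder == "app1" and covered != total:
            problems.append(f"{path}: {covered}/{total} lines covered, expected all")
        elif top_folder == "app2" and covered != 0:
            problems.append(f"{path}: {covered}/{total} lines covered, expected none")
    return problems


if __name__ == "__main__":
    # Hypothetical report data.
    report = {
        "app1/models.py": (120, 120),
        "app1/rules.py": (48, 50),
        "app2/gui.py": (3, 200),
    }
    for problem in check_folder_rules(report):
        print(problem)
```

A non-empty result is the signal to investigate and either add tests or move code between the folders.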

This does mean that I can recommend achieving 100% line coverage of library code.

I also occasionally review app2/ to see whether I could test any of the code there, and if I can, I move it into app1/.

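Now, I'm not too worried about the aggregate coverage, because that can vary wildly depending on the size of the project, but generally I've seen anywhere from 70% to over 90%.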

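With Python, I should be able to devise a smoke test that automatically runs my app while measuring coverage, and hopefully reach 100% when the smoke test and unit test figures are combined.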


Code coverage is great, but only as long as the benefits you get from it outweigh the cost/effort of achieving it.

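We have been working to a standard of 80% for some time, but we have just decided to abandon it and instead focus more on our testing itself, concentrating on the complex business logic and so on.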

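This decision was taken because of the increasing amount of time we were spending chasing code coverage and maintaining existing unit tests. We felt we had reached the point where the benefit we got from our code coverage was less than the effort we had to put in to achieve it.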


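Viewing coverage from another angle: well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy. By writing code with clarity and coverability in mind, and writing the unit tests in parallel with the code, you get the best results IMHO.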


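I think what matters most is knowing what the coverage trend is over time and understanding the reasons for changes in that trend. Whether you view a change in the trend as good or bad depends on your analysis of the reason.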


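Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came up with myself that weren't discussed in meetings).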

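I don't care whether some of my code isn't covered by tests, but I do care whether I could refactor my code and end up with different behaviour. Therefore, 100% functionality coverage is my only target.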


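I use Cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up to date. At a minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie the Ant build failure property to this task. If the build fails because of a lack of coverage, you know someone has added code without testing it. Example: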

<cobertura-check linerate="0"
                 branchrate="0"
                 totallinerate="70"
                 totalbranchrate="90"
                 failureproperty="build.failed" />
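The "keep raising, never lower" rule can be expressed as a tiny ratchet. This is a hypothetical helper for deciding what to write into `totallinerate`/`totalbranchrate` after each run; Cobertura itself only checks the thresholds you give it, so something like this (or a person) has to update the build file. The one-point margin and whole-number percentages are assumptions.

```python
import math


def next_threshold(old_threshold: int, measured_percent: float, margin: int = 1) -> int:
    """New minimum coverage: just below what was measured, but never lower
    than the previous threshold (a one-way ratchet)."""
    candidate = math.floor(measured_percent) - margin
    return max(old_threshold, candidate)


def build_passes(threshold: int, measured_percent: float) -> bool:
    """Mirror of the coverage check: the build fails below the threshold."""
    return measured_percent >= threshold


# Hypothetical run: threshold was 70, coverage measured at 74.6%.
threshold = next_threshold(70, 74.6)             # raised to 73
print(threshold, build_passes(threshold, 71.2))  # 73 False
```

The failing second check models the situation described above: coverage dropped, so someone probably added untested code.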

When I think my code isn't unit tested enough and I'm not sure what to test next, I use coverage to help me decide what to test next.

If a unit test increases coverage, I know that unit test is worth something.

This holds whether the code is uncovered, 50% covered, or 97% covered.
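One way to apply "if a test increases coverage, it's worth something" is to count, for each test, the lines it covers that no earlier test covered. The sketch assumes you already have per-test sets of covered (file, line) pairs, which some coverage tools can record; the data below is hypothetical. The rule only works in one direction: a test that adds no new lines may still be valuable because it checks different inputs along the same path.

```python
def coverage_gain_per_test(per_test: dict[str, set[tuple[str, int]]]) -> dict[str, int]:
    """For each test, in the given order, count the lines it covers that no
    earlier test in the sequence had covered."""
    already_covered: set[tuple[str, int]] = set()
    gains = {}
    for test_name, lines in per_test.items():
        gains[test_name] = len(lines - already_covered)
        already_covered |= lines
    return gains


# Hypothetical per-test coverage data: test name -> {(file, line), ...}
per_test = {
    "test_parse_ok": {("parser.py", 1), ("parser.py", 2), ("parser.py", 3)},
    "test_parse_again": {("parser.py", 1), ("parser.py", 2)},
    "test_parse_error": {("parser.py", 1), ("parser.py", 5)},
}
print(coverage_gain_per_test(per_test))
# {'test_parse_ok': 3, 'test_parse_again': 0, 'test_parse_error': 1}
```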


In my view, the answer is "it depends on how much time you have." I try to reach 100%, but I don't make a fuss if I don't get there in the time I have.

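When I write unit tests, I wear a different hat from the one I wear when developing production code. I think about what the tested code claims to do and what situations could possibly break it.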

I usually follow these criteria or rules:

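The unit tests should be a form of documentation of the expected behavior of my code: the expected output for a given input, and the exceptions it may throw that clients may want to catch. (What should the users of my code know?)

The unit tests should help me discover the what-if conditions I may not have thought of yet. (How do I make my code stable and robust?)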

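If these two rules don't produce 100% coverage, then so be it. But once I have the time, I analyze the uncovered blocks and lines and determine whether there are still test cases without unit tests, or whether the code needs to be refactored to eliminate unnecessary code.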


From the Testivus post, I think the context of the answer should be the second programmer.

Having said that, from a practical point of view we need parameters/goals to strive for.

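I think this can be "tested" in an Agile process by analyzing the code we have, the architecture, and the functionality (user stories), and then coming up with a number. Based on my experience in the telecom domain, I'd say 60% is a good value.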


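For a well-designed system, where unit tests have driven the development from the start, I would say 85% is quite a low number. Small classes designed to be testable should not be hard to cover better than that.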

It's easy to dismiss this question with something like:

Covered lines do not equal tested logic, and one should not read too much into the percentage.

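True, but there are some important points to be made about code coverage. In my experience, this metric is actually quite useful when used correctly. Having said that, I have not seen every system, and I'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look very different, and the scope of the available test frameworks can vary.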

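Also, my reasoning mainly concerns quite short test feedback loops. For the product I'm developing, the shortest feedback loop is quite flexible, covering everything from class tests to inter-process signalling. Testing a deliverable sub-product typically takes 5 minutes, and with such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric we are discussing here) to reject or accept commits to the repository.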

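When using the code coverage metric, you should not just have a fixed (arbitrary) percentage that must be met. In my opinion, doing so does not give you the real benefits of code coverage analysis. Instead, define the following metrics: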

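Low Water Mark (LWM): the lowest number of uncovered lines ever seen in the system under test

High Water Mark (HWM): the highest code coverage percentage ever seen for the system under test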

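New code can only be added if we don't go above the LWM and don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice that I say should and not must (explained below).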
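A minimal sketch of this LWM/HWM gate, assuming the commit check receives the number of covered and coverable lines for the whole system under test. Fractions avoid rounding issues in the comparison. The "should, not must" escape hatch (for example, deleting old, well-tested code) is deliberately left to a human override and is not modeled.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Watermarks:
    lwm: int        # low water mark: fewest uncovered lines seen so far
    hwm: Fraction   # high water mark: best coverage ratio seen so far


def check_commit(marks: Watermarks, covered: int, total: int) -> tuple[bool, Watermarks]:
    """Accept a commit only if uncovered lines stay at or below the LWM and
    coverage stays at or above the HWM; on acceptance, tighten the marks."""
    if total <= 0:
        raise ValueError("total must be positive")
    uncovered = total - covered
    ratio = Fraction(covered, total)
    if uncovered > marks.lwm or ratio < marks.hwm:
        return False, marks
    return True, Watermarks(min(marks.lwm, uncovered), max(marks.hwm, ratio))


# Hypothetical history: 950 of 1000 lines covered so far.
marks = Watermarks(lwm=50, hwm=Fraction(950, 1000))
ok, marks = check_commit(marks, covered=970, total=1020)  # 20 new lines, all covered
print(ok)  # True
ok, marks = check_commit(marks, covered=960, total=1020)  # same size, 10 fewer covered
print(ok)  # False
```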

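But doesn't this mean that it will be impossible to clean out old, well-tested rubbish that you no longer have any use for? Yes, and that's why you have to be pragmatic about these things. There are situations where the rules have to be broken, but for your typical day-to-day integration, my experience is that these metrics are quite useful. They have the following two implications.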

Testable code is promoted. When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.

Test coverage for legacy code is increasing over time. When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.

Again, if the feedback loop is too long, it might be completely impractical to set up something like this in the integration process.

I would also like to mention two more general benefits of the code coverage metric.

Code coverage analysis is part of the dynamic code analysis (as opposed to the static one, i.e. Lint). Problems found during the dynamic code analysis (by tools such as the purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is the hardest to cover in a test case is usually the abnormal cases in the system, but if you want the system to fail gracefully (i.e. error trace instead of crash) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.

People take pride in keeping 100% for new code, and people discuss testing problems with a similar passion as other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case, etc.

For completeness, one negative:

In a large project with many developers involved, not everyone is going to be a testing genius. Some people tend to use the code coverage metric as proof that the code is tested, and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above, a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.


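Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overrated). Your goal should therefore not be to achieve 100% code coverage, but rather to make sure that you test all relevant scenarios of your application.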


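My favorite code coverage is 100% with an asterisk. The asterisk is there because I prefer to use tools that let me mark certain lines as lines that "don't count". If I have covered 100% of the lines that "count", I am done.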

The underlying process is:

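I write tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).

I run the code coverage tools.

I examine any lines or paths not covered, and any that I consider unimportant or unreachable (due to defensive programming) I mark as not counting.

I write new tests to cover the missing lines, and improve the documentation if those edge cases are not mentioned.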

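This way, if my collaborators or I add new code or change the tests in the future, there is a bright line to tell us whether we missed something important: the coverage drops below 100%. At the same time, it provides the flexibility to deal with different testing priorities.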
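In Python, coverage.py supports this style directly: a line carrying a `# pragma: no cover` comment is excluded from measurement, and if that line opens a block, the whole block is excluded. The function below is a hypothetical example of a defensive, normally unreachable branch marked that way.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2


def hex_code(color: Color) -> str:
    """Map each supported color to its hex string."""
    if color is Color.RED:
        return "#ff0000"
    if color is Color.GREEN:
        return "#00ff00"
    # Defensive guard: unreachable while Color has only these two members,
    # so it is marked as a line that "doesn't count" for coverage.
    raise AssertionError(f"unhandled color: {color!r}")  # pragma: no cover
```

The tests only need to cover the two real branches for the file to report 100%; the excluded guard stays visible in the source for reviewers.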


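Jon Limjap makes a good point: there is no single number that makes sense as a standard for every project. Some projects simply don't need such a standard. In my opinion, where the accepted answer falls short is that it doesn't describe how one might make that decision for a given project.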

I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a better-informed answer.

When to set code coverage requirements

First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal is correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: it will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.

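Code coverage is an objective measure: once you see your coverage report, there is no ambiguity about whether the standard has been met. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.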

There are a few specific cases where an empirical standard can add value:

To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality who may not be involved in the day-to-day development of the software (managers, technical leads, etc.). Saying "we're going to write all the tests we really need" is not convincing: they either need to trust entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so). Providing measurable standards and explaining how they reasonably approximate actual goals is better.

To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity about what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.

To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.

Which metrics to use

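Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you set a standard on depends on what you're using that standard to satisfy.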

I'll use two common metrics as examples of when you might use them to set standards:

Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested? This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.

Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested? This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.
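A small illustration of the difference, using a hypothetical function: the first test executes every statement, so statement coverage is 100%, but the `if` condition is never false, so only one of its two branches is taken. With coverage.py, branch measurement is switched on with `coverage run --branch`.

```python
def absolute(x: int) -> int:
    result = x
    if x < 0:
        result = -x
    return result


def test_negative():
    # Runs every statement (100% statement coverage), but never takes the
    # "x >= 0" branch of the if, so branch coverage is only 50%.
    assert absolute(-3) == 3


def test_non_negative():
    # Adding this case covers the remaining branch.
    assert absolute(4) == 4
```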

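There are many other metrics (line coverage, for instance, is similar to statement coverage but yields different numeric results for multi-line statements; conditional coverage and path coverage are similar to branch coverage but reflect a more detailed view of the possible permutations of program execution you might encounter).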

What percentage to require

Finally, back to the original question: if you set code coverage standards, what should that number be?

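Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.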

Some numbers that one might choose:

100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.). Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested. Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.

99% (or 95%, other numbers in the high nineties). Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.

80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.

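I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once, when you set the standard, and then use an objective measurement going forward.)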

Other notes

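The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain ways) by code coverage. So your code coverage standard also provides an empirical basis for approximating the quality of "maintainability".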


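I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.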

That aside, the answer depends on your methodology, language and testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to and miss coverage regressions due to your constant background of uncovered code.

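The answer also depends on the history of the project. I've only found the above approach practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing, but I've never found it practical to go back and fill every coverage gap, because old untested code is not understood well enough to do that correctly and quickly.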


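Many shops don't value tests, so if you are above zero, there is at least some appreciation of their worth; arguably, non-zero isn't bad, since many are still at zero.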

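In the .NET world, people often quote 80% as reasonable, but they mean it at the solution level. I prefer to measure at the project level: 30% might be fine for a UI project if you have Selenium (or similar) or manual tests, 20% might be fine for the data layer project, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, while the critical business logic may be much higher.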
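The arithmetic behind "overall coverage around 60%, business logic much higher" is just a size-weighted average. The layer sizes below are hypothetical, chosen only to show how per-project figures like the ones above can combine into roughly 60% at the solution level.

```python
# Hypothetical layers: name -> (coverable lines, coverage percent)
layers = {
    "ui": (3000, 30),
    "data": (2000, 20),
    "business_rules": (5000, 95),
}


def overall_coverage(layers: dict[str, tuple[int, float]]) -> float:
    """Size-weighted coverage across all layers, in percent."""
    total_lines = sum(lines for lines, _ in layers.values())
    covered_lines = sum(lines * percent / 100 for lines, percent in layers.values())
    return 100 * covered_lines / total_lines


print(overall_coverage(layers))  # 60.5
```

A single solution-level number hides the spread: the same 60.5% could come from very different per-layer figures.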

I've also heard this: aim for 100% and you'll hit 80%; aim for 80% and you'll hit 40%.

Bottom line: apply the 80:20 rule, and let your application's bug count guide you.