4.2 The intrinsic connection between the first two attributes
For personal study only; please support the official edition.
Book: Unit Testing: Principles, Practices, and Patterns
As I mentioned earlier, there’s an intrinsic connection between the first two pillars of a good unit test—protection against regressions and resistance to refactoring. They both contribute to the accuracy of the test suite, though from opposite perspectives. These two attributes also tend to influence the project differently over time: while it’s important to have good protection against regressions very soon after the project’s initiation, the need for resistance to refactoring is not immediate.
In this section, I talk about:
- Maximizing test accuracy
- The importance of false positives and false negatives
4.2.1 Maximizing test accuracy
Let’s step back for a second and look at the broader picture with regard to test results. When it comes to code correctness and test results, there are four possible outcomes, as shown in figure 4.3. The test can either pass or fail (the rows of the table). And the functionality itself can be either correct or broken (the table’s columns).
The situation when the test passes and the underlying functionality works as intended is a correct inference: the test correctly inferred the state of the system (there are no bugs in it). Another term for this combination of working functionality and a passing test is true negative.
Similarly, when the functionality is broken and the test fails, it’s also a correct inference. That’s because you expect to see the test fail when the functionality is not working properly. That’s the whole point of unit testing. The corresponding term for this situation is true positive.
But when the test doesn’t catch an error, that’s a problem. This is the upper-right quadrant: a false negative. It is what the first attribute of a good test, protection against regressions, helps you avoid. Tests with good protection against regressions help you minimize the number of false negatives (type II errors).
On the other hand, there’s a symmetric situation when the functionality is correct but the test still shows a failure. This is a false positive, a false alarm. And this is what the second attribute—resistance to refactoring—helps you with.
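The four quadrants of figure 4.3 can be expressed as a small lookup. The following Python sketch is illustrative only: `Outcome`, `classify`, and `GUARDED_BY` are hypothetical names, and the sketch assumes you already know whether the functionality is actually correct, which in practice you usually learn only after the fact.

```python
from enum import Enum


class Outcome(Enum):
    """The four quadrants of the test-result table (hypothetical names)."""
    TRUE_NEGATIVE = "true negative"    # test passes, functionality works
    TRUE_POSITIVE = "true positive"    # test fails, functionality is broken
    FALSE_NEGATIVE = "false negative"  # test passes, functionality is broken (type II error)
    FALSE_POSITIVE = "false positive"  # test fails, functionality works (type I error)


# Which attribute of a good test guards against each kind of error.
GUARDED_BY = {
    Outcome.FALSE_NEGATIVE: "protection against regressions",
    Outcome.FALSE_POSITIVE: "resistance to refactoring",
}


def classify(test_passed: bool, functionality_correct: bool) -> Outcome:
    """Map a single test run onto one of the four quadrants."""
    if test_passed and functionality_correct:
        return Outcome.TRUE_NEGATIVE
    if not test_passed and not functionality_correct:
        return Outcome.TRUE_POSITIVE
    if test_passed:
        # The test is green, but the code is broken: a missed regression.
        return Outcome.FALSE_NEGATIVE
    # The test is red, but the code works: a false alarm.
    return Outcome.FALSE_POSITIVE


if __name__ == "__main__":
    outcome = classify(test_passed=False, functionality_correct=True)
    print(outcome.value, "->", GUARDED_BY[outcome])  # false positive -> resistance to refactoring
```

Only the two error quadrants appear in `GUARDED_BY`, because the correct inferences need no protection.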
All these terms (false positive, type I error and so on) have roots in statistics, but can also be applied to analyzing a test suite. The best way to wrap your head around them is to think of a flu test. A flu test is positive when the person taking the test has the flu. The term positive is a bit confusing because there’s nothing positive about having the flu. But the test doesn’t evaluate the situation as a whole. In the context of testing, positive means that some set of conditions is now true. Those are the conditions the creators of the test have set it to react to. In this particular example, it’s the presence of the flu. Conversely, the lack of flu renders the flu test negative.
Now, when you evaluate how accurate the flu test is, you bring up terms such as false positive or false negative. The probability of false positives and false negatives tells you how good the flu test is: the lower that probability, the more accurate the test.
This accuracy is what the first two pillars of a good unit test are all about. Protection against regressions and resistance to refactoring aim at maximizing the accuracy of the test suite. The accuracy metric itself consists of two components:
- How good the test is at indicating the presence of bugs (lack of false negatives, the sphere of protection against regressions)
- How good the test is at indicating the absence of bugs (lack of false positives, the sphere of resistance to refactoring)
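The two components of accuracy can be sketched in the same spirit as the flu-test probabilities. This assumes a history of test runs labeled with whether the functionality was really correct (the `Run` record and both function names are hypothetical). The first component is the share of broken runs the test caught, that is, 1 minus the false negative rate; the second is the share of working runs it let through, that is, 1 minus the false positive rate. In statistics these two shares are known as sensitivity and specificity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """One observed test run (hypothetical record type)."""
    test_passed: bool
    functionality_correct: bool


def _share(hits: int, total: int) -> Optional[float]:
    # Undefined (None) when there are no runs of that kind to judge.
    return hits / total if total else None


def regression_detection_rate(runs: List[Run]) -> Optional[float]:
    """Presence of bugs: share of broken-functionality runs where the test failed."""
    broken = [r for r in runs if not r.functionality_correct]
    return _share(sum(not r.test_passed for r in broken), len(broken))


def false_alarm_free_rate(runs: List[Run]) -> Optional[float]:
    """Absence of bugs: share of working-functionality runs where the test passed."""
    working = [r for r in runs if r.functionality_correct]
    return _share(sum(r.test_passed for r in working), len(working))


if __name__ == "__main__":
    history = [
        Run(test_passed=False, functionality_correct=False),  # true positive
        Run(test_passed=True, functionality_correct=False),   # false negative
        Run(test_passed=True, functionality_correct=True),    # true negative
        Run(test_passed=False, functionality_correct=True),   # false positive
        Run(test_passed=True, functionality_correct=True),    # true negative
    ]
    print(regression_detection_rate(history))  # 0.5
    print(false_alarm_free_rate(history))
```

With this sample history, the test catches one of the two broken runs and stays quiet on two of the three working runs; improving either number raises the accuracy of the suite.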
Another way to think of false positives and false negatives is in terms of signal-to-noise ratio. As you can see from the formula in figure 4.4, there are two ways to improve test accuracy. The first is to increase the numerator, signal: that is, make the test better at finding regressions. The second is to reduce the denominator, noise: make the test better at not raising false alarms.
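Figure 4.4 is not reproduced in these notes. Based on the paragraph above, its formula can be sketched as follows (a reconstruction, so the exact wording in the book may differ):

$$
\text{Test accuracy} = \frac{\text{Signal (number of bugs found)}}{\text{Noise (number of false alarms raised)}}
$$

The ratio is conceptual rather than something you compute: raising the numerator or lowering the denominator both improve accuracy, and as the noise grows the ratio approaches zero even if every bug is found.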
Both are critically important. There’s no use for a test that isn’t capable of finding any bugs, even if it doesn’t raise false alarms. Similarly, the test’s accuracy goes to zero when it generates a lot of noise, even if it’s capable of finding all the bugs in code. These findings are simply lost in the sea of irrelevant information.
4.2.2 The importance of false positives and false negatives: The dynamics
In the short term, false positives are not as bad as false negatives. At the beginning of a project, receiving a wrong warning is not as big a deal as not being warned at all and running the risk of a bug slipping into production. But as the project grows, false positives start to have an increasingly large effect on the test suite.
Why are false positives not as important initially? Because the importance of refactoring is also not immediate; it increases gradually over time. You don’t need to conduct many code clean-ups at the beginning of the project. Newly written code is often shiny and flawless. It’s also still fresh in your memory, so you can easily refactor it even if tests raise false alarms.
But as time goes on, the code base deteriorates. It becomes increasingly complex and disorganized. Thus you have to start conducting regular refactorings in order to mitigate this tendency. Otherwise, the cost of introducing new features eventually becomes prohibitive.
As the need for refactoring increases, the importance of resistance to refactoring in tests increases with it. As I explained earlier, you can’t refactor when the tests keep crying “wolf” and you keep getting warnings about bugs that don’t exist. You quickly lose trust in such tests and stop viewing them as a reliable source of feedback.
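To make "crying wolf" concrete, here is a minimal hypothetical sketch; every class and function name is invented for illustration. Both renderers produce identical output, so the functionality is correct in both cases. `brittle_check` nevertheless flags the refactored renderer because it inspects internal structure, which is exactly a false positive, while `behavior_check` looks only at the observable result and keeps passing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    header: str
    body: str
    footer: str


class HeaderRenderer:
    def render(self, message: Message) -> str:
        return f"<h1>{message.header}</h1>"


class BodyRenderer:
    def render(self, message: Message) -> str:
        return f"<b>{message.body}</b>"


class FooterRenderer:
    def render(self, message: Message) -> str:
        return f"<i>{message.footer}</i>"


class MessageRenderer:
    """Original version: delegates to three sub-renderers."""

    def __init__(self) -> None:
        self.sub_renderers = [HeaderRenderer(), BodyRenderer(), FooterRenderer()]

    def render(self, message: Message) -> str:
        return "".join(r.render(message) for r in self.sub_renderers)


class RefactoredMessageRenderer:
    """After refactoring: sub-renderers inlined, observable output unchanged."""

    def render(self, message: Message) -> str:
        return f"<h1>{message.header}</h1><b>{message.body}</b><i>{message.footer}</i>"


def brittle_check(renderer) -> bool:
    """Coupled to implementation details: inspects the internal structure."""
    parts = getattr(renderer, "sub_renderers", [])
    return [type(p) for p in parts] == [HeaderRenderer, BodyRenderer, FooterRenderer]


def behavior_check(renderer) -> bool:
    """Checks only the observable result."""
    message = Message(header="h", body="b", footer="f")
    return renderer.render(message) == "<h1>h</h1><b>b</b><i>f</i>"


if __name__ == "__main__":
    for renderer in (MessageRenderer(), RefactoredMessageRenderer()):
        print(type(renderer).__name__, brittle_check(renderer), behavior_check(renderer))
```

Running it prints `MessageRenderer True True` and `RefactoredMessageRenderer False True`: the refactoring changed no behavior, yet the structural check turned red. Multiply that by every refactoring on a growing code base and the test suite stops being a trustworthy source of feedback.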
Despite the importance of protecting your code against false positives, especially in the later project stages, few developers perceive false positives this way. Most people tend to focus solely on improving the first attribute of a good unit test, protection against regressions. That alone is not enough to build a valuable, highly accurate test suite that helps sustain project growth.
The reason, of course, is that far fewer projects get to those later stages, mostly because they are small and the development finishes before the project becomes too big. Thus developers face the problem of unnoticed bugs more often than false alarms that swarm the project and hinder all refactoring undertakings. And so, people optimize accordingly. Nevertheless, if you work on a medium to large project, you have to pay equal attention to both false negatives (unnoticed bugs) and false positives (false alarms).