wip-6.2

zero 2026-06-05

6.2 Comparing the three styles of unit testing

There’s nothing new about output-based, state-based, and communication-based styles of unit testing. In fact, you already saw all of these styles previously in this book. What’s interesting is comparing them to each other using the four attributes of a good unit test. Here are those attributes again (refer to chapter 4 for more details):

Protection against regressions
Resistance to refactoring
Fast feedback
Maintainability

In our comparison, let’s look at each of the four separately.

6.2.1 Comparing the styles using the metrics of protection against regressions and feedback speed

Let’s first compare the three styles in terms of the protection against regressions and feedback speed attributes, as these attributes are the most straightforward in this particular comparison. The metric of protection against regressions doesn’t depend on a particular style of testing. This metric is a product of the following three characteristics:

The amount of code that is executed during the test
The complexity of that code
Its domain significance

Generally, you can write a test that exercises as much or as little code as you like; no particular style provides a benefit in this area. The same is true for the code’s complexity and domain significance. The only exception is the communication-based style: overusing it can result in shallow tests that verify only a thin slice of code and mock out everything else. Such shallowness is not a defining feature of communication-based testing, though, but rather an extreme case of abusing this technique.
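
To make the idea of a shallow test concrete, here is a minimal Python sketch. The `OrderService` class and its `sum_with_tax` collaborator are hypothetical names invented for this illustration; the point is only that the mocked collaborator holds all the meaningful logic, so the test exercises almost nothing.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical service that delegates all real work to a calculator."""

    def __init__(self, calculator) -> None:
        self._calculator = calculator

    def total(self, prices: list[float]) -> float:
        return self._calculator.sum_with_tax(prices)


def test_total_is_delegated_to_the_calculator() -> None:
    # The calculator, where the domain logic would live, is mocked out entirely.
    calculator_mock = Mock()
    calculator_mock.sum_with_tax.return_value = 110.0
    sut = OrderService(calculator_mock)

    result = sut.total([100.0])

    # Only the one-line delegation is exercised; no tax logic ever runs.
    assert result == 110.0
    calculator_mock.sum_with_tax.assert_called_once_with([100.0])
```

A regression in the real tax calculation would go unnoticed by this test, which is why it offers little protection.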

There’s little correlation between the styles of testing and the test’s feedback speed. As long as your tests don’t touch out-of-process dependencies and thus stay in the realm of unit testing, all styles produce tests of roughly equal speed of execution. Communication-based testing can be slightly worse because mocks tend to introduce additional latency at runtime. But the difference is negligible, unless you have tens of thousands of such tests.

6.2.2 Comparing the styles using the metric of resistance to refactoring

When it comes to the metric of resistance to refactoring, the situation is different. Resistance to refactoring is the measure of how many false positives (false alarms) tests generate during refactorings. False positives, in turn, are a result of tests coupling to code’s implementation details as opposed to observable behavior.

Output-based testing provides the best protection against false positives because the resulting tests couple only to the method under test. The only way for such tests to couple to implementation details is when the method under test is itself an implementation detail.
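
As an illustration, here is a minimal Python sketch of an output-based test. The `calculate_discount` function and its rule (1% per product, capped at 20%) are assumptions made up for this example; what matters is that the test touches nothing but the function's input and return value.

```python
def calculate_discount(products: list[str]) -> float:
    """Hypothetical pure function: 1% off per product, capped at 20%."""
    return min(len(products) * 0.01, 0.2)


def test_discount_of_two_products() -> None:
    # Arrange and act: supply an input and capture the output.
    discount = calculate_discount(["Hand wash", "Shampoo"])

    # Assert: the return value is the only thing the test couples to.
    assert discount == 0.02
```

Because the function has no side effects, any internal rewrite that keeps the same return values leaves this test green.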

State-based testing is usually more prone to false positives. In addition to the method under test, such tests also work with the class’s state. Probabilistically speaking, the greater the coupling between the test and the production code, the greater the chance for this test to tie to a leaking implementation detail. State-based tests tie to a larger API surface, and hence the chances of coupling them to implementation details are also higher.
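
The larger API surface can be seen in a small Python sketch. The `Order` class below is a hypothetical example, not code from the source: the test has to call the method under test and then also read the object's state to verify the outcome.

```python
class Order:
    """Hypothetical class whose behavior is observable only through its state."""

    def __init__(self) -> None:
        self._products: list[str] = []

    @property
    def products(self) -> tuple[str, ...]:
        # Return an immutable copy so callers cannot mutate internal state.
        return tuple(self._products)

    def add_product(self, product: str) -> None:
        self._products.append(product)


def test_adding_a_product_to_an_order() -> None:
    sut = Order()

    sut.add_product("Hand wash")

    # The test couples to add_product and also to the products property.
    assert len(sut.products) == 1
    assert sut.products[0] == "Hand wash"
```

If `products` exposed the internal list directly, the test could end up depending on how the order stores its data rather than on what it does.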

Communication-based testing is the most vulnerable to false alarms. As you may remember from chapter 5, the vast majority of tests that check interactions with test doubles end up being brittle. This is always the case for interactions with stubs—you should never check such interactions. Mocks are fine only when they verify interactions that cross the application boundary and only when the side effects of those interactions are visible to the external world. As you can see, using communication-based testing requires extra prudence in order to maintain proper resistance to refactoring.
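
The stub and mock guidance above can be sketched in Python with `unittest.mock`. The `Controller` class, the repository, and the email gateway are hypothetical names chosen for illustration; the gateway is assumed to wrap an external SMTP service, which is what makes its interaction worth verifying.

```python
from unittest.mock import Mock


class Controller:
    """Hypothetical application service with two collaborators."""

    def __init__(self, email_gateway, user_repository) -> None:
        self._email_gateway = email_gateway
        self._user_repository = user_repository

    def greet_user(self, user_id: int) -> None:
        email = self._user_repository.get_email(user_id)
        self._email_gateway.send_greetings_email(email)


def test_sending_a_greetings_email() -> None:
    # Stub: supplies input to the system under test and is never asserted on.
    repository_stub = Mock()
    repository_stub.get_email.return_value = "user@email.com"
    # Mock: stands in for an out-of-process dependency (assumed SMTP service).
    gateway_mock = Mock()
    sut = Controller(gateway_mock, repository_stub)

    sut.greet_user(1)

    # Verify only the outgoing interaction that crosses the application boundary.
    gateway_mock.send_greetings_email.assert_called_once_with("user@email.com")
```

Asserting that `get_email` was called would tie the test to how the controller gathers its data, which is an implementation detail.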

But just like shallowness, brittleness is not a defining feature of the communication-based style, either. You can reduce the number of false positives to a minimum by maintaining proper encapsulation and coupling tests to observable behavior only. Admittedly, though, the amount of due diligence varies depending on the style of unit testing.

6.2.3 Comparing the styles using the metric of maintainability

Finally, the maintainability metric is highly correlated with the styles of unit testing; but, unlike with resistance to refactoring, there’s not much you can do to mitigate that. Maintainability evaluates the unit tests’ maintenance costs and is defined by the following two characteristics:

How hard it is to understand the test, which is a function of the test’s size
How hard it is to run the test, which is a function of how many out-of-process dependencies the test works with directly

Larger tests are less maintainable because they are harder to grasp or change when needed. Similarly, a test that directly works with one or several out-of-process dependencies (such as the database) is less maintainable because you need to spend time keeping those out-of-process dependencies operational: rebooting the database server, resolving network connectivity issues, and so on.

Maintainability of output-based tests

Compared with the other two types of testing, output-based testing is the most maintainable. The resulting tests are almost always short and concise and thus are easier to maintain. This benefit of the output-based style stems from the fact that this style boils down to only two things: supplying an input to a method and verifying its output, which you can often do with just a couple of lines of code.

Because the underlying code in output-based testing must not change the global or internal state, these tests don’t deal with out-of-process dependencies. Hence, output-based tests are best in terms of both maintainability characteristics.

Maintainability of state-based tests

State-based tests are normally less maintainable than output-based ones. This is because state verification often takes up more space than output verification. Here’s another example of state-based testing.

This test adds a comment to an article and then checks to see if the comment appears in the article’s list of comments. Although this test is simplified and contains just a single comment, its assertion part already spans four lines. State-based tests often need to verify much more data than that and, therefore, can grow in size significantly.
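
A test of the kind described here could be sketched in Python as follows. The `Article` and `Comment` classes are hypothetical stand-ins written for this illustration, not the book's code; `Comment` is deliberately a plain class, so its instances compare by reference.

```python
from datetime import datetime


class Comment:
    def __init__(self, text: str, author: str, date_created: datetime) -> None:
        self.text = text
        self.author = author
        self.date_created = date_created


class Article:
    def __init__(self) -> None:
        self.comments: list[Comment] = []

    def add_comment(self, text: str, author: str, now: datetime) -> None:
        self.comments.append(Comment(text, author, now))


def test_adding_a_comment_to_an_article() -> None:
    sut = Article()
    text = "Comment text"
    author = "John Doe"
    now = datetime(2019, 4, 1)

    sut.add_comment(text, author, now)

    # Verifying the resulting state takes four assertions for one comment.
    assert len(sut.comments) == 1
    assert sut.comments[0].text == text
    assert sut.comments[0].author == author
    assert sut.comments[0].date_created == now
```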

You can mitigate this issue by introducing helper methods that hide most of the code and thus shorten the test (see listing 6.5), but these methods require significant effort to write and maintain. This effort is justified only when those methods are going to be reused across multiple tests, which is rarely the case. I’ll explain more about helper methods in part 3 of this book.
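
As a rough illustration of the helper-method approach, the Python sketch below moves the state assertions into a reusable function. The helper name `assert_contains_only_comment` and the `Article` and `Comment` classes are assumptions made for this example.

```python
from datetime import datetime


class Comment:
    def __init__(self, text: str, author: str, date_created: datetime) -> None:
        self.text = text
        self.author = author
        self.date_created = date_created


class Article:
    def __init__(self) -> None:
        self.comments: list[Comment] = []

    def add_comment(self, text: str, author: str, now: datetime) -> None:
        self.comments.append(Comment(text, author, now))


def assert_contains_only_comment(
    article: Article, text: str, author: str, date_created: datetime
) -> None:
    """Hypothetical test helper that hides the four state assertions."""
    assert len(article.comments) == 1
    comment = article.comments[0]
    assert comment.text == text
    assert comment.author == author
    assert comment.date_created == date_created


def test_adding_a_comment_to_an_article() -> None:
    sut = Article()
    now = datetime(2019, 4, 1)

    sut.add_comment("Comment text", "John Doe", now)

    # The test body shrinks, but the helper itself must now be maintained.
    assert_contains_only_comment(sut, "Comment text", "John Doe", now)
```

The total amount of test code has not gone down; it has only moved, which is why the helper pays off only when several tests share it.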

Another way to shorten a state-based test is to define equality members in the class that is being asserted. In listing 6.6, that’s the Comment class. You could turn it into a value object (a class whose instances are compared by value and not by reference), as shown next; this would also simplify the test, especially if you combined it with an assertion library like Fluent Assertions.

This test uses the fact that comments can be compared as whole values, without the need to assert individual properties in them. It also uses the BeEquivalentTo method from Fluent Assertions, which can compare entire collections, thereby removing the need to check the collection size.
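
A Python analogue of this idea, offered as a sketch rather than the book's listing: a dataclass generates field-by-field equality, and plain list equality plays a role loosely similar to the one `BeEquivalentTo` plays in the text. The `Article` and `Comment` classes are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Comment:
    """Value object: equality compares fields; frozen makes it immutable."""

    text: str
    author: str
    date_created: datetime


class Article:
    def __init__(self) -> None:
        self.comments: list[Comment] = []

    def add_comment(self, text: str, author: str, now: datetime) -> None:
        self.comments.append(Comment(text, author, now))


def test_adding_a_comment_to_an_article() -> None:
    sut = Article()
    comment = Comment("Comment text", "John Doe", datetime(2019, 4, 1))

    sut.add_comment(comment.text, comment.author, comment.date_created)

    # List equality checks the length and each element by value at once.
    assert sut.comments == [comment]
```

The single assertion replaces both the size check and the per-property checks, but only because `Comment` can reasonably be treated as a value.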

This is a powerful technique, but it works only when the class is inherently a value and can be converted into a value object. Otherwise, it leads to code pollution (polluting the production code base with code whose sole purpose is to enable or, as in this case, simplify unit testing). We’ll discuss code pollution along with other unit testing anti-patterns in chapter 11.

As you can see, these two techniques—using helper methods and converting classes into value objects—are applicable only occasionally. And even when these techniques are applicable, state-based tests still take up more space than output-based tests and thus remain less maintainable.

Maintainability of communication-based tests

Communication-based tests score worse than output-based and state-based tests on the maintainability metric. Communication-based testing requires setting up test doubles and interaction assertions, and that takes up a lot of space. Tests become even larger and less maintainable when you have mock chains (mocks or stubs returning other mocks, which also return mocks, and so on, several layers deep).
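
To show how quickly a mock chain inflates a test, here is a hypothetical Python sketch. The function `notify_account_manager` and every collaborator name are invented for this example. Python's `Mock` would create the intermediate doubles automatically; they are spelled out here to mirror the setup that mocking libraries in statically typed languages usually need.

```python
from unittest.mock import Mock


def notify_account_manager(customer_repository, customer_id: int, message: str) -> None:
    """Hypothetical code that reaches through several collaborators."""
    customer = customer_repository.get_by_id(customer_id)
    manager = customer.get_account_manager()
    manager.get_messenger().send(message)


def test_notifying_the_account_manager() -> None:
    # Each layer of the chain gets its own test double.
    messenger_mock = Mock()
    manager_stub = Mock()
    manager_stub.get_messenger.return_value = messenger_mock
    customer_stub = Mock()
    customer_stub.get_account_manager.return_value = manager_stub
    repository_stub = Mock()
    repository_stub.get_by_id.return_value = customer_stub

    notify_account_manager(repository_stub, 42, "Invoice overdue")

    # One line of verification follows seven lines of double setup.
    messenger_mock.send.assert_called_once_with("Invoice overdue")
```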

6.2.4 Comparing the styles: The results

Let’s now compare the styles of unit testing using the attributes of a good unit test. Table 6.1 sums up the comparison results. As discussed in section 6.2.1, all three styles score roughly equally on the metrics of protection against regressions and feedback speed; hence, I’m omitting these metrics from the comparison.

Output-based testing shows the best results. This style produces tests that rarely couple to implementation details and thus don’t require much due diligence to maintain proper resistance to refactoring. Such tests are also the most maintainable due to their conciseness and lack of out-of-process dependencies.

State-based and communication-based tests are worse on both metrics. These are more likely to couple to a leaking implementation detail, and they also incur higher maintenance costs due to being larger in size.

Always prefer output-based testing over everything else. Unfortunately, it’s easier said than done. This style of unit testing is only applicable to code that is written in a functional way, which is rarely how code is written in most object-oriented programming languages. Still, there are techniques you can use to transition more of your tests toward the output-based style.

The rest of this chapter shows how to transition from state-based and communication-based testing to output-based testing. The transition requires you to make your code more purely functional, which, in turn, enables the use of output-based tests instead of state- or communication-based ones.
