wip-3.1

2026-06-05

3.1 How to structure a unit test

This section shows how to structure unit tests using the arrange, act, and assert pattern, what pitfalls to avoid, and how to make your tests as readable as possible.

3.1.1 Using the AAA pattern

The AAA pattern advocates for splitting each test into three parts: arrange, act, and assert. This pattern is sometimes also called the 3A pattern. Let’s take a Calculator class with a single method that calculates the sum of two numbers:

public class Calculator
{
    public double Sum(double first, double second)
    {
        return first + second;
    }
}

The following listing shows a test that verifies the class’s behavior. This test follows the AAA pattern.

Listing 3.1
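
A minimal sketch of such a test, assuming the xUnit framework (the [Fact] attribute and Assert.Equal); the test name and the input values are illustrative choices rather than something the text prescribes:

```csharp
using Xunit;

public class Calculator
{
    public double Sum(double first, double second)
    {
        return first + second;
    }
}

public class CalculatorTests
{
    [Fact]
    public void Sum_of_two_numbers()
    {
        // Arrange: create the calculator and prepare its inputs
        double first = 10;
        double second = 20;
        var calculator = new Calculator();

        // Act: invoke the behavior under test and capture the output
        double result = calculator.Sum(first, second);

        // Assert: verify the outcome
        Assert.Equal(30, result);
    }
}
```

Each section is marked with a comment so that the three parts are easy to tell apart at a glance.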

The AAA pattern provides a simple, uniform structure for all tests in the suite. This uniformity is one of the biggest advantages of this pattern: once you get used to it, you can easily read and understand any test. That, in turn, reduces maintenance costs for your entire test suite. The structure is as follows: the arrange section brings the system under test (SUT) and its dependencies to the desired state; the act section calls a method on the SUT, passes the prepared dependencies, and captures the output, if any; and the assert section verifies the outcome.

Given-When-Then pattern

You might have heard of the Given-When-Then pattern, which is similar to AAA. This pattern also advocates for breaking the test down into three parts: Given, which corresponds to the arrange section; When, which corresponds to the act section; and Then, which corresponds to the assert section.

There’s no difference between the two patterns in terms of the test composition. The only distinction is that the Given-When-Then structure is more readable to non-programmers. Thus, Given-When-Then is more suitable for tests that are shared with non-technical people.

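As a rough illustration of that equivalence, here is the same kind of test phrased with Given-When-Then labels. This is a sketch that assumes xUnit; the test name and values are made up for the example:

```csharp
using Xunit;

public class Calculator
{
    public double Sum(double first, double second)
    {
        return first + second;
    }
}

public class CalculatorGivenWhenThenTests
{
    [Fact]
    public void Given_two_numbers_when_summed_then_the_result_is_their_total()
    {
        // Given (plays the role of arrange)
        var calculator = new Calculator();

        // When (plays the role of act)
        double result = calculator.Sum(1.5, 2.5);

        // Then (plays the role of assert)
        Assert.Equal(4.0, result);
    }
}
```

Only the labels change; the composition of the test is the same as with AAA.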

The natural inclination is to start writing a test with the arrange section. After all, it comes before the other two. This approach works well in the vast majority of cases, but starting with the assert section is a viable option too. When you practice Test-Driven Development (TDD)—that is, when you create a failing test before developing a feature—you don’t know enough about the feature’s behavior yet. So, it becomes advantageous to first outline what you expect from the behavior and then figure out how to develop the system to meet this expectation.

Such a technique may look counterintuitive, but it’s how we approach problem solving. We start by thinking about the objective: what a particular behavior should do for us. The actual solving of the problem comes after that. Writing down the assertions before everything else is merely a formalization of this thinking process. But again, this guideline is only applicable when you follow TDD—when you write a test before the production code. If you write the production code before the test, by the time you move on to the test, you already know what to expect from the behavior, so starting with the arrange section is a better option.

3.1.2 Avoid multiple arrange, act, and assert sections

Occasionally, you may encounter a test with multiple arrange, act, or assert sections. It usually works as shown in figure 3.1.

Figure 3.1

When you see multiple act sections separated by assert and, possibly, arrange sections, it means the test verifies multiple units of behavior. And, as we discussed in chapter 2, such a test is no longer a unit test but rather an integration test. It’s best to avoid such a test structure.

A single action ensures that your tests remain within the realm of unit testing, which means they are simple, fast, and easy to understand. If you see a test containing a sequence of actions and assertions, refactor it. Extract each act into a test of its own.

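The refactoring can be sketched as follows, assuming xUnit. The Counter class is hypothetical and exists only for this illustration; the first test shows the multistep shape to avoid, and the two tests after it show one act per test:

```csharp
using Xunit;

// Hypothetical class, introduced only for this illustration.
public class Counter
{
    public int Value { get; private set; }

    public void Increment() => Value++;

    public void Reset() => Value = 0;
}

public class CounterTests
{
    // Shape to avoid: two acts, each followed by its own assertion.
    [Fact]
    public void Increment_and_then_reset()
    {
        var counter = new Counter();

        counter.Increment();            // Act 1
        Assert.Equal(1, counter.Value); // Assert 1

        counter.Reset();                // Act 2
        Assert.Equal(0, counter.Value); // Assert 2
    }

    // Refactored: each act lives in a test of its own.
    [Fact]
    public void Increment_raises_the_value_by_one()
    {
        var counter = new Counter();

        counter.Increment();

        Assert.Equal(1, counter.Value);
    }

    [Fact]
    public void Reset_sets_the_value_back_to_zero()
    {
        var counter = new Counter();
        counter.Increment(); // Part of arrange: start from a non-zero value

        counter.Reset();

        Assert.Equal(0, counter.Value);
    }
}
```

In the last test, the call to Increment belongs to the arrange section because it only prepares the state that Reset is expected to clear.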

It’s sometimes fine to have multiple act sections in integration tests. As you may remember from the previous chapter, integration tests can be slow. One way to speed them up is to group several integration tests together into a single test with multiple acts and assertions. It’s especially helpful when system states naturally flow from one another: that is, when an act simultaneously serves as an arrange for the subsequent act.

But again, this optimization technique is only applicable to integration tests—and not all of them, but rather those that are already slow and that you don’t want to become even slower. There’s no need for such an optimization in unit tests or integration tests that are fast enough. It’s always better to split a multistep unit test into several tests.

3.1.3 Avoid if statements in tests

Similar to multiple occurrences of the arrange, act, and assert sections, you may sometimes encounter a unit test with an if statement. This is also an anti-pattern. A test—whether a unit test or an integration test—should be a simple sequence of steps with no branching.

An if statement indicates that the test verifies too many things at once. Such a test, therefore, should be split into several tests. But unlike the situation with multiple AAA sections, there’s no exception for integration tests. There are no benefits in branching within a test. You only gain additional maintenance costs: if statements make the tests harder to read and understand.

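A small sketch of the split, assuming xUnit (including its [Theory] and [InlineData] attributes). The Shipping class and its pricing rule are hypothetical and serve only as something to test:

```csharp
using Xunit;

// Hypothetical production code, introduced only for this illustration.
public static class Shipping
{
    // Orders of 50 or more ship for free; smaller orders pay a flat fee of 5.
    public static int CostFor(int orderTotal) => orderTotal >= 50 ? 0 : 5;
}

public class ShippingTests
{
    // Shape to avoid: the if statement makes one test verify two behaviors.
    [Theory]
    [InlineData(20)]
    [InlineData(80)]
    public void Shipping_cost_depends_on_order_total(int orderTotal)
    {
        int cost = Shipping.CostFor(orderTotal);

        if (orderTotal >= 50)
            Assert.Equal(0, cost);
        else
            Assert.Equal(5, cost);
    }

    // Split version: each test is a straight sequence of steps.
    [Fact]
    public void Orders_of_50_or_more_ship_for_free()
    {
        int cost = Shipping.CostFor(80);

        Assert.Equal(0, cost);
    }

    [Fact]
    public void Orders_below_50_pay_the_flat_fee()
    {
        int cost = Shipping.CostFor(20);

        Assert.Equal(5, cost);
    }
}
```

The branching version also duplicates the production rule inside the test, which is part of why it is harder to read than the two plain tests.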

3.1.4 How large should each section be?

A common question people ask when starting out with the AAA pattern is, how large should each section be? And what about the teardown section—the section that cleans up after the test? There are different guidelines regarding the size for each of the test sections.

THE ARRANGE SECTION IS THE LARGEST

The arrange section is usually the largest of the three. It can be as large as the act and assert sections combined. But if it becomes significantly larger than that, it’s better to extract the arrangements either into private methods within the same test class or to a separate factory class. Two popular patterns can help you reuse the code in the arrange sections: Object Mother and Test Data Builder.

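One way to extract a large arrange section into a private method of the same test class might look like the sketch below. It assumes xUnit; the Order class and the CreateOrderWithTwoLines helper are hypothetical names chosen for the example:

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

// Hypothetical production class, introduced only for this illustration.
public class Order
{
    private readonly List<(int Quantity, decimal UnitPrice)> _lines =
        new List<(int Quantity, decimal UnitPrice)>();

    public void AddLine(int quantity, decimal unitPrice)
    {
        _lines.Add((quantity, unitPrice));
    }

    public decimal Total => _lines.Sum(line => line.Quantity * line.UnitPrice);
}

public class OrderTests
{
    [Fact]
    public void Total_is_the_sum_of_all_lines()
    {
        // Arrange: the setup details live in the private factory method
        Order order = CreateOrderWithTwoLines();

        // Act
        decimal total = order.Total;

        // Assert
        Assert.Equal(25m, total);
    }

    // Reusable arrangement shared by the tests in this class.
    private static Order CreateOrderWithTwoLines()
    {
        var order = new Order();
        order.AddLine(2, 5m);   // 2 * 5  = 10
        order.AddLine(1, 15m);  // 1 * 15 = 15
        return order;
    }
}
```

The helper keeps the arrange section of the test down to one line while its name still tells the reader what state the test starts from.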

WATCH OUT FOR ACT SECTIONS THAT ARE LARGER THAN A SINGLE LINE

The act section is normally just a single line of code. If the act consists of two or more lines, it could indicate a problem with the SUT’s public API.

It’s best to express this point with an example, so let’s take one from chapter 2, which I repeat in the following listing. In this example, the customer makes a purchase from a store.

Listing 3.2
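
A minimal sketch of a purchase test with a single-line act section, assuming xUnit. The Store, Customer, and Product types, their members, and the quantities are hypothetical stand-ins written for this illustration:

```csharp
using System.Collections.Generic;
using Xunit;

public enum Product { Shampoo, Book }

// Hypothetical store that tracks how many units of each product it holds.
public class Store
{
    private readonly Dictionary<Product, int> _inventory = new Dictionary<Product, int>();

    public void AddInventory(Product product, int quantity)
    {
        _inventory[product] = GetInventory(product) + quantity;
    }

    public bool HasEnoughInventory(Product product, int quantity)
    {
        return GetInventory(product) >= quantity;
    }

    public void RemoveInventory(Product product, int quantity)
    {
        _inventory[product] = GetInventory(product) - quantity;
    }

    public int GetInventory(Product product)
    {
        return _inventory.TryGetValue(product, out int quantity) ? quantity : 0;
    }
}

public class Customer
{
    // Checks availability and removes the inventory as one operation.
    public bool Purchase(Store store, Product product, int quantity)
    {
        if (!store.HasEnoughInventory(product, quantity))
            return false;

        store.RemoveInventory(product, quantity);
        return true;
    }
}

public class CustomerTests
{
    [Fact]
    public void Purchase_succeeds_when_enough_inventory()
    {
        // Arrange
        var store = new Store();
        store.AddInventory(Product.Shampoo, 10);
        var customer = new Customer();

        // Act: a single method call
        bool success = customer.Purchase(store, Product.Shampoo, 5);

        // Assert
        Assert.True(success);
        Assert.Equal(5, store.GetInventory(Product.Shampoo));
    }
}
```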

Notice that the act section in this test is a single method call, which is a sign of a well-designed class’s API. Now compare it to the version in listing 3.3: this act section contains two lines. And that’s a sign of a problem with the SUT: it requires the client to remember to make the second method call to finish the purchase and thus lacks encapsulation.

Listing 3.3
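
A minimal sketch of a test whose act section takes two lines, assuming xUnit. The types, the RemoveInventory(success, ...) signature, and the quantities are hypothetical; they are written this way only to show a Customer class that leaves part of the purchase to the caller:

```csharp
using System.Collections.Generic;
using Xunit;

public enum Product { Shampoo, Book }

public class Store
{
    private readonly Dictionary<Product, int> _inventory = new Dictionary<Product, int>();

    public void AddInventory(Product product, int quantity)
    {
        _inventory[product] = GetInventory(product) + quantity;
    }

    public bool HasEnoughInventory(Product product, int quantity)
    {
        return GetInventory(product) >= quantity;
    }

    // Removes the inventory only if the preceding purchase succeeded.
    public void RemoveInventory(bool success, Product product, int quantity)
    {
        if (success)
            _inventory[product] = GetInventory(product) - quantity;
    }

    public int GetInventory(Product product)
    {
        return _inventory.TryGetValue(product, out int quantity) ? quantity : 0;
    }
}

public class Customer
{
    // Only reports whether the purchase is possible; the inventory is left untouched.
    public bool Purchase(Store store, Product product, int quantity)
    {
        return store.HasEnoughInventory(product, quantity);
    }
}

public class CustomerTests
{
    [Fact]
    public void Purchase_succeeds_when_enough_inventory()
    {
        // Arrange
        var store = new Store();
        store.AddInventory(Product.Shampoo, 10);
        var customer = new Customer();

        // Act: two calls are needed to complete one operation
        bool success = customer.Purchase(store, Product.Shampoo, 5);
        store.RemoveInventory(success, Product.Shampoo, 5);

        // Assert
        Assert.True(success);
        Assert.Equal(5, store.GetInventory(Product.Shampoo));
    }
}
```

If the caller forgets the second call, the test setup above would still report a successful purchase while the store keeps all ten units.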

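Here’s what you can read from listing 3.3’s act section: in the first line, the customer attempts the purchase; in the second line, the purchased inventory is removed from the store as a separate call that the client has to remember to make.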

The issue with the new version is that it requires two method calls to perform a single operation. Note that this is not an issue with the test itself. The test still verifies the same unit of behavior: the process of making a purchase. The issue lies in the API surface of the Customer class. It shouldn’t require the client to make an additional method call.

From a business perspective, a successful purchase has two outcomes: the acquisition of a product by the customer and the reduction of the inventory in the store. Both of these outcomes must be achieved together, which means there should be a single public method that does both things. Otherwise, there’s room for inconsistency if the client code calls the first method but not the second, in which case the customer will acquire the product but its available amount won’t be reduced in the store.

Such an inconsistency is called an invariant violation. The act of protecting your code against potential inconsistencies is called encapsulation. When an inconsistency penetrates into the database, it becomes a big problem: now it’s impossible to reset the state of your application by simply restarting it. You’ll have to deal with the corrupted data in the database and, potentially, contact customers and handle the situation on a case-by-case basis.

Just imagine what would happen if the application generated confirmation receipts without actually reserving the inventory. It might issue claims to, and even charge for, more inventory than you could feasibly acquire in the near future.

The remedy is to maintain code encapsulation at all times. In the previous example, the customer should remove the acquired inventory from the store as part of its Purchase method and not rely on the client code to do so. When it comes to maintaining invariants, you should eliminate any potential course of action that could lead to an invariant violation.

This guideline of keeping the act section down to a single line holds true for the vast majority of code that contains business logic, but less so for utility or infrastructure code. Thus, I won’t say “never do it.” Be sure to examine each such case for a potential breach in encapsulation, though.

3.1.5 How many assertions should the assert section hold?

Finally, there’s the assert section. You may have heard about the guideline of having one assertion per test. It takes root in the premise discussed in the previous chapter: the premise of targeting the smallest piece of code possible.

As you already know, this premise is incorrect. A unit in unit testing is a unit of behavior, not a unit of code. A single unit of behavior can exhibit multiple outcomes, and it’s fine to evaluate them all in one test.

That said, you need to watch out for assert sections that grow too large: it could be a sign of a missing abstraction in the production code. For example, instead of asserting all properties inside an object returned by the SUT, it may be better to define proper equality members in the object’s class. You can then compare the object to an expected value using a single assertion.

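A sketch of the single-assertion approach, assuming xUnit and C# 9 or later. A record is used here because the compiler generates value-based equality members for it; the Money and PriceCalculator types are hypothetical:

```csharp
using Xunit;

// Hypothetical value type. The compiler generates Equals and GetHashCode
// for a record, comparing Amount and Currency by value.
public record Money(decimal Amount, string Currency);

// Hypothetical class under test.
public class PriceCalculator
{
    public Money Total(decimal unitPrice, int quantity, string currency)
    {
        return new Money(unitPrice * quantity, currency);
    }
}

public class PriceCalculatorTests
{
    [Fact]
    public void Total_multiplies_unit_price_by_quantity()
    {
        // Arrange
        var calculator = new PriceCalculator();

        // Act
        Money total = calculator.Total(2.50m, 4, "USD");

        // Assert: one comparison of the whole object instead of
        // a separate assertion for each property
        Assert.Equal(new Money(10.00m, "USD"), total);
    }
}
```

On older language versions, the same effect would require writing the equality members by hand in the class.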

3.1.6 What about the teardown phase?

Some people also distinguish a fourth section, teardown, which comes after arrange, act, and assert. For example, you can use this section to remove any files created by the test, close a database connection, and so on. The teardown is usually represented by a separate method, which is reused across all tests in the class. Thus, I don’t include this phase in the AAA pattern.

Note that most unit tests don’t need teardown. Unit tests don’t talk to out-of-process dependencies and thus don’t leave side effects that need to be disposed of. That’s a realm of integration testing. We’ll talk more about how to properly clean up after integration tests in part 3.

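For a test that does touch the file system, a teardown method might be sketched like this. The example assumes xUnit, which creates a new instance of the test class for each test and calls Dispose afterward when the class implements IDisposable; the class name and the file contents are made up:

```csharp
using System;
using System.IO;
using Xunit;

public class TextFileTests : IDisposable
{
    private readonly string _path;

    // Runs before each test: pick a unique temporary file name.
    public TextFileTests()
    {
        _path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".txt");
    }

    [Fact]
    public void Written_text_can_be_read_back()
    {
        // Arrange
        File.WriteAllText(_path, "hello");

        // Act
        string content = File.ReadAllText(_path);

        // Assert
        Assert.Equal("hello", content);
    }

    // Teardown: runs after each test and removes the file the test created.
    public void Dispose()
    {
        if (File.Exists(_path))
            File.Delete(_path);
    }
}
```

The teardown lives in one method shared by every test in the class, which is why it stays outside the three AAA sections of any individual test.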

3.1.7 Differentiating the system under test

The SUT plays a significant role in tests. It provides an entry point for the behavior you want to invoke in the application. As we discussed in the previous chapter, this behavior can span several classes or as little as a single method. But there can be only one entry point: one class that triggers that behavior.

Thus, it’s important to differentiate the SUT from its dependencies, especially when there are quite a few of them, so that you don’t need to spend too much time figuring out who is who in the test. To do that, always name the SUT variable sut in tests. The following listing shows how CalculatorTests would look after renaming the Calculator instance.

Listing 3.4
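
A minimal sketch of the renamed test, assuming xUnit; the test name and values are illustrative:

```csharp
using Xunit;

public class Calculator
{
    public double Sum(double first, double second)
    {
        return first + second;
    }
}

public class CalculatorTests
{
    [Fact]
    public void Sum_of_two_numbers()
    {
        // Arrange
        double first = 10;
        double second = 20;
        var sut = new Calculator(); // The system under test is named sut

        // Act
        double result = sut.Sum(first, second);

        // Assert
        Assert.Equal(30, result);
    }
}
```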

3.1.8 Dropping the arrange, act, and assert comments from tests

Just as it’s important to set the SUT apart from its dependencies, it’s also important to differentiate the three sections from each other, so that you don’t spend too much time figuring out what section a particular line in the test belongs to. One way to do that is to put // Arrange, // Act, and // Assert comments before the beginning of each section. Another way is to separate the sections with empty lines, as shown next.

Listing 3.5
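
A minimal sketch of a test whose sections are separated only by empty lines, assuming xUnit; the test name and values are illustrative:

```csharp
using Xunit;

public class Calculator
{
    public double Sum(double first, double second)
    {
        return first + second;
    }
}

public class CalculatorTests
{
    [Fact]
    public void Sum_of_two_numbers()
    {
        double first = 10;
        double second = 20;
        var sut = new Calculator();

        double result = sut.Sum(first, second);

        Assert.Equal(30, result);
    }
}
```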

This test still follows the AAA structure, but the comments are no longer needed because the sections are clearly separated by empty lines.

Removing comments reduces noise and improves readability, but it works only when you don’t need additional empty lines inside the arrange or assert sections. In large tests, especially integration tests with complicated setup logic, comments may still be helpful.

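Therefore, drop the section comments in tests where empty lines alone separate the three sections cleanly, and keep the comments in tests that need additional empty lines inside the arrange or assert sections.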