4 All Pairs Testing - essenius/AcceptanceTesting GitHub Wiki
Background
When you have an increasing number of configurations to test, you will get hit by combinatorial explosion: there are so many possible test cases that it becomes infeasible to execute them all. For example, you may need to test your web application on 8 client operating systems, 6 browser versions, 4 server operating systems and 3 web server versions. Testing all possible combinations would lead to 8 * 6 * 4 * 3 = 576 different configurations.
Most defects tend to show up with a specific value of a single input variable, or with an interaction between a specific pair of values. A study by NIST reported (page 5):
Interaction Rule: Most failures are induced by single factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors.
This means that most defects are likely to be caught if you make sure to cover all pairs of these 'factors'. For example, when testing a web application, a defect may only show up on Windows 10, or it may only show up on Windows 11 with Google Chrome. It is much less likely that a defect shows up only in the three-way combination of Windows 11, Google Chrome and Apache.
Therefore, if you ensure that all possible pairs of parameter values are covered at least once in your test cases, you will have a high defect identification capability with substantially fewer test cases than would be required for all possible combinations. This is the idea behind All Pairs Testing (a.k.a. Pairwise Testing).
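To make the arithmetic concrete, here is a small Python sketch using the figures from the example above. It compares the number of full combinations with the number of distinct value pairs, and computes the theoretical lower bound on a pairwise suite (the product of the two largest parameter sizes, since those pairs can each only be covered by one test case):

```python
from itertools import combinations
from math import prod

# Parameter sizes from the example: client OS, browser version,
# server OS, web server version
sizes = [8, 6, 4, 3]

full_coverage = prod(sizes)  # every possible configuration
distinct_pairs = sum(a * b for a, b in combinations(sizes, 2))
# A pairwise suite can never be smaller than the largest
# two-parameter product:
lower_bound = max(a * b for a, b in combinations(sizes, 2))

print(full_coverage, distinct_pairs, lower_bound)  # 576 158 48
```

So instead of 576 configurations, a pairwise suite only needs to cover 158 distinct pairs, which can be done in a number of tests close to the lower bound of 48 because each test case covers several pairs at once.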
Example
We will use a small example to show the principle without requiring too large tables of test cases. Assume we have the following set of configurations we want to test:
- 3 client operating systems (OS): Windows, OSX and Linux
- 4 browsers: Firefox, Chrome, Safari and Edge
- 2 server OSes: Windows and Linux
- 2 web servers: Apache and Lighttpd
To achieve full coverage, we would need to create 3 x 4 x 2 x 2 = 48 configurations. Instead, we design a set of configurations that contains all pairs of Client OS: Browser, Client OS: Server OS, Client OS: Web Server, Browser: Server OS, Browser: Web Server, and Server OS: Web Server. For example, All Pairs of Server OS: Web Server would be Windows: Apache, Windows: Lighttpd, Linux: Apache and Linux: Lighttpd. The trick that All Pairs applies is covering more than one pair in each test case.
Generally, you can expect that All Pairs Testing will result in a number of test cases roughly equal to the number that would be required to obtain full coverage for the two parameters with the largest numbers of test values. In this case, we expect that we end up close to 3 x 4 = 12 test cases.
A set of tests that satisfies the all-pairs criterion is shown in Table 1.
Table 1 A Set of Tests Satisfying the All Pairs Criterion
Test Case | Client OS | Browser | Server OS | Web Server |
---|---|---|---|---|
1 | Windows | Chrome | Linux | Lighttpd |
2 | OSX | Firefox | Windows | Apache |
3 | Linux | Edge | Windows | Lighttpd |
4 | Windows | Safari | Windows | Apache |
5 | OSX | Edge | Linux | Apache |
6 | Linux | Firefox | Linux | Lighttpd |
7 | OSX | Chrome | Windows | Apache |
8 | OSX | Safari | Linux | Lighttpd |
9 | Linux | Safari | Linux | Apache |
10 | Windows | Firefox | Linux | Lighttpd |
11 | Windows | Edge | Windows | Lighttpd |
12 | Linux | Chrome | Windows | Apache |
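The hand validation below can also be done mechanically. Here is a small Python sketch (the tuples are transcribed from Table 1) that enumerates every required pair and reports any that are missing:

```python
from itertools import combinations, product

# Table 1 as (client OS, browser, server OS, web server) tuples
tests = [
    ("Windows", "Chrome",  "Linux",   "Lighttpd"),
    ("OSX",     "Firefox", "Windows", "Apache"),
    ("Linux",   "Edge",    "Windows", "Lighttpd"),
    ("Windows", "Safari",  "Windows", "Apache"),
    ("OSX",     "Edge",    "Linux",   "Apache"),
    ("Linux",   "Firefox", "Linux",   "Lighttpd"),
    ("OSX",     "Chrome",  "Windows", "Apache"),
    ("OSX",     "Safari",  "Linux",   "Lighttpd"),
    ("Linux",   "Safari",  "Linux",   "Apache"),
    ("Windows", "Firefox", "Linux",   "Lighttpd"),
    ("Windows", "Edge",    "Windows", "Lighttpd"),
    ("Linux",   "Chrome",  "Windows", "Apache"),
]

# The value set of each parameter, derived from the test cases themselves
values = [sorted({t[i] for t in tests}) for i in range(4)]

# Every (parameter pair, value pair) combination that no test case covers
missing = [
    (i, j, a, b)
    for i, j in combinations(range(4), 2)
    for a, b in product(values[i], values[j])
    if not any(t[i] == a and t[j] == b for t in tests)
]
print(missing)  # [] means every pair is covered
```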
You can validate by hand that this set indeed contains all pairs. For example, test cases 1, 4, 10 and 11 cover all browsers on Windows. Cases 1, 7 and 12 cover all client OSs with Chrome. At the same time, cases 1 and 7 cover all web servers with Chrome as well as all server OSs with Chrome.
All Pairs Testing on its own is no guarantee for high defect finding capability, as shown in the paper "Pairwise Testing: A Best Practice That Isn't" by James Bach & Patrick Schroeder. They state, among other things: "Pairwise testing can only protect against pairwise interactions of the input values you happen to select for testing". In other words, if your input values don't cover all equivalence classes, you are very likely to miss defects. All Pairs Testing can be a very useful tool, but it must be applied consciously.
All Pairs Testing Tooling
Jenny
Creating the All Pairs set by hand can be a tedious task. Fortunately, there are freely available tools that can help us, for example Jenny, which was used to help create Table 1. Jenny's command-line arguments are the numbers of values of the input parameters. To generate test cases for the above example, we use:
C:\Apps\jenny>jenny 3 4 2 2
1a 2b 3b 4b
1b 2a 3a 4a
1c 2d 3a 4b
1a 2c 3a 4a
1b 2d 3b 4a
1c 2a 3b 4b
1b 2b 3a 4a
1b 2c 3b 4b
1c 2c 3b 4a
1a 2a 3b 4b
1a 2d 3a 4b
1c 2b 3a 4a
Jenny uses a very simple method to display the values: a number for the input parameter and a letter for the value of the parameter. The only thing you need to do yourself is to make the mapping: 1 = Client OS, value a = Windows, b = OSX, c = Linux, etc.
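For larger runs, this substitution can be scripted. Here is a small Python helper (the `names` table encodes the mapping used for Table 1) that translates a line of Jenny output into readable values:

```python
# Mapping from Jenny's parameter numbers to value names,
# as used for Table 1: value a is the first entry, b the second, etc.
names = {
    "1": ["Windows", "OSX", "Linux"],              # Client OS
    "2": ["Firefox", "Chrome", "Safari", "Edge"],  # Browser
    "3": ["Windows", "Linux"],                     # Server OS
    "4": ["Apache", "Lighttpd"],                   # Web Server
}

def translate(line):
    """Turn a Jenny output line such as '1a 2b 3b 4b' into value names."""
    return [names[tok[0]]["abcdefgh".index(tok[1])] for tok in line.split()]

print(translate("1a 2b 3b 4b"))  # test case 1: Windows, Chrome, Linux, Lighttpd
```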
It is important to implement all test cases that an All Pairs algorithm provides. In practice, however, not all pairs may be possible. For example, assume that we also want to test on the Internet Information Services (IIS) web server. By default, Jenny would then generate test cases that include Linux: IIS, which won't work because IIS is only available for Windows. But if you simply skip the Linux: IIS tests, you are likely to lose other pairs as well. To solve that issue, Jenny allows you to specify pairs that you don't want to see in the test set. This is how you specify that the Linux: IIS pair (3b: 4c) should not occur:
C:\Apps\jenny>jenny 3 4 2 3 -w3b4c
1a 2b 3b 4b
1b 2c 3a 4c
1c 2d 3b 4a
1b 2a 3b 4a
1a 2b 3a 4a
1c 2a 3a 4b
1a 2c 3b 4b
1a 2d 3a 4c
1b 2d 3b 4b
1c 2b 3a 4c
1a 2a 3a 4c
1c 2c 3a 4a
1b 2b 3b 4a
Notice that indeed IIS (4c) is only paired with Server OS Windows (3a). Also, all combinations of Client OS (1a, 1b, 1c) and IIS as well as Browser (2a, 2b, 2c, 2d) and IIS are included.
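To illustrate what such a tool does under the hood, here is a toy greedy generator in Python. This is not Jenny's actual algorithm (Jenny's is more sophisticated); it is just a sketch of the idea: keep picking the full combination that covers the most still-uncovered pairs, while never selecting a combination that contains a forbidden pair:

```python
from itertools import combinations, product

def pairwise_suite(parameters, forbidden=()):
    """Greedy pairwise generation. Forbidden pairs are given as
    ((index, value), (index, value)) tuples with the lower index first."""
    forbidden = set(forbidden)

    def pairs_of(test):
        return {((i, test[i]), (j, test[j]))
                for i, j in combinations(range(len(test)), 2)}

    # Candidate test cases: all combinations without a forbidden pair
    candidates = [t for t in product(*parameters) if not pairs_of(t) & forbidden]
    # Pairs we have to cover: everything a feasible candidate can cover
    uncovered = set().union(*(pairs_of(t) for t in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite

client = ["Windows", "OSX", "Linux"]
browser = ["Firefox", "Chrome", "Safari", "Edge"]
server_os = ["Windows", "Linux"]
web_server = ["Apache", "Lighttpd", "IIS"]

# Exclude the impossible Linux server OS / IIS pair (Jenny's -w3b4c)
suite = pairwise_suite([client, browser, server_os, web_server],
                       forbidden={((2, "Linux"), (3, "IIS"))})
print(len(suite))
```

A greedy approach like this usually lands close to, but not always at, the minimum suite size, which is why different tools produce slightly different counts for the same model.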
In the same fashion, you can expand the tests to include browser testing on Internet Explorer (2e), while excluding Internet Explorer tests on OSX (1b) and Linux (1c):
C:\Apps\jenny>jenny 3 5 2 3 -w3b4c -w1b2e -w1c2e
1a 2e 3b 4b
1b 2c 3a 4c
1c 2d 3b 4a
1c 2a 3a 4b
1a 2b 3a 4a
1b 2a 3b 4a
1a 2d 3a 4c
1b 2b 3b 4b
1a 2c 3b 4b
1a 2e 3a 4c
1c 2b 3a 4c
1a 2a 3a 4c
1c 2c 3b 4a
1b 2d 3b 4b
1a 2e 3b 4a
Sometimes there is value in going beyond All Pairs and taking All Triplets. This will substantially increase the number of test cases, so you should only do that if there is a very good reason. Jenny supports All Triplets by using the -n parameter. Notice how the number of test cases triples:
C:\Apps\jenny>jenny 3 5 2 3 -w3b4c -w1b2e -w1c2e | find /c "1"
15
C:\Apps\jenny>jenny -n3 3 5 2 3 -w3b4c -w1b2e -w1c2e | find /c "1"
45
PICT
Another freely available tool is Microsoft's Pairwise Independent Combinatorial Testing tool, or PICT. With that, you need to do a bit more upfront work to create your specification as a model file. The benefit is that you get the parameter values immediately, without the need for substitution. The model file corresponding to the last Jenny example would be:
Client OS: Windows, OSX, Linux
Browser: Firefox, Chrome, Safari, Edge, Internet Explorer
Server OS: Windows, Linux
Web Server: Apache, Lighttpd, IIS
IF [Browser] = "Internet Explorer" THEN [Client OS] = "Windows";
IF [Web Server] = "IIS" THEN [Server OS] = "Windows";
Run it as follows:
C:\Apps\PICT>pict model.txt
The result will be a tab delimited text table:
Client OS Browser Server OS Web Server
OSX Edge Windows Lighttpd
Windows Safari Linux Lighttpd
Linux Chrome Windows Lighttpd
Windows Edge Windows IIS
Windows Internet Explorer Windows IIS
Linux Edge Linux Apache
OSX Safari Windows Apache
OSX Chrome Windows IIS
Linux Safari Windows IIS
OSX Firefox Linux Lighttpd
Windows Firefox Windows Apache
Windows Chrome Linux Apache
Windows Internet Explorer Linux Lighttpd
Linux Firefox Windows IIS
Windows Internet Explorer Linux Apache
Also, PICT allows you to create all-triplets:
C:\Apps\PICT>pict model.txt | find /c /v ""
16
C:\Apps\PICT>pict model.txt -o:3 | find /c /v ""
42
In this case, PICT seems to be a bit more efficient than Jenny: it requires only 41 cases for all triplets (we need to subtract the header from the line count), while Jenny needs 45.
Combining All Pairs Testing with Input Domain Coverage
As can be seen in Domain Coverage, just satisfying input domain coverage does not necessarily mean that your set of test cases is optimally effective and efficient in finding defects. The only thing that input domain coverage achieves is that all equivalence classes of all input values are covered at least once. And we saw in the previous section that All Pairs Testing is useful but no guarantee for success.
Combining All Pairs testing with input domain coverage can significantly improve the defect finding effectiveness and efficiency. What it comes down to is that you create your test cases such that all valid equivalence classes of all input values are paired at least once in your set of test cases. Let's call this All Pairs Input Domain Coverage.
With two variables, All Pairs means covering all combinations of valid equivalence classes, so the 5 valid equivalence classes for speed and the 3 valid classes for traffic light state that we saw in the Domain Coverage guidance lead to 15 pairs that need to be covered. For the speed parameter, we also include boundary value testing, which we can easily do since every speed class occurs in 3 pairs and has at most 2 boundary values. We simply alternate the values (using the pipe symbol: |).
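The alternation trick can be shown in a few lines of Python: cycling through the boundary values of each class while pairing it with the three traffic light states guarantees that every boundary value appears at least once. The classes and boundary values below are the ones from the Domain Coverage guidance:

```python
from itertools import cycle

lights = ["Red", "Yellow", "Green"]
# Valid equivalence classes for speed with their boundary values
classes = {
    "Reverse-Speeding": [-100, -51],
    "Reverse-Not Speeding": [-50, -1],
    "Standstill": [0],
    "Forward-Not Speeding": [1, 50],
    "Forward-Speeding": [51, 250],
}

# Alternate the boundary values of each class over its three light pairs
tests = [(name, speed, light)
         for name, bounds in classes.items()
         for light, speed in zip(lights, cycle(bounds))]

print(len(tests))  # 15 = 5 speed classes x 3 traffic light states
```

With at most 2 boundary values per class and 3 pairs per class, the cycle guarantees that both boundaries of every class occur.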
Here is the model in PICT:
Speed Class: Reverse-Speeding, Reverse-Not Speeding, Standstill, Forward-Not Speeding, Forward-Speeding
Speed: -100 | -51, -50 | -1, 0, 1 | 50, 51 | 250
Traffic Light: Red, Yellow, Green
IF [Speed Class]="Reverse-Speeding" THEN [Speed] in { -100, -51 };
IF [Speed Class]="Reverse-Not Speeding" THEN [Speed] in { -50, -1};
IF [Speed Class]="Standstill" THEN [Speed] = 0;
IF [Speed Class]="Forward-Not Speeding" THEN [Speed] in {1, 50};
IF [Speed Class]="Forward-Speeding" THEN [Speed] in {51, 250};
Table 2 shows the result; the Expected Result column was added manually. Of course, the All Pairs algorithm only provides the input, and it is the tester's job to provide the right expected results.
Table 2 All Pairs Input Domain Coverage with Boundary Values
Speed Class | Speed | Traffic Light | Expected result |
---|---|---|---|
Forward-Speeding | 51 | Yellow | Violation |
Reverse-Not Speeding | -50 | Red | No violation (not passing light) |
Standstill | 0 | Yellow | No violation |
Reverse-Not Speeding | -1 | Yellow | No violation |
Forward-Speeding | 250 | Green | Violation |
Reverse-Speeding | -100 | Red | Violation |
Forward-Speeding | 51 | Red | Violation |
Forward-Not Speeding | 1 | Red | Violation |
Standstill | 0 | Red | No violation |
Standstill | 0 | Green | No violation |
Forward-Not Speeding | 50 | Yellow | No violation |
Reverse-Not Speeding | -50 | Green | No violation |
Reverse-Speeding | -51 | Yellow | Violation |
Reverse-Speeding | -100 | Green | Violation |
Forward-Not Speeding | 1 | Green | No violation |
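As before, the coverage claim can be checked mechanically. This short Python sketch (rows transcribed from Table 2) verifies that all 15 class/light pairs occur and that both boundary values of each speed class are used:

```python
# (speed class, speed, traffic light) triples transcribed from Table 2
rows = [
    ("Forward-Speeding", 51, "Yellow"),
    ("Reverse-Not Speeding", -50, "Red"),
    ("Standstill", 0, "Yellow"),
    ("Reverse-Not Speeding", -1, "Yellow"),
    ("Forward-Speeding", 250, "Green"),
    ("Reverse-Speeding", -100, "Red"),
    ("Forward-Speeding", 51, "Red"),
    ("Forward-Not Speeding", 1, "Red"),
    ("Standstill", 0, "Red"),
    ("Standstill", 0, "Green"),
    ("Forward-Not Speeding", 50, "Yellow"),
    ("Reverse-Not Speeding", -50, "Green"),
    ("Reverse-Speeding", -51, "Yellow"),
    ("Reverse-Speeding", -100, "Green"),
    ("Forward-Not Speeding", 1, "Green"),
]

# All (speed class, traffic light) pairs that appear in the table
covered = {(cls, light) for cls, _, light in rows}
print(len(covered))  # 15: every class/light pair is present

# Which speed values each class uses; 2-value classes must show both
boundaries = {}
for cls, speed, _ in rows:
    boundaries.setdefault(cls, set()).add(speed)
print(boundaries["Reverse-Speeding"] == {-100, -51})  # True
```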
Add to this set the negative cases from Table 6 in Domain Coverage (speed -101 km/h and 251 km/h) and traffic light state Blue, and we are all set.
While this already shows benefits in terms of the potential for defect identification with a limited set of test cases, the benefit becomes very clear when you have more than two input variables, as applying All Pairs then results in substantially fewer required test cases compared to full coverage.