Domain Coverage

Test Coverage

The concept of test coverage aims to establish whether the test suite defined to test a certain specification covers that specification to a sufficient extent. The other part of the equation is to do this efficiently. Stated differently:

The key to efficient and effective testing is to achieve a desired level of coverage with the fewest possible test cases

- Steve Tockey, Construx instructor

Testing can never be complete. The question of how much testing is enough is a business decision, not a technical one. The amount of testing that is justified will be a function of the amount of risk exposure that defects can cause. If people can die because of mistakes in the software, then testing must be very rigorous and full coverage may be mandated. If it's only a minor annoyance, then cursory testing may be sufficient.

Requirements Testing

With requirements testing, you ensure that all requirements have at least one (but often more) test case to test whether the requirement has been met. In order to do this, requirements need to be unambiguous (have only one interpretation), testable, and binding (someone is willing to pay for it).

In practice, we see that requirements are often not that well defined, with statements such as "the system shall be fast" or "the system shall detect a 0.25-inch defect in a pipe section". In those situations, further clarification is necessary:

  • "What do you mean when you say 'fast'? Do we need a response in a day, an hour, a minute, a second?"
  • "What response time is enough to achieve your business benefits? Would you be willing to pay $10k more to reduce the response time to 5 seconds, or can you live with a 10 second response time?"
  • "What do you mean when you say 'defect in a pipe section'? Are there distinct types of defects? How can you detect them?"
  • "What do you mean when you say 'detect'? Should it only deliver yes or no, or should it also provide the location of the defect? How is that location specified? Should it also indicate the type of defect?"
  • "What do you mean when you say '0.25 inch'? Do you mean 'at least', 'exactly', 'as small as'? Is detection of a smaller defect than 0.25 inch a failure or not?"
  • "How can you measure the size of a defect?"
  • "What do you mean when you say 'a' defect in a pipe section? Can we have more than one defect? If so, should the system find all defects, or just stop after finding the first one?"

This is a requirements elicitation technique and not a testing technique. However, the testing world and the requirements management worlds are getting closer to each other, especially where acceptance test driven development (ATDD) or behavior driven development (BDD) are being used.

Input Domain Coverage

When designing the Arrange part of a test case, you need to think about which input values are useful to test. Notice that input values are not limited to just parameters; they can also come from system state (e.g. a bank account being open or closed). Since it is generally not possible to exhaustively test all possibilities, a proper selection of input values is needed. A useful technique to reduce the number of test cases needed by avoiding redundancy is input domain coverage.

Every input value for a function has a (semantic) domain, i.e. the set of values that it can logically take on. For example, the input to a square root function must be a non-negative real number (if we don't want to end up in the imaginary domain). In software, these domains are often mapped to data types which may have a larger domain, i.e. they may be able to take on more values than the semantic domain would allow.

For example, assume that we map a temperature in Fahrenheit onto a Float data type. The absolute minimum temperature is -459.67°F. It is physically impossible to have a temperature lower than that. While -500 is a perfectly valid number, it doesn't make sense if it is used to represent a temperature in the Fahrenheit scale.

Input domain coverage means having enough test cases to cover the data type domain of each input variable.

As this example already implies, there are ranges in the input domain that are equivalent from a testing perspective. It wouldn't matter if you used -500°F or -1000°F; they are both impossible. Such ranges are called equivalence classes.
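
To make the distinction between data type domain and semantic domain concrete, here is a minimal Python sketch (the `validate_fahrenheit` helper is hypothetical, invented for this illustration): the float data type happily accepts -500, but the semantic domain of a Fahrenheit temperature does not.

```python
ABSOLUTE_ZERO_F = -459.67  # physical lower bound of the Fahrenheit scale

def validate_fahrenheit(value: float) -> float:
    # The float data type domain is wider than the semantic domain:
    # -500.0 is a valid float, but not a valid Fahrenheit temperature.
    if value < ABSOLUTE_ZERO_F:
        raise ValueError(f"{value} degrees F is below absolute zero")
    return value

validate_fahrenheit(72.0)    # fine: inside the semantic domain
validate_fahrenheit(-500.0)  # raises ValueError: valid float, invalid temperature
```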

Equivalence Classes

Equivalence classes are ranges in an input domain whose values are equally likely to reveal a certain defect in the function being tested, while that defect differs from the defects revealed by values in other equivalence classes. At minimum, there are two equivalence classes, for valid and invalid values, but usually there are more. For example, if the function being tested calculates the state of water (solid, liquid or gas) at normal pressure, then any value below 32°F should return 'solid', any value above 212°F should return 'gas', and any value between 32°F and 212°F should return 'liquid'. However, since we are talking about temperatures, we may have unit conversion errors (e.g. an assumption that the input value is in centigrade), so it is also useful to factor in the values 0 and 100: if we get the return value 'liquid' for an input value between 0 and 32, we very likely have a unit conversion error. Furthermore, we saw in the previous section that the temperature physically cannot get any lower than -459.67°F. Table 1 shows the resulting set of equivalence classes.

Table 1. Equivalence classes for water temperature

| Class | Range | Interpretation |
|-------|-------|----------------|
| A | below -459.67°F | Invalid, too cold |
| B | -459.67°F to 0°F | Solid in °F and °C |
| C | 0°F to 32°F | Solid in °F, liquid in °C |
| D | 32°F to 100°F | Liquid in °F and °C |
| E | 100°F to 212°F | Liquid in °F, gas in °C |
| F | above 212°F | Gas in °F and °C |

Two particular cases are exactly 32°F (freezing point) and exactly 212°F (boiling point). Since they exhibit specific behavior, they should be considered separate equivalence classes. We will call those G and H.

If the input domain consists of enumerated values (e.g. non-numerical ones, such as red, yellow and green for traffic lights), then all those values are equivalence classes. The invalid equivalence class is then "everything else".

We get input domain coverage if our set of test cases contains at least one value from each equivalence class. Table 2 below shows a test set achieving that for the State of Water example.

Table 2. The states of water

| Input value: temperature (°F) | Expected result | Equivalence class |
|-------------------------------|-----------------|-------------------|
| -500 | Invalid | A |
| -250 | Solid | B |
| 10 | Solid | C |
| 32 | Freezing Point | G |
| 75 | Liquid | D |
| 150 | Liquid | E |
| 212 | Boiling Point | H |
| 250 | Gas | F |

As we saw, in some cases, single numerical values can represent an equivalence class. Quite often this is the case with 0, since using 0 will expose different defects than any other value (consider e.g. division by 0).
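
As an illustration, here is a minimal Python sketch of what the State of Water function and the test set of Table 2 could look like. The function name `water_state` and its string results are assumptions made for this example, not a prescribed interface:

```python
def water_state(temperature_f: float) -> str:
    """Return the state of water at normal pressure, for a temperature in degrees F."""
    if temperature_f < -459.67:
        return "Invalid"         # class A: below absolute zero
    if temperature_f == 32:
        return "Freezing Point"  # class G: exact boundary
    if temperature_f == 212:
        return "Boiling Point"   # class H: exact boundary
    if temperature_f < 32:
        return "Solid"           # classes B and C
    if temperature_f < 212:
        return "Liquid"          # classes D and E
    return "Gas"                 # class F

# One test value per equivalence class (A through H), mirroring Table 2
test_cases = [
    (-500, "Invalid"), (-250, "Solid"), (10, "Solid"), (32, "Freezing Point"),
    (75, "Liquid"), (150, "Liquid"), (212, "Boiling Point"), (250, "Gas"),
]
for temperature, expected in test_cases:
    assert water_state(temperature) == expected, f"failed for {temperature}"
```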

Deriving test cases via equivalence classes

Algorithm

You can derive positive and negative test cases from knowledge of the equivalence classes. Here is the algorithm (notice that it is not entirely mechanical; there is judgment involved as well):

For each function to be tested
    For each input variable in the function
        Find the set of equivalence classes that spans the variable's domain (valid and invalid)
    Repeat
        Define a new test case with one equivalence class per input variable, covering as many yet-uncovered
        equivalence classes as possible, distributing values as widely as possible
    Until all valid equivalence classes are used in at least one test case
    Repeat
        Define a new test case with one unused invalid equivalence class, and valid equivalence classes for the rest
    Until all invalid equivalence classes are used in at least one test case
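
The covering part of the algorithm can be partially automated. The sketch below is one possible (and deliberately naive) Python interpretation; the data layout, the class names and the greedy pairing are all assumptions, and it does not replace the judgment mentioned above (for instance, it cannot know that standstill combines best with a red light):

```python
from itertools import zip_longest

# Assumed representation: per input variable, a list of
# (class_name, sample_value, is_valid) equivalence classes.
domains = {
    "speed": [("standstill", 0, True), ("forward", 35, True), ("too fast", 500, False)],
    "light": [("green", "green", True), ("red", "red", True), ("blue", "blue", False)],
}

def derive_test_cases(domains):
    valid = {name: [c for c in classes if c[2]] for name, classes in domains.items()}
    cases = []
    # Positive cases: each new case combines one yet-uncovered valid class per
    # variable, until every valid class appears in at least one case.
    for row in zip_longest(*valid.values()):
        # Variables that ran out of fresh classes reuse their first valid class
        cases.append({name: (cell or valid[name][0])[:2]
                      for name, cell in zip(valid, row)})
    # Negative cases: one unused invalid class, valid classes for the rest.
    for name, classes in domains.items():
        for class_name, value, is_valid in classes:
            if not is_valid:
                case = {other: valid[other][0][:2] for other in domains if other != name}
                case[name] = (class_name, value)
                cases.append(case)
    return cases

for case in derive_test_cases(domains):
    print(case)
```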

Example

Say we are going to build a system for registering traffic violations at a traffic light in a city area in The Netherlands. The two violations we are interested in are speeding and driving through a red light. The function to be tested determines whether a violation has occurred, based on the corrected car speed (i.e. after applying a margin for the inaccuracy of speedometer and sensor) and the state of the traffic light. The speed limit is 50 km/h and traffic lights have three states: red, yellow and green.

First, we define the equivalence classes for speed. Two obvious valid equivalence classes are speeds below and above 50 km/h. But standstill (0 km/h) is also important: no violation occurs if a car is waiting in front of a red light. In theory, negative speeds are also possible (though unlikely): that would imply the vehicle is driving in reverse gear, going backwards. There are limits to how fast a roadworthy car can drive and how fast it can accelerate. We will assume that there is no way any car can go faster than 250 km/h in a city area. Reverse speeds have their limits as well: cars with some exotic types of automatic gears can go as fast in reverse as forward, but we'll assume that going in reverse faster than 100 km/h is impossible. This leads to the equivalence classes shown in Tables 3 and 4.

Table 3. Equivalence classes for speed

| Class | Range |
|-------|-------|
| Invalid reverse speed | below -100 km/h |
| Reverse - speeding | -100 km/h up to (but not including) -50 km/h |
| Reverse - not speeding | -50 km/h up to (but not including) 0 km/h |
| Standstill | 0 km/h |
| Forward - not speeding | above 0 km/h, up to and including 50 km/h |
| Forward - speeding | above 50 km/h, up to and including 250 km/h |
| Invalid forward speed | above 250 km/h |

Table 4. Equivalence classes for traffic light state

| Class | Validity |
|-------|----------|
| Green | Valid |
| Yellow | Valid |
| Red | Valid |
| Anything else | Invalid |

Let's create a set of test cases that achieves input domain coverage based on these equivalence classes.

We start by creating a test case for the Reverse - speeding class. We simply pick any value in the equivalence class: -65. We expect the result to be a violation since the car is speeding (even if in reverse). For the traffic light state we select Green, to make sure that the decision is based on the speeding part alone. Next, we create a test case for Reverse - not speeding (speed -25), and we select a different value for the traffic light state to increase coverage on that input as well. We select Red, since that allows us to check whether the traffic violation algorithm correctly sees that moving away from the red light in reverse gear is not a violation: you don't pass the traffic light.

Then we take the Standstill class (speed 0), and the most obvious choice there is Red as well, since this is when people wait at a red light, which is not a violation. You might say that we should use Yellow instead since it has not been used yet, but the combination of standstill and red seems to have more potential to uncover defects, and we have more test cases coming up anyway. Then we take the Forward - not speeding class (speed 35), and we select the only valid traffic light state we haven't used yet: Yellow. Finally, we take Forward - speeding (speed 90), which is a violation with any traffic light state. Which state we choose doesn't matter for coverage, but we select Yellow since we used Red twice already and Green for reverse speeding; perhaps using Yellow will reveal a specific defect. Table 5 shows the set of positive test cases that we end up with.

Table 5. Valid Test Cases for the Traffic Violation Example

| Speed equivalence class | Speed (km/h) | Traffic light state | Expected result |
|-------------------------|--------------|---------------------|-----------------|
| Reverse - speeding | -65 | Green | Violation |
| Reverse - not speeding | -25 | Red | No violation |
| Standstill | 0 | Red | No violation |
| Forward - not speeding | 35 | Yellow | No violation |
| Forward - speeding | 90 | Yellow | Violation |

Table 6 shows a set of negative test cases derived via the algorithm. We take each of the invalid speed classes and pair it with any valid traffic light state; the expected result is 'invalid speed'. Then we take the invalid traffic light class and pair it with a valid speed class; the expected result is 'invalid traffic light state'.

Table 6. Invalid Test Cases for the Traffic Violation Example

| Speed equivalence class | Speed (km/h) | Traffic light state | Expected result |
|-------------------------|--------------|---------------------|-----------------|
| Invalid reverse speed | -110 | Green (can be any valid) | Invalid speed |
| Invalid forward speed | 500 | Red (can be any valid) | Invalid speed |
| Forward - speeding (can be any valid) | 70 | Blue | Invalid traffic light state |

The combination of the two tables gives input domain coverage. Notice that this doesn't mean you have done exhaustive testing. It just means that you covered each equivalence class in each domain in at least one test case. For example, the test cases above do not contain a test for driving through the red light at a low speed, which is also a violation.
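
To tie the example together, here is a minimal Python sketch of a function that could implement this check, with the test cases of Tables 5 and 6 as assertions. The function name `check_violation`, its string results, and the exact rules (reverse movement or standstill at a red light is no violation; any forward movement through red is) are assumptions based on the reasoning above:

```python
VALID_LIGHTS = {"green", "yellow", "red"}

def check_violation(speed_kmh: float, light: str) -> str:
    """Hypothetical traffic violation check for the example above."""
    if speed_kmh < -100 or speed_kmh > 250:
        return "Invalid speed"
    if light not in VALID_LIGHTS:
        return "Invalid traffic light state"
    if abs(speed_kmh) > 50:
        return "Violation"   # speeding, forward or in reverse
    if speed_kmh > 0 and light == "red":
        return "Violation"   # driving through a red light
    return "No violation"

# Test cases from Tables 5 and 6
cases = [
    (-65, "green", "Violation"), (-25, "red", "No violation"),
    (0, "red", "No violation"), (35, "yellow", "No violation"),
    (90, "yellow", "Violation"), (-110, "green", "Invalid speed"),
    (500, "red", "Invalid speed"), (70, "blue", "Invalid traffic light state"),
]
for speed, light, expected in cases:
    assert check_violation(speed, light) == expected, (speed, light)
```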

Output Domain Coverage

As with the input domains, it is also possible to create equivalence classes for the output(s) of a function, and to design your tests in such a way that all classes have at least one test case associated with them. This results in output domain coverage. Notice that invalid equivalence classes exist for the output domain as well. So, for the state of water example above, we have equivalence classes Solid, Freezing Point, Liquid, Boiling Point, Gas and Invalid. Table 7 shows a test set that achieves output domain coverage for that example:

Table 7. Output Domain Coverage Example

| Temperature (°F) | Equivalence class for output domain |
|------------------|-------------------------------------|
| -500 | Invalid |
| -200 | Solid |
| 32 | Freezing Point |
| 100 | Liquid |
| 212 | Boiling Point |
| 300 | Gas |

This is of course a very simple example. The right column has a dual role here; it's both the expected result and an equivalence class. The algorithm to determine the output is often much more complex.

Generally, achieving output domain coverage is more difficult than achieving input domain coverage, because forcing output values requires knowledge of the function under test (i.e. expertise in the business area). If you achieve input domain coverage, you are typically very close to output domain coverage as well. It does make sense, however, to explicitly achieve output domain coverage for all error responses.
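
One way to make output domain coverage measurable is to collect the outputs that your test set actually produces and compare them against the full set of output classes. A small sketch, reusing the hypothetical water_state function from earlier:

```python
# All output equivalence classes for the state of water example
output_classes = {"Invalid", "Solid", "Freezing Point", "Liquid", "Boiling Point", "Gas"}
test_inputs = [-500, -200, 32, 100, 212, 300]  # the input values of Table 7

# Collect the classes the test set actually exercises, and flag any gap
covered = {water_state(temperature) for temperature in test_inputs}
missing = output_classes - covered
assert not missing, f"output classes without a test case: {missing}"
```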

Boundary Value Testing

Boundary value defects are very common; a well-known example is the off-by-one error. Because they are so common, it is good practice to test for them explicitly. For numerical equivalence classes, this means:

  • Create (positive) test cases for the lowest and the highest value in each equivalence class.
  • Create (negative) test cases for the predecessor of the lowest value and the successor of the highest value in each class (as far as they weren't incorporated yet).
  • Create test cases for special cases and their closest neighbors. Very often, 0 is a special case; then, e.g. in the integer domain, include 0, -1 and 1.

In the speed example (assuming whole-number precision), this translates to making test cases with the following speeds:

Positive cases:

  • Extremes of class Reverse - speeding: -100 and -51
  • Extremes of class Reverse - not speeding: -50 and -1
  • Extremes of class Standstill: 0
  • Extremes of class Forward - not speeding: 1 and 50
  • Extremes of class Forward - speeding: 51 and 250

Unincorporated predecessors/successors of extremes:

  • Class Reverse - Speeding: -101
  • Class Speeding: 251

Special case 0: n/a (-1, 0 and 1 already covered)

Table 8 shows a possible set of test cases based on this analysis. Notice that now that we have more than one value per equivalence class for speed, we can also increase our defect-finding capability by using more traffic light state values per speed equivalence class.

Since this method is designed to cover the domains, there is no need to incorporate other values to obtain input domain coverage.

Table 8. Boundary Value Testing Example

| Speed (km/h) | Traffic light state | Expected result |
|--------------|---------------------|-----------------|
| -101 | Green | Invalid speed |
| -100 | Red | Violation |
| -51 | Yellow | Violation |
| -50 | Red | No violation (moving away from traffic light) |
| -1 | Green | No violation |
| 0 | Red | No violation |
| 1 | Red | Violation |
| 50 | Yellow | No violation |
| 51 | Yellow | Violation |
| 250 | Green | Violation |
| 251 | Green | Invalid speed |
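
Deriving these boundary values mechanically is straightforward once the class ranges are explicit. A sketch, assuming whole-number precision and the speed classes of Table 3 expressed as inclusive integer ranges:

```python
# Valid speed classes as inclusive (low, high) integer ranges, from Table 3
speed_classes = {
    "reverse - speeding": (-100, -51),
    "reverse - not speeding": (-50, -1),
    "standstill": (0, 0),
    "forward - not speeding": (1, 50),
    "forward - speeding": (51, 250),
}

def boundary_values(classes):
    # The extremes of every class...
    values = {bound for low_high in classes.values() for bound in low_high}
    # ...plus the invalid neighbors just outside the overall domain
    values.add(min(low for low, _ in classes.values()) - 1)    # -101: invalid reverse
    values.add(max(high for _, high in classes.values()) + 1)  # 251: invalid forward
    return sorted(values)

print(boundary_values(speed_classes))
# [-101, -100, -51, -50, -1, 0, 1, 50, 51, 250, 251]
```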

You might think that using domain coverage techniques leads to a combinatorial explosion when you have multiple variables to take into account. See the All Pairs Testing page for how to deal with that.