Code Coverage - amosproj/amos2025ss04-ai-driven-testing GitHub Wiki

General definition

Test adequacy criteria measure the quality of a test suite; higher-quality test suites detect more faults. Code coverage is one subcategory of these criteria.

Control Flow Code coverage

This focuses on the individual decision nodes, the atomic sections, not the entire block. CodeCover, a Java library, is used in this study.

Statement Coverage (block coverage)

Each statement is executed at least once. Here a statement is any code snippet in which information is written or evaluated.
For example, a=5 is 1 statement; if x > 0: is 1 statement; print(x) is 1 statement.
The most basic form of code coverage

Branch Coverage

Examines all branches
Here both true and false cases of a decision point are examined
A stronger measure than Statement Coverage: full branch coverage implies full statement coverage, but not the other way round.
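
The difference between statement and branch coverage can be made visible with a small tracer. The following is only a minimal sketch built on the standard-library `sys.settrace` hook; the function `describe` and the helper `trace` are hypothetical names invented for this illustration, and real tools such as Coverage.py work differently and more robustly.

```python
import sys


def describe(x):
    label = "non-positive"      # offset 1
    if x > 0:                   # offset 2 (decision point)
        label = "positive"      # offset 3
    return label                # offset 4


def trace(func, *args):
    """Run func(*args); return executed line offsets and line-to-line arcs.

    Offsets are relative to the 'def' line of func (hypothetical helper).
    """
    code = func.__code__
    lines, arcs = set(), set()
    prev = None

    def tracer(frame, event, arg):
        nonlocal prev
        if frame.f_code is code and event == "line":
            cur = frame.f_lineno - code.co_firstlineno
            lines.add(cur)
            if prev is not None:
                arcs.add((prev, cur))
            prev = cur
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)
    return lines, arcs


lines, arcs = trace(describe, 5)
print(sorted(lines))    # every statement of describe was executed
print((2, 4) in arcs)   # False: the "condition is false" branch was never taken
```

A single test with `x = 5` executes all four statements, yet the jump from the `if` directly to the `return` is never observed. A second test with a non-positive `x` is needed to cover that branch.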

MC/DC Coverage (Modified Condition-Decision Coverage)

Tests that each atomic condition can independently affect the decision outcome.
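
To illustrate what "independently affect" means, the sketch below searches for pairs of inputs that differ in exactly one condition and flip the decision. It assumes the strict "unique cause" reading of MC/DC; the decision `a and (b or c)` and all function names are hypothetical examples, not part of our code base.

```python
from itertools import product


def example_decision(a, b, c):
    # Hypothetical decision with three atomic conditions.
    return a and (b or c)


def independence_pairs(decide, n_conditions):
    """For each condition index, collect input pairs that differ only in
    that condition and produce different decision outcomes."""
    pairs = {i: [] for i in range(n_conditions)}
    for vec in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if vec[i]:
                continue  # visit each pair once, starting from the False side
            flipped = vec[:i] + (True,) + vec[i + 1:]
            if bool(decide(*vec)) != bool(decide(*flipped)):
                pairs[i].append((vec, flipped))
    return pairs


def satisfies_mcdc(test_vectors, decide, n_conditions):
    """True if, for every condition, the tests contain one full pair."""
    tests = set(test_vectors)
    pairs = independence_pairs(decide, n_conditions)
    return all(
        any(lo in tests and hi in tests for lo, hi in pairs[i])
        for i in range(n_conditions)
    )


suite = [
    (True, False, False),
    (True, True, False),
    (True, False, True),
    (False, True, False),
]
print(satisfies_mcdc(suite, example_decision, 3))
print(satisfies_mcdc([(True, True, True), (False, False, False)],
                     example_decision, 3))
```

The first suite contains four tests for three conditions and has a flipping pair for each of `a`, `b` and `c`. The second suite reaches both decision outcomes (so it has full branch coverage) but shows no condition acting on its own.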

Loop Coverage

Tests whether loops are executed more than once.
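
One possible way to observe this is to wrap the iterable of a loop and count iterations per execution of the loop. This is a sketch only: `LoopProbe` is a hypothetical helper, and distinguishing "zero", "once" and "many" iterations is an assumption that goes slightly beyond the sentence above.

```python
class LoopProbe:
    """Counts how many iterations each execution of a watched loop performs."""

    def __init__(self):
        self.iteration_counts = []

    def watch(self, iterable):
        count = 0
        try:
            for item in iterable:
                count += 1
                yield item
        finally:
            # Runs when the loop finishes normally. After an early 'break'
            # it only runs once the generator is closed.
            self.iteration_counts.append(count)

    def categories(self):
        return {
            "zero" if n == 0 else "once" if n == 1 else "many"
            for n in self.iteration_counts
        }


def total(values, probe):
    result = 0
    for value in probe.watch(values):
        result += value
    return result


probe = LoopProbe()
total([], probe)
total([4], probe)
total([1, 2, 3], probe)
print(probe.iteration_counts)
print(sorted(probe.categories()))
```

A test suite that only ever passes a single-element list would leave the "many" category empty, which is exactly the gap this criterion is meant to reveal.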

[Hemmati]

Implementation Ideas:

I think branch coverage should be very simple to implement with most testing libraries. It should measure how much of the prompt code is covered by the response code. This is very basic but very necessary. Loop coverage is also a standard addition to branch coverage. If it is possible and easy, we can probably also add MC/DC, but this would be "going the extra mile" and is not strictly necessary.

Data Flow code coverage

Evaluates variable occurrences: a definition (def) $d$ is where a value is assigned to a variable $v$, and a use $u$ is where the value of $v$ is referenced / read.

def-use pair coverage

A DUA (definition-use association) $(d, u, v)$ is marked as satisfied once it has been exercised. The last $d$ recorded is taken as the definition that reaches $u$.
Use DUA-FORENSICS [Santelices]
Basically, these DUAs can be used to check which values can be defined at $d$ and whether all values of $v$ are handled / valid at $u$. The coverage should

static analysis - before execution of test

dynamic analysis - during execution

reporting - determine DUA coverage with static & dynamic analysis

[Hemmati][Santelices]
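
The "last definition recorded" rule can be sketched in a few lines. The example assumes that the dynamic phase already delivers a list of events of the form (kind, line, variable); this event format, the line numbers and the function name are hypothetical and are not the interface of DUA-FORENSICS.

```python
def covered_duas(events):
    """Return the set of (def_line, use_line, var) triples exercised by a run.

    events: iterable of (kind, line, var) with kind in {"def", "use"}.
    The most recent definition of a variable is treated as the one
    that reaches the next use.
    """
    last_def = {}
    covered = set()
    for kind, line, var in events:
        if kind == "def":
            last_def[var] = line
        elif kind == "use" and var in last_def:
            covered.add((last_def[var], line, var))
    return covered


# Hypothetical program under test:
#   1: x = read()
#   2: if x > 0:
#   3:     x = x * 2
#   4: print(x)
# DUAs found by static analysis (assumed given):
all_duas = {(1, 2, "x"), (1, 3, "x"), (1, 4, "x"), (3, 4, "x")}

# Events of one run with a positive input:
run_positive = [
    ("def", 1, "x"),
    ("use", 2, "x"),
    ("use", 3, "x"),
    ("def", 3, "x"),
    ("use", 4, "x"),
]

covered = covered_duas(run_positive)
print(len(covered), "of", len(all_duas), "DUAs covered")
print("missing:", all_duas - covered)
```

The remaining DUA, from the definition in line 1 to the use in line 4, is only exercised by a run that skips the `if` body, so the report points directly at the missing test input.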

Static Data Flow Testing

A flow diagram is created, tracking the static definitions and usages of variables.
The anomalies discovered here include variables that are defined but unused, used but undefined, or defined twice before use. [Geeksforgeeks] [Lambdatest]
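
A rough sketch of such a static check using Python's standard `ast` module is shown below. It makes strong simplifying assumptions: it only handles a flat sequence of simple statements (no branches, loops or functions), ignores built-in names, and does not treat augmented assignments such as `x += 1` as a use. The function name and the anomaly labels are made up for this example.

```python
import ast
import builtins


def find_anomalies(source):
    """Report data-flow anomalies in straight-line code as (kind, var, line)."""
    defined = set()
    defined_unused = {}  # variable -> line of a definition not yet used
    anomalies = []
    for stmt in ast.parse(source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Uses first: the right-hand side is evaluated before the assignment.
        for n in names:
            if isinstance(n.ctx, ast.Load):
                if n.id not in defined and not hasattr(builtins, n.id):
                    anomalies.append(("used-but-undefined", n.id, n.lineno))
                defined_unused.pop(n.id, None)
        for n in names:
            if isinstance(n.ctx, ast.Store):
                if n.id in defined_unused:
                    anomalies.append(("redefined-before-use", n.id, n.lineno))
                defined.add(n.id)
                defined_unused[n.id] = n.lineno
    for var, line in defined_unused.items():
        anomalies.append(("defined-but-unused", var, line))
    return anomalies


sample = """\
a = 1
a = 2
b = a + c
print(a)
"""

for anomaly in find_anomalies(sample):
    print(anomaly)
```

For the sample, the check flags `a` as defined twice before use, `c` as used but undefined, and `b` as defined but unused. Handling branches properly would require the flow diagram described above instead of a single pass over the statements.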

Dynamic Data Flow Testing

During (test-)code execution, create the flow diagram and analyze it for the same anomalies as in Static Data Flow Testing. [Lambdatest]

Implementation Ideas:

The most beneficial would be to test Static Data Flow on the response code, since we want to measure how good the responses are.

Create the flow diagram for the response-code
Check if any of the anomalies appear, note how many variables are created and how many are used, note how

Personal Opinion: Since the unit tests in our test_cases have few or no variables, this type of coverage test is a sanity check on the response. It is better suited for tests that are more complex (for example integration tests, UI tests, ...), where many variables have to be created in the test. In my experience, the models we use do not write code with these anomalies.

Path-Based Coverage

Tests entire execution sequences (paths). The number of paths usually grows exponentially with the number of decision points.
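
The growth is easy to see with a toy function: each additional independent `if` doubles the number of paths. The function `classify` below is a hypothetical example with three sequential decisions.

```python
from itertools import product


def classify(a, b, c):
    # Three sequential, independent decisions; the returned tuple
    # records which 'if' bodies were entered, i.e. the path taken.
    path = []
    if a:
        path.append("A")
    if b:
        path.append("B")
    if c:
        path.append("C")
    return tuple(path)


paths = {classify(*flags) for flags in product([False, True], repeat=3)}
print(len(paths))  # 2 ** 3 distinct paths
```

Branch coverage of this function is already reached with two tests (all conditions true, all conditions false), while full path coverage needs all eight combinations.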

Intra-method paths (IMP)

Test each path of a method. Start at the beginning of the method and end at its return. Nested methods are not included.
Loops and recursion would add an infinite number of paths, so this is a theoretical idea of path-based coverage.

Acyclic intra-method paths (AIMP)

Bound the number of paths by considering only the acyclic paths of IMP.
Treat loops as single decision points. Do not consider recursion as different paths. [Gligoric]

Implementation Ideas:

AIMP is the best we can measure in this category. It might be possible to measure this with an existing Python library. Here we track how many of the possible paths have been traversed.

pytest_cov.plugin PytestCovPlugin

Coverage.py

Pytest-Cov

Radon

If we cannot use a library, this is the approach I would take:

Create the control-flow graph of the prompt code. This may use the same mechanisms as our mcc or ccc implementation. Importantly, a decision node is any form of if-else. If there is an elseif, count if-elseif and elseif-else as two individual decision nodes. Loops count as one decision node: entering or not entering the loop body. Recursions are not counted as decision points and may be disregarded.
Split the graph into individual paths
During the execution of the test-code (extracted response) "count" how many of the paths are traversed with the test and mark them
Display the findings.
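
The steps above could look roughly like the following sketch. It assumes the control-flow graph is already available as a dictionary of successor lists and that the test run yields the sequence of visited nodes; the node names, the graph and both helper functions are hypothetical. Loops are modelled as a single decision (enter the body or not), so the loop body leads straight to the exit.

```python
def acyclic_paths(cfg, entry, exit_node):
    """Enumerate all acyclic paths from entry to exit_node (depth-first)."""
    paths = []

    def walk(node, path):
        if node == exit_node:
            paths.append(tuple(path))
            return
        for successor in cfg.get(node, []):
            if successor not in path:  # never revisit a node: skips back edges
                walk(successor, path + [successor])

    walk(entry, [entry])
    return paths


def collapse(trace):
    """Map an executed trace onto an acyclic path by keeping only the
    first visit of each node (repeated loop iterations collapse)."""
    return tuple(dict.fromkeys(trace))


# Hypothetical control-flow graph: one if/else followed by one loop.
cfg = {
    "entry": ["if1"],
    "if1": ["then1", "else1"],
    "then1": ["loop"],
    "else1": ["loop"],
    "loop": ["body", "exit"],  # enter the loop body or skip it
    "body": ["exit"],
}

all_paths = acyclic_paths(cfg, "entry", "exit")

# Node sequences recorded while running two tests (assumed input).
traces = [
    ["entry", "if1", "then1", "loop", "body", "loop", "body", "loop", "exit"],
    ["entry", "if1", "else1", "loop", "exit"],
]
covered = {collapse(t) for t in traces} & set(all_paths)

print(f"{len(covered)} of {len(all_paths)} acyclic paths covered")
for path in all_paths:
    mark = "x" if path in covered else " "
    print(f"[{mark}] " + " -> ".join(path))
```

Collapsing repeated nodes is what makes a run with several loop iterations count for the same acyclic path as a run with a single iteration, which matches the idea of treating a loop as one decision point.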

Alternatives / extensions / problems

State Coverage

Measures how well the code executes its specifications. [Vanoverberghe]

Test Coverage

A qualitative metric that shows how well the requirements are being tested. The focus is on how well the software does what it is supposed to do, not on how many lines of the code are tested. This is a holistic approach to the product. It is usually part of integration testing (testing whether things work together). [Gireesh]

Research

Even with 100% code coverage, 7% to 35% of faults may go undetected. [Hemmati]
Other or complementary techniques should also be considered.
Code coverage is not as prevalent in industry, since it adds computational cost. [Ivankovio]

Sources

Vanoverberghe Hemmati Santelices Santelices2 Gligoric Lambdatest Geeksforgeeks Ivankovio Gireesh