Code Complexity - amosproj/amos2025ss04-ai-driven-testing GitHub Wiki

📖 Definition

Factors that make code difficult to understand. This includes length, syntactical elements, data flow patterns, variable names and more. [Ajami]
McCabe’s Cyclomatic Complexity: The complexity of a sequence of steps is equivalent to the cyclomatic complexity of the strongly connected equivalent of its control flow graph. [MCC]
Code Complexity Measuring Tool: Using Cognitive Code Complexity (CCC), the following metrics are evaluated to calculate the code complexity: size, type of control structures, nesting level of control structures, inheritance level of statements, try-catch blocks, recursive calls, array declarations, compound conditions, and input/output statements. [CCMT]
Institute of Electrical and Electronic Engineers: The extent to which the design or implementation of a system or component obstructs comprehension and verification. [IEEE]
Halstead Software Science: Effort required in implementing or understanding the program. It is directly proportional to difficulty and volume. [Onyango]

Many more definitions exist, but these are the most applicable.

📊 How to measure

1. Naive:

Stop the time it takes for multiple developers to accurately interpret a snippet of code. [Ajami]

2. McCabe’s Cyclomatic Complexity:

[MCC]

Generate a control flow graph $G$. Each node represents a block of sequential code with a single entry and exit. The edges represent the way the nodes are connected or follow after each other. Add an additional edge from the starting node to the ending node to connect the graph strongly. The number of nodes is represented as $n$. The number of edges is defined as $e$. The cyclomatic number $V(G)$ can be determined as follows:

$V(G) = e-n+1$

An alternative to this is measuring the predicates (or decisions) p in the code. Predicates can consist of more than one condition (e.g. if C1 and C2).

$V(G) = p + 1$.

The established norm for calculation the cyclomatic complexity in software is based on this concept, but slightly different in its execution. A node $n$ is an individual command or line of code that dose not contain a decision. An edge $e$ is the connection between nodes. The amount of connected components $p$ can be understood of the number of routes to the end of a control-flow-graph or as the amount of isolated sections in a graph.

Basic cyclomatic complexity: $V(G) = e-n+2p$

Counting decision points: $V(G) = p + 1$

Summing up predicate nodes: $V(G) = p + 1$

Specifics of decision points:

every if and else if is a decision point. The else is not a decision point.
a loop setup while or for is a decision point. They can be interpreted as if -> continue loop and else -> exit loop
nested if statements count as individual decision points [Marilyn]

example:

if(A and B): do something

else: do something different

would result in $V(G)=2$, since $p=1$

but

if(A):

if(B):do something

else: do nothing

else: do something different

would result in $V(G)=3$, since $p=2$

There exist some counting rules for common code constructs that can help determine $V(G)$ if the equations are too difficult to determine. [MCC]

3. The CCC Measure:

CCC_q = \sum_{k=1}^{m} [S_k \times(W_c+W_n+W_i)_k]+(W_{tc}+W_{rc}+W_{cc}+W_{ad}+W_{io})_k

$S_k:=$ Size of the k-th executable statement in terms of token count (operators + operands + methods/functions + strings) [Chhillar]

$W_c:=$ Type of Control Structures

$= 0$ for sequential statements,

$= 1$ for decision-making control statements like if.. then .. else,

$= 2$ for decision making control statements like while .. do, for loop, do .. while,

$= n$ for switch statement with n cases [Chhillar]

$W_c:=$ Inheritance Level of Statements

$= 0$ for statements inside the outermost level of inheritance,i.e inside the base class,

$= 1$ for statements inside the next level of inheritance, i.e first derived class,

$= 2$ for statements inside the next deeper level of inheritance, i.e next derived class and so on. [Chhillar]

$W_{tc}:=$ Try-Catch Blocks

$+=1$ for every occurrence of try

$+=1$ for every occurrence of catch [CCMT]

$W_{rc}:=$ Recursive Calls

$ for first recursive call

$+=2$ for each additional recursive call of the same method [CCMT]

$W_{ad}:= W_a \times (\sum S_{ad})$ Array Declarations

$W_a:=$ Weight of array

$S_{ad}:=$ Size of array dimension [CCMT]

$W_{cc}:=$ Compound Conditions

for and and or the equal weight as the control structure they appear in.

$W_{io}:=$ Input/Output Statements

$+=1$ for each input/output statement

5. Halstead Software Science:

E = D\cdot V = [\frac{\eta 1}{2}\cdot \frac{N2}{\eta 2}] \cdot [N \;\log_n(\eta)]

$\eta1 =$ number of (distinct) unique operators ({}, =, ==, ;, while(), /, :, if(), else, return, +, ...)

$\eta2 =$ number of (distinct) unique operands (main(), int, print(), function_name, variable_name, 1, 0, `',...)

$N1 =$ total number of operators

$N2 =$ total number of operands

$\eta:=\eta1 + \eta2$ Program Vocabulary

sum of (distinct) unique operands and (distinct) unique operators

$N:=N1 + N2$ length of the Program

$V:=N \log_n(\eta) = (N1+N2) \log_n(\eta 1+\eta 2)$ Program Volume

$D:=\frac{\eta 1}{2}\cdot \frac{N2}{\eta 2}$ Difficulty

number of unique operators and total usage of operands

$E:= D\cdot V$ Effort [Onyango]

Some automated measuring Tools exist, like the Complexity Measurement Tool (CMT) and Code Metrics (CM), but these are not applicable for all programming languages.[Ogunrinde]

References

Ajami MCC CCMT(Page 101 to 112) IEEE Chhillar Onyango Ogunrinde(Page 55-64) Marilyn