How Susereum Achieves Sustainable Code - obahy/Susereum GitHub Wiki

Code Analysis

Background

This module analyzes a project's source code and generates a results.csv file containing metrics related to 9 code smells: Large Class, Small Class, Large Method, Small Method, Large Parameter List, God Class, Inappropriate Intimacy, and Comment-to-Code Ratio (its upper and lower bounds count as two smells).

This module uses a third-party tool, SourceMeter 8.2.0, to analyze the code. SourceMeter performs many different types of analysis on the supplied project directory; the wrapper is an interface that filters SourceMeter's output down to the metrics Susereum uses.

As of the latest release, the Code Analysis module only analyzes Java projects and Python projects.

Extending the Code Smells

To extend the set of code smells that Susereum recognizes, one must find metrics that correspond to, and are indicative of, a particular smell. The current implementation of the Code Analysis module recognizes 9 smells:

Small Class
Large Class
Small Method
Large Method
Large Parameter List
God Class
Inappropriate Intimacy
Small Comment-to-Code Ratio
Large Comment-to-Code Ratio

For the Code Analysis module to recognize these smells, Source Meter must generate metrics that correspond to them. The following metrics correspond to the 9 smells:

LOC - Lines of Code (Class/Method Levels)
- Since LOC is generated at both the Class and Method levels, we can use it for Small/Large Class and Small/Large Method.
NUMPAR - Number of Parameters (Method Level)
- This can be used for Large Parameter List
CD - Comment Density (Class/Method Levels)
- This can be used for Small/Large Comment-to-Code Ratio
NOI - Number of Outgoing Invocations (Class Level)
- We can use this to infer possible God Classes
CBO - Coupling Between Object Classes, the number of directly used other classes (Class Level)
- We can use this to infer possible Inappropriate Intimacy

Inside of SourceMeter_Interface/src/constants.py, there are the following defined constants:

CLASS_KEEP_COL
- This is a Python list containing column names that will be kept for Source Meter-generated metrics at the Class level.
METHOD_KEEP_COL
- This is a Python list containing column names that will be kept for Source Meter-generated metrics at the Method level.

A complete list of Source Meter metrics can be found at: https://www.sourcemeter.com/resources/

To extend the smells detected by the Code Analysis module, one would need to find metrics that correspond to the new smell. As long as Source Meter outputs these metrics, only a small modification to CLASS_KEEP_COL or METHOD_KEEP_COL is needed: adding the metric's name to the list. Note, however, that any change to the metrics must also be incorporated in the health function, in the Sawtooth Health Family, and in the Sustainability Measures file.
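The column filtering described above can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: the list contents, the `Name` column, the `filter_columns` helper, and the sample CSV row are all assumptions made for the example. The real lists are defined in SourceMeter_Interface/src/constants.py.

```python
import csv
import io

# Hypothetical contents; the real lists live in SourceMeter_Interface/src/constants.py.
CLASS_KEEP_COL = ["Name", "LOC", "CD", "NOI", "CBO"]
METHOD_KEEP_COL = ["Name", "LOC", "CD", "NUMPAR"]


def filter_columns(csv_text, keep_cols):
    """Return the rows of csv_text reduced to the columns named in keep_cols.

    filter_columns is an illustrative helper, not a function from the wrapper.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in keep_cols if col not in (reader.fieldnames or [])]
    if missing:
        # A metric that the analysis did not output cannot be kept.
        raise KeyError(f"columns not present in the analysis output: {missing}")
    return [{col: row[col] for col in keep_cols} for row in reader]


# Extending the smells: keep one more class-level metric (WMC is only an example).
EXTENDED_CLASS_KEEP_COL = CLASS_KEEP_COL + ["WMC"]

# Made-up sample row standing in for a class-level metrics file.
SAMPLE_CLASS_CSV = "Name,LOC,CD,NOI,CBO,WMC,NOS\nParser,420,0.12,37,9,58,300\n"

rows = filter_columns(SAMPLE_CLASS_CSV, EXTENDED_CLASS_KEEP_COL)
```

Adding a name to the keep list is the only change on the analysis side; the health function still has to be taught what to do with the new column.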

Extending Language Support

Source Meter supports the following languages:

Java
C
C++
C#
Python
RPG

The current implementation of the Code Analysis module only recognizes Java and Python projects. If one were to want to extend the Code Analysis Module to support other languages, one would need to make the necessary modifications inside of sourceMeterWrapper.py, which is located in /<Path to Susereum>/CodeAnalysis/SourceMeter_Interface/.

Inside of this file, there are the following functions:

def get_project_type(directory)
- Currently, this function will look in a given directory and count the number of Java files (.java) and the number of Python files (.py). If the directory contains more Java files than Python files, the directory is considered to be a Java project. If the directory contains more Python files than Java files, the directory is considered to be a Python project. This would need to be changed to account for other programming languages and their related file extensions.
def exec_metric_analysis(project_dir, project_name, project_type, results_dir)
- This function is responsible for constructing the command that executes the Source Meter analysis. Currently, it can only construct commands for the Java and Python executables that are included in Source Meter. It would need to be extended to account for each language that is added. In that case, it may be wise to also extend constants.py with constants that specify the paths to the executables for these other languages.
def consolidate_metrics(project_name, project_type, results_dir)
- This function is responsible for consolidating the metrics generated by the Source Meter analysis into a single CSV file that contains only the metrics related to smells that Susereum recognizes. A change to this function would be needed because Source Meter writes its files to a directory that corresponds to the executable used and the project type. For example, a Java project requires Source Meter's Java executable. When this executable is run, Source Meter generates its files in: <Path to results directory specified>/<project name>/java/. Extending the language support for the wrapper would require one to account for this Source Meter behavior.
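As a rough sketch of how get_project_type could be generalized beyond Java and Python, the function below counts files per extension and returns the most common project type. The extension map, the string return values, the recursive directory walk, and the tie-breaking rule are assumptions made for illustration; the current wrapper only distinguishes .java from .py.

```python
import os
from collections import Counter

# Hypothetical mapping; the current wrapper only handles .java and .py.
EXTENSION_TO_TYPE = {
    ".java": "java",
    ".py": "python",
    ".cs": "csharp",
    ".cpp": "cpp",
}


def get_project_type(directory):
    """Return the project type with the most source files, or None if none match."""
    counts = Counter()
    # Walking subdirectories is an assumption; the source only says the
    # function looks in the given directory.
    for _root, _dirs, files in os.walk(directory):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in EXTENSION_TO_TYPE:
                counts[EXTENSION_TO_TYPE[ext]] += 1
    if not counts:
        return None
    # Ties go to the type counted first; the source does not say how ties are handled.
    return counts.most_common(1)[0][0]
```

Supporting a new language then means adding one entry to the map, plus the matching changes in exec_metric_analysis and consolidate_metrics.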

Sawtooth Blockchain Module

Sawtooth consists of three sub-modules that handle tasks ranging from validating blocks to processing transactions and verifying the proper format and use of code smells, proposals, and commits.

REST-API

Hyperledger Sawtooth provides a pragmatic RESTish API for clients to interact with a validator using common HTTP/JSON standards. It is an entirely separate process, which once running, allows transactions to be submitted and blocks to be read with a common language-neutral interface.

The REST API treats the validator mostly as a black box, submitting transactions and fetching the results.
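As an illustration of the black-box style of interaction, the sketch below reads block identifiers through the REST API's /blocks endpoint. The base address is an assumption (port 8008 on localhost is Sawtooth's usual default), and `fetch_json` and `list_block_ids` are hypothetical helper names. The fetch function is passed in as a parameter so the logic can be exercised without a running validator.

```python
import json
import urllib.request

# Assumed address of a locally running Sawtooth REST API.
REST_API_URL = "http://localhost:8008"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_block_ids(base_url=REST_API_URL, fetch=fetch_json):
    """Return the identifier (header signature) of each block the API reports."""
    payload = fetch(f"{base_url}/blocks")
    return [block["header_signature"] for block in payload["data"]]
```

Calling `list_block_ids()` with no arguments requires a reachable REST API; nothing in the client needs to know how the validator stores or validates the blocks.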

Validator

The Validator module acts as a middleman between the REST API and the Family Transaction module. It handles most blockchain-related operations, such as running the consensus protocol and validating and generating blocks. The Validator module also serves as the network layer for Sawtooth. This layer is responsible for peer communication, networking, and message delivery.

Sawtooth defines three states related to the connection between any two validator nodes:

Unconnected
Connected - A connection is a required prerequisite for peering.
Peered - A bidirectional relationship that forms the base case for application-level message passing.

Family Transactions

The data model and transaction language are implemented in a transaction family. The Family Transaction module processes all transactions and is responsible for validating the data and its format.

The Family Transaction module consists of three core families, each of which handles different business logic:

Code Smells
Health Measures
Suse

Sawtooth and its modules are discussed in more detail later in this guide. Additional information can be found in the official Hyperledger Sawtooth documentation.

Health Measure Module

Health Measure Functions to Assess Sustainability of Software

Purpose

Incentivize and quantify the health and sustainability of code contributions, and ultimately promote sustainable software development practices based on the following requirements:

Occurrence of a code modification. When the codebase is modified, the network applies health measure functions to estimate the improvement or decline in the overall health of the codebase caused by this modification.
The specific measurement functions are applied to assess code health: a positive resulting value indicates a health improvement, and a negative value indicates a decline in overall health.

Function

Covers all the conditions, validations, and calculations needed to estimate the improvement or decline in the overall health of the project, based on the following requirements:

Receive the Code Analysis (SourceMeter) file, which evaluates the complete codebase of the project and contains the Code Metrics (cm).
Upload the Code Smells file, which was previously set up for the specific project.
Read each line in the Code Analysis file and evaluate each Code Metric (cm) against its corresponding Code Smell with the Health (h) functions.
Reward each Code Metric (cm) with 100 units when it complies with the Code Smell.
Penalize each Code Metric (cm) by subtracting units when it does not comply with the Code Smell.
Build a quadratic function to penalize Code Metrics (cm) that exceed the large Code Smell (lcs) threshold. The penalty applies up to twice the large Code Smell (2*lcs); anything above that sets the health to zero (h = 0 when cm > 2*lcs).
Build a quadratic function to penalize Code Metrics (cm) that are under the small Code Smell (scs). Code Metrics (cm) are always greater than or equal to zero.
Average the calculated health of all Code Metrics for a Code Smell, and repeat for the remaining Code Smells that apply.
Average the calculated health of all Code Smells to get the total health, which represents the overall health measure of the codebase. Compare this health measure against the previous health measure of the project to assess health improvement or health decline.
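The two averaging steps and the final comparison can be sketched as below. The function names, the smell keys, and the sample numbers are hypothetical, and skipping smells that have no metrics is one interpretation of "Code Smells that apply". The per-metric health values are taken as given here.

```python
from statistics import mean


def smell_health(metric_healths):
    """Average the health values computed for the metrics of one Code Smell."""
    return mean(metric_healths)


def total_health(healths_by_smell):
    """Average the per-smell averages, skipping smells with no evaluated metrics."""
    per_smell = [smell_health(values) for values in healths_by_smell.values() if values]
    return mean(per_smell)


def health_change(current, previous):
    """Positive result: health improved. Negative result: health declined."""
    return current - previous


# Made-up health values for two smells, plus one smell that does not apply.
sample = {
    "large_class": [100, 50],
    "large_method": [100, 100, 40],
    "god_class": [],
}
current_health = total_health(sample)
delta = health_change(current_health, previous=70)
```

Averaging per smell first means a smell with many evaluated metrics does not outweigh a smell with few.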

Health Measure Functions

The formulas check the Code Metric (cm) value against two conditions: less than the small Code Smell (scs), which is penalized, or greater than the large Code Smell (lcs), which is also penalized. If neither condition holds, the metric is rewarded. The result is the health (h) of each entry evaluated in the code analysis.

The reward constant (rw) stores 100. This value is the total of units rewarded for complying with the corresponding Code Smell parameters. It is also used to calculate the health function.

The health (h) variable is initialized to zero. It stores the calculated code health once the formula is applied.

Condition for penalization when the code metric is under the small Code Smell: cm < scs

Condition for penalization when the code metric is over the large Code Smell: cm > lcs

Condition for reward when neither penalization condition is met: cm >= scs and cm <= lcs
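The text fixes only the shape of the penalties: they are quadratic, the reward is rw = 100 inside the bounds, and h = 0 once cm > 2*lcs. One piecewise function consistent with those constraints is shown below. The exact quadratic terms are an assumption made for illustration, including the choice that health reaches zero at cm = 0 on the lower side; they are not necessarily the formulas Susereum uses.

```latex
h(cm) =
\begin{cases}
rw \left( 1 - \left( \dfrac{scs - cm}{scs} \right)^{2} \right) & 0 \le cm < scs \\[6pt]
rw & scs \le cm \le lcs \\[6pt]
rw \left( 1 - \left( \dfrac{cm - lcs}{lcs} \right)^{2} \right) & lcs < cm \le 2\,lcs \\[6pt]
0 & cm > 2\,lcs
\end{cases}
```

Under this assumed form, h is continuous at both bounds: it equals rw at cm = scs and at cm = lcs, and falls to zero at cm = 2*lcs.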

Case

a) A code submission contains ten classes, evaluated with the lines of code (LOC) Code Metric.

b) The small Code Smell (scs) is set to 100 and the large Code Smell (lcs) is set to 200.

c) For cm values 10, 20, and 50, which are under the small Code Smell (scs), the penalization formula applies.

d) For cm values 300, 360, 400, and 500, which are over the large Code Smell (lcs), the penalization formula applies.

e) For cm values 100, 150, and 200, which are between scs and lcs, the 100-unit reward applies.

f) Average the ten health values to get the total health for that Code Smell.
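The case can be run end to end with a small script. The quadratic penalty used here is an assumed form, chosen only because it satisfies the constraints in the text (reward of 100 inside the bounds, zero health above 2*lcs). The numbers it produces are therefore illustrative and may differ from Susereum's actual results.

```python
RW = 100  # reward constant (rw) from the text


def health(cm, scs, lcs, rw=RW):
    """Health of one Code Metric under an ASSUMED quadratic penalty.

    Assumes scs > 0 and lcs > 0. The penalty curves are illustrative,
    not Susereum's actual formulas.
    """
    if cm < scs:
        # Under the small Code Smell: health drops quadratically toward 0 at cm = 0.
        return rw * (1 - ((scs - cm) / scs) ** 2)
    if cm > 2 * lcs:
        # More than twice the large Code Smell: health is zero.
        return 0.0
    if cm > lcs:
        # Over the large Code Smell: health drops quadratically toward 0 at cm = 2*lcs.
        return rw * (1 - ((cm - lcs) / lcs) ** 2)
    # Within bounds: full reward.
    return float(rw)


# The ten LOC values from the case, with scs = 100 and lcs = 200.
CASE_LOC = [10, 20, 50, 100, 150, 200, 300, 360, 400, 500]
case_healths = [health(cm, scs=100, lcs=200) for cm in CASE_LOC]
case_average = sum(case_healths) / len(case_healths)
```

With this assumed curve the three in-range classes score 100 each, the classes at 400 and 500 LOC score 0, and the average for the smell is roughly 54.1.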

Health Case

Health Graph