DataSet Explorer (DSE) tool - Clean-CaDET/dataset-explorer GitHub Wiki
In this section, we will present the DataSet Explorer functionalities. We will describe the most important functionalities and present activity diagrams to provide a better understanding of these functionalities. We will also provide videos demonstrating the functionalities. The playlist with all the videos can be found here.
In addition, we present class diagrams showing the entities and their relationships. Understanding these entities and relationships helps users and developers customize DataSet Explorer to fit their needs and integrate it with other software or systems.
Functionalities are divided into the following sections:
- Annotation schema
- New dataset
- Annotations
The DSE tool allows users to define an annotation schema by creating code smells and, for each code smell, its heuristics and severities. Users can create any number of code smells, heuristics, and severities. They can also read, update, and delete annotation schema entities (code smells, heuristics, and severities). An activity diagram of the CREATE functionality can be seen below:
Annotation schema entities and their relationships can be examined in the class diagram below:

WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.
The DSE tool allows users to search code smells and heuristics based on names. Users can also search severities based on values. Next, users can filter code smells by code snippet type.
WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.
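The annotation schema entities and the search and filter operations described above can be sketched as follows. This is only an illustrative model, not the tool's actual implementation: the class names, the field names, the example snippet types, and the choice of case-insensitive substring matching are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heuristic:
    name: str
    description: str = ""


@dataclass
class CodeSmell:
    name: str
    snippet_type: str  # hypothetical values, e.g. "Class" or "Function"
    heuristics: List[Heuristic] = field(default_factory=list)
    severities: List[str] = field(default_factory=list)


def search_by_name(smells, query):
    """Return code smells whose name contains the query (assumed matching rule)."""
    needle = query.lower()
    return [smell for smell in smells if needle in smell.name.lower()]


def filter_by_snippet_type(smells, snippet_type):
    """Return code smells defined for the given code snippet type."""
    return [smell for smell in smells if smell.snippet_type == snippet_type]


# Hypothetical schema with two code smells.
schema = [
    CodeSmell("Long Method", "Function", [Heuristic("Too many lines")], ["0", "1", "2"]),
    CodeSmell("Large Class", "Class", [Heuristic("Too many fields")], ["0", "1", "2"]),
]
```

With this sketch, `search_by_name(schema, "long")` selects the Long Method entry, and `filter_by_snippet_type(schema, "Class")` selects the Large Class entry.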
The DSE tool allows users to create, read, update, and delete datasets. After creating an empty dataset, the tool allows users to add projects to the dataset*. Users can also read, update, and delete projects within the dataset. Below is an activity diagram of CREATE functionality:
WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.
The DSE tool allows users to search datasets, projects, and instances. It also allows users to filter instances by group. For example, suppose a user creates a dataset for two code smells: Long Method and Large Class. The tool creates two groups (one for Long Method and one for Large Class) and assigns instances to them. Filtering by a group then displays only the instances that belong to that group.
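The grouping example above can be illustrated with a short sketch. The dictionary layout and the `code_smell` key are assumptions made for illustration; the tool's own data structures may differ.

```python
from collections import defaultdict


def group_instances(instances):
    """Assign each instance to the group of its code smell."""
    groups = defaultdict(list)
    for instance in instances:
        groups[instance["code_smell"]].append(instance)
    return dict(groups)


def filter_by_group(groups, code_smell):
    """Return only the instances that belong to the selected group."""
    return groups.get(code_smell, [])


# Hypothetical instances from a dataset built for two code smells.
instances = [
    {"name": "OrderService.process", "code_smell": "Long Method"},
    {"name": "OrderService", "code_smell": "Large Class"},
    {"name": "Invoice.calculate", "code_smell": "Long Method"},
]
groups = group_instances(instances)
long_methods = filter_by_group(groups, "Long Method")
```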
WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.
The DSE tool allows users to annotate datasets. The user chooses a project within the dataset, a group within the project, and an instance within the group. The user then analyzes the instance and fills in the annotation form. Users can also read and update previously created annotations. An activity diagram of the CREATE functionality can be seen below:
WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.
The DSE tool allows users to automate the annotation process by switching to "Automatic annotation mode". When this mode is enabled, the DSE tool automatically moves the user to the next instance as soon as they annotate the current one.
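The behavior of this mode can be modeled with a minimal sketch. The `AnnotationSession` class and its members are hypothetical names, and staying on the last instance once it is annotated is an assumption.

```python
class AnnotationSession:
    """Hypothetical model of a user working through a list of instances."""

    def __init__(self, instance_ids, automatic_mode=False):
        self.instance_ids = list(instance_ids)
        self.automatic_mode = automatic_mode
        self.position = 0
        self.annotations = {}

    @property
    def current(self):
        return self.instance_ids[self.position]

    def annotate(self, severity):
        """Store the annotation and, in automatic mode, advance to the next instance."""
        self.annotations[self.current] = severity
        is_last = self.position == len(self.instance_ids) - 1
        if self.automatic_mode and not is_last:
            self.position += 1
```

Without automatic mode, `annotate` leaves the user on the same instance, so they must select the next one manually.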
WATCH A VIDEO DEMONSTRATING THIS FUNCTIONALITY.
The DSE tool allows users to filter annotated instances. The user can filter instances based on several criteria:
- whether instances are annotated or not
- what severity was assigned
- whether the user left a note or not
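The three filters listed above can be combined as in the following sketch. The `Instance` and `Annotation` classes and the `filter_instances` function are hypothetical, and treating an unannotated instance as having no note is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    severity: str
    note: str = ""


@dataclass
class Instance:
    name: str
    annotation: Optional[Annotation] = None  # the current user's annotation, if any


def filter_instances(instances, annotated=None, severity=None, has_note=None):
    """Apply each filter that is not None; filters left as None are ignored."""
    result = []
    for instance in instances:
        ann = instance.annotation
        if annotated is not None and (ann is not None) != annotated:
            continue
        if severity is not None and (ann is None or ann.severity != severity):
            continue
        if has_note is not None:
            # An unannotated instance counts as having no note (assumption).
            noted = ann is not None and bool(ann.note.strip())
            if noted != has_note:
                continue
        result.append(instance)
    return result
```

For example, `filter_instances(instances, annotated=True, has_note=True)` keeps only the annotated instances on which the user left a note.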
WATCH A VIDEO DEMONSTRATING THIS FUNCTIONALITY.
The DSE tool allows users to identify instances that require further annotation. These include:
- instances without any annotations
- instances annotated by a single user
- instances annotated by multiple users whose annotations conflict, with no consensus among them
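The three cases above can be expressed as a small classification function. This is a sketch under assumptions: the function name is hypothetical, an instance is represented only by the list of severities assigned by its annotators, and "consensus" is taken to mean that all annotators assigned the same severity.

```python
def reason_for_further_annotation(severities):
    """Return why an instance needs more annotation, or None if it does not.

    `severities` holds one assigned severity per annotator of the instance.
    """
    if len(severities) == 0:
        return "no annotations"
    if len(severities) == 1:
        return "annotated by a single user"
    if len(set(severities)) > 1:
        return "conflicting annotations"
    return None
```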
The DSE tool allows users to identify fully annotated instances (instances that all annotators have annotated) whose annotations disagree. For such instances, users can see the completed annotation forms of other users. After the users have discussed the instance, they should update their annotations to reach a consensus.
WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.
The dataset can be exported in several ways:
- The user can export their annotations (draft dataset) at any time, regardless of whether they have annotated all instances.
- The user can export a completely annotated dataset, meaning that at least two annotators have annotated each instance. The export is based on the draft datasets previously exported by each user, so the user must specify the path to a text file listing those draft datasets. The exported complete dataset contains the following:
  - Annotation files containing the annotations of all users for each instance, and the final annotation obtained using the majority vote algorithm
  - Heuristics files containing the heuristics marked as applicable by each user for each instance
  - Metrics files containing the values of structural metrics for each instance
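The majority vote used to derive the final annotation can be sketched as follows. This is an illustrative version, not the tool's actual code: the function name is hypothetical, and both the strict "more than half" threshold and returning `None` when no severity reaches it are assumptions, since the tie-handling rule is not described here.

```python
from collections import Counter


def majority_vote(severities):
    """Return the severity chosen by more than half of the annotators, else None."""
    if not severities:
        return None
    value, count = Counter(severities).most_common(1)[0]
    return value if count > len(severities) / 2 else None
```

Under these assumptions, three annotations of "2", "2", and "1" yield a final annotation of "2", while two disagreeing annotations yield no final annotation.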
WATCH A VIDEO DEMONSTRATING THESE FUNCTIONALITIES.