Project 1 Report - gregnr/SoftwareEvolutionAnalysis GitHub Wiki

#Question

Does the volume of unit tests in a project relate to the frequency of bugs?

#####Hypothesis#####

The number of bugs a project has will decrease as its volume of unit tests increases.

#####Importance of this in Software Evolution#####

Understanding the relationship between test cases and issues could be of great interest to developers, as it would help them allocate an appropriate amount of resources to QA.

#Methodology

Tools:

  • Git
  • GitHub (issue tracking API)
  • Node.js
  • Plotly

Steps:

  1. Identify code bases that have a significant history of unit tests and provide issue tracking. We use GitHub for this.
  2. Develop a script that, given a GitHub repository and the path to its unit test folder, produces two data sets: the frequency of bugs and the volume of unit tests over time. We measure the frequency of bugs as the number of issues opened per unit of time, and the volume of unit tests as the number of lines of code in all files within the test directory per unit of time. The unit of time is one week.
  3. Graph the data and look for a relationship between the two data sets.
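The two measurements in step 2 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and numbering weeks as whole weeks since the Unix epoch is an assumption. The `created_at` field is the timestamp the GitHub issues API returns for each issue.

```javascript
// Hypothetical helpers for illustration; not taken from the project's source.
const MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000;

// Week index of a timestamp: whole weeks elapsed since the Unix epoch.
// (Assumption: the real tool may number its weeks differently.)
function weekIndex(isoTimestamp) {
  return Math.floor(Date.parse(isoTimestamp) / MS_PER_WEEK);
}

// Count how many issues were opened in each week.
// `issues` is assumed to be an array of objects that each carry a
// `created_at` ISO 8601 string, as returned by the GitHub issues API.
function issuesPerWeek(issues) {
  const counts = new Map();
  for (const issue of issues) {
    const week = weekIndex(issue.created_at);
    counts.set(week, (counts.get(week) || 0) + 1);
  }
  return counts;
}

// Count the lines in one file's text. A trailing newline does not
// count as an extra line, and an empty file has zero lines.
function countLines(text) {
  if (text.length === 0) return 0;
  const lines = text.split("\n");
  return text.endsWith("\n") ? lines.length - 1 : lines.length;
}
```

Summing `countLines` over every file in the test directory at a given commit would give one "Test Volume" sample; how the tool picks the commit for each week is not specified here.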

Instructions to run/compile our code:

Install Node.js and its package manager, npm. See these instructions.

Clone the repository:

git clone https://github.com/gregnr/SoftwareEvolutionAnalysis

Navigate into the repository's directory. Run the command:

npm install

This will download dependencies used by the program into a node_modules directory.

Run the program by entering:

node src/main.js

Follow the prompts on screen.

Data sources used in our experiment:

#Results

Raw Results:

Below is a sample of the data that our tool can generate. For results on our chosen data sources, see this spreadsheet. Currently, "Change in Test Volume" is calculated in Matlab from the "Test Volume" metric. Generating it automatically is a feature we would like to add to our tool.

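| Week | Issues | Test Volume | Change in Test Volume |
|------|--------|-------------|-----------------------|
| 133 | 1 | 0 | 0 |
| 134 | 4 | 0 | 0 |
| 135 | 9 | 0 | 0 |
| 136 | 5 | 0 | 0 |
| 137 | 6 | 0 | 3583 |
| 138 | 5 | 3583 | 0 |
| 139 | 6 | 3583 | 0 |
| 140 | 5 | 3583 | 50 |
| 141 | 4 | 3633 | 550 |
| 142 | 5 | 4183 | -1 |
| 143 | 29 | 4182 | 0 |
| 144 | 12 | 4182 | 0 |
| 145 | 7 | 4182 | 0 |
| 146 | 10 | 4182 | 0 |
| 147 | 7 | 4182 | 102 |
| 148 | 30 | 4284 | 0 |
| 149 | 3 | 4284 | 0 |
| 150 | 5 | 4284 | 0 |
| 151 | 7 | 4284 | 0 |
| 152 | 3 | 4284 | 0 |
| 153 | 6 | 4284 | 0 |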
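
In the sample rows, each "Change in Test Volume" value equals the next week's "Test Volume" minus the current week's, with 0 for the final row. That is an inference from the numbers above, not a description of the Matlab step. Under that assumption, the column could be derived with a small function like this (the name `changeInTestVolume` is hypothetical):

```javascript
// Forward difference: change[i] = volume[i + 1] - volume[i].
// The last week has no successor, so its change is reported as 0,
// which matches the final row of the sample table.
function changeInTestVolume(volumes) {
  return volumes.map((volume, i) =>
    i + 1 < volumes.length ? volumes[i + 1] - volume : 0
  );
}

// "Test Volume" column for weeks 136 to 143 of the sample table.
const sampleVolumes = [0, 0, 3583, 3583, 3583, 3633, 4183, 4182];
const sampleChanges = changeInTestVolume(sampleVolumes);
// sampleChanges is [0, 3583, 0, 0, 50, 550, -1, 0]
```

Because the difference looks one week ahead, a jump in test volume is attributed to the week before the new total first appears.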

Sample Graphs:

  • MelonJS
  • Pixi
  • Angular.js

#Analysis

We created a spreadsheet and graphs to analyse the relation between the frequency of bugs and the volume of unit tests over time. Each repository's results are in their own sheet (tab).

We found that there does seem to be a relation between unit tests and bugs, but not the one we originally expected. The change in unit test volume over time (its derivative) seems to predict the future quantity of bugs more accurately than the raw volume does. For example, around week 2335 in the angular.js results, we see a significant spike in unit tests followed by a significant decrease in issues in the remaining weeks. This relation is not as obvious if you observe only the raw unit test volume.

Threats to validity:

  • How well do lines of code represent unit test volume? There may be better ways to represent this volume. For example, would a ratio of unit test LOC to total project LOC be more accurate? We would like to answer these types of questions in future work.
  • GitHub issues may be filed irregularly. Some projects, especially new ones, tend to have sporadic commits, bug reports, etc. Without a consistent development cycle involving QA and build automation, it may be hard to establish a relationship between unit tests and bug reports. Because of this, some data sources simply will not show any relation. More mature projects, however, do seem to show one. Angular.js, for example, clearly shows a decrease in issues after adding unit tests in week 2335, but this only became clear after many years of development.

Future work:

  • Gather more data by running our tool on more repositories
  • Automate calculating unit test changes over time (derivative)
  • Complete the open project issues that would make the tool more flexible for a wider range of projects. For example, let the user filter test file types with a regular expression.
  • Bug fixes
  • Add automatic visualization/graphs
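
As a rough sketch of the regular-expression filter mentioned above, the tool could keep only the file paths that match a user-supplied pattern before counting lines. The function name and the example pattern are hypothetical and are not part of the current tool.

```javascript
// Hypothetical helper: keep only the paths that match a user-supplied pattern.
// `pattern` is a regular expression source string, e.g. typed at a prompt.
function filterTestFiles(paths, pattern) {
  const regex = new RegExp(pattern);
  return paths.filter((path) => regex.test(path));
}

// Example pattern (an assumption): files ending in .spec.js or .test.js.
const examplePattern = "\\.(spec|test)\\.js$";
```

Matching against the full path rather than just the extension would also let a user include or exclude whole subdirectories, such as fixtures stored alongside the tests.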

#Project Management Information

Milestones and timelines:

| Milestone | Date |
|-----------|------|
| Identify code bases that have a significant history of unit tests and provide issue tracking. | January 30, 2015 |
| Develop a script to determine two data sets: frequency of bugs and volume of unit tests over time. | February 17, 2015 |
| Graph the data and observe whether there is any relation. | February 19, 2015 |
| Write up the results and their analysis. | February 20, 2015 |
| Finalize the report. | February 23, 2015 |

Roles of team members:

Greg Richardson and Jordan Heemskerk: Chief developers
Parker Atkins: Development/Analysis
Rabjot Aujla: Project Management, Analyst