Git statistics - AD-EYE/AD-EYE_Core GitHub Wiki

Using Hercules

Hercules (https://github.com/src-d/hercules#installation) is a tool that extracts statistics from a git repository; the statistics can then be plotted with the labours tool.

Analysing and plotting directly
./hercules --granularity=1 --burndown --languages="python" /home/adeye/adeye_temp/AD-EYE_Core |  labours -m burndown-project

The language can be replaced by c++ or by matlab.

Analysing and saving the results
./hercules --granularity=1 --burndown --languages="python" --pb /home/adeye/adeye_temp/AD-EYE_Core >  analysis_results.pb
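
Since the same analysis is repeated for python, c++ and matlab, it can be wrapped in a loop. The following is only a sketch: analyse_languages, the HERCULES variable and the burndown_<language>.pb naming scheme are hypothetical names chosen for illustration, while the hercules flags are the ones used above.

```shell
# Path to the hercules binary (assumption: it sits in the current directory).
HERCULES="${HERCULES:-./hercules}"

# Hypothetical helper: run the burndown analysis once per language and
# save each result as burndown_<language>.pb in the current directory.
# Usage: analyse_languages <repo_path> <language>...
analyse_languages() {
    repo="$1"
    shift
    for lang in "$@"; do
        "$HERCULES" --granularity=1 --burndown --languages="$lang" --pb "$repo" \
            > "burndown_${lang}.pb" || return 1
    done
}
```

For example: analyse_languages /home/adeye/adeye_temp/AD-EYE_Core python c++ matlab
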
Combining saved results and saving the combination as pb
./hercules combine results1.pb results2.pb results3.pb > results123.pb
Combining saved results and plotting the combination
./hercules combine results1.pb results2.pb results3.pb | labours -m burndown-project
Plotting results saved in pb format
./hercules combine results.pb | labours -m burndown-project
Skipping certain folders/files
./hercules --burndown --first-parent --pb --skip-blacklist --blacklisted-prefixes="prefix to skip" /repo_folder | labours -f pb -m burndown-project
Plotting with better time resolution (the output of an analysis command must be piped into labours)
labours -m burndown-project --resample=month
labours -m burndown-project --resample=raw #shows commit granularity
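
To compare resolutions without rerunning the analysis, a saved pb file can be plotted once per resampling value. This is a sketch under assumptions: plot_resolutions and the LABOURS variable are hypothetical names, and it relies on the labours options -i (input file) and -o (output file), which are not used elsewhere on this page and are worth checking against labours --help.

```shell
# Name of the labours executable (assumption: it is on the PATH).
LABOURS="${LABOURS:-labours}"

# Hypothetical helper: plot one saved pb file at several resolutions,
# writing burndown_<resolution>.png for each one.
# Usage: plot_resolutions <results.pb> <resolution>...
plot_resolutions() {
    pb_file="$1"
    shift
    for res in "$@"; do
        "$LABOURS" -f pb -i "$pb_file" -m burndown-project \
            --resample="$res" -o "burndown_${res}.png" || return 1
    done
}
```

For example: plot_resolutions analysis_results.pb month raw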

Cleaning the repositories from noise

Some commits added many lines of code that were not written for the project but were duplicated from elsewhere or imported from external projects. To remove this noise from the statistics, the history must be rewritten to a clean state.

Note: if the files to be removed still exist in HEAD, they must be removed in a new commit before the history can be rewritten.

Do not push the cleaned repository
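
One way to make an accidental push less likely is to do the cleaning in a scratch clone that has no remote configured. This workflow is a suggestion rather than something this page prescribes, and make_scratch_clone is a hypothetical helper name.

```shell
# Hypothetical helper: clone a repository and detach the copy from its remote,
# so that a later "git push" in the copy has no destination.
# Usage: make_scratch_clone <source_repo> <destination_folder>
make_scratch_clone() {
    src="$1"
    dest="$2"
    git clone --quiet "$src" "$dest" || return 1
    git -C "$dest" remote remove origin
}
```

Only the checked-out branch survives as a local branch in the copy; the other branches exist there only as remote-tracking references and are removed together with the remote.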

Removing files and folders (removal is based on name only, regardless of path)

BFG is a tool that rewrites history to remove files or folders based on their name.

java -jar bfg-1.14.0.jar --delete-files file_name
java -jar bfg-1.14.0.jar --delete-folders folder_name

Removing specific files (specifying the path)

The git filter-branch command removes specific files or folders from the history, identified by their path (rm -rf is needed when the path is a folder).

git filter-branch -f --tree-filter 'rm -rf path_to_file_or_folder' HEAD

AD-EYE_Core

Folders to remove:
  • mjpeg_server
  • web_video_server
  • robot_gui_bridge
  • GUI_server
  • experiments
  • Data
  • Prescan_models
Files to remove:
  • SSMPset_2018-1-3--11-58-48.csv
  • KTH_3D_KTH3d_20191008.org.dae
  • TemplatePexFile.pex
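
The two lists above can be fed to the BFG commands shown earlier with a loop. This is a minimal sketch: clean_ad_eye_core and the JAVA and BFG_JAR variables are hypothetical names, and it assumes that it is run from the root of the repository to clean, with the BFG jar reachable at the given path.

```shell
# Assumptions: java is on the PATH and the BFG jar is in the current directory.
JAVA="${JAVA:-java}"
BFG_JAR="${BFG_JAR:-bfg-1.14.0.jar}"

# Hypothetical helper: run BFG once for every folder and file name listed above.
clean_ad_eye_core() {
    for name in mjpeg_server web_video_server robot_gui_bridge GUI_server \
                experiments Data Prescan_models; do
        "$JAVA" -jar "$BFG_JAR" --delete-folders "$name" || return 1
    done
    for name in SSMPset_2018-1-3--11-58-48.csv KTH_3D_KTH3d_20191008.org.dae \
                TemplatePexFile.pex; do
        "$JAVA" -jar "$BFG_JAR" --delete-files "$name" || return 1
    done
}
```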

Pex_Data_Extraction:

Folders to remove:
Files to remove (these files were duplicated from the pex2csv folder; example command: git filter-branch -f --tree-filter 'rm -f preproc.py' HEAD):
  • main.py
  • path.py
  • parse.py
  • preproc.py
  • road.py
  • staticalobject.py
  • utils.py
  • vmap.py
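
Every filter-branch run rewrites the whole history, so removing the files one at a time means eight passes. A sketch of a single pass that removes all the listed files, assuming they sit at the repository root as in the example command; remove_duplicated_pex_files is a hypothetical helper name.

```shell
# Hypothetical helper: remove all the duplicated files from the history of
# the current branch in one filter-branch pass.
# Run it from the root of the Pex_Data_Extraction repository.
remove_duplicated_pex_files() {
    git filter-branch -f --tree-filter \
        'rm -f main.py path.py parse.py preproc.py road.py staticalobject.py utils.py vmap.py' \
        HEAD
}
```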

Finding what should be removed

Plotting with labours using the resampling option can help to identify when the noisy commits happened.

The following script lists the blobs in the history sorted by size, with the biggest last (source).

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
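
When only the worst offenders matter, the output can be limited to the N biggest blobs. The variant below is a sketch: largest_blobs is a hypothetical helper name, the order is reversed so that the biggest blob comes first, and sizes are left in raw bytes so that numfmt is not required.

```shell
# Hypothetical helper: print "<object id> <size in bytes> <path>" for the
# N biggest blobs in the history, biggest first (N defaults to 10).
# Usage: largest_blobs [N]
largest_blobs() {
    git rev-list --objects --all |
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
      sed -n 's/^blob //p' |
      sort -k2,2nr |
      head -n "${1:-10}"
}
```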

Using gitinspector

Install with sudo apt-get install gitinspector.

gitinspector -l -r -m -T -f=",js,c,cpp,h,hpp,py,m" --format=htmlembedded > gitinspector_page.htm

Using gitstats

Install with sudo apt-get install gitstats.

gitstats git_directory output_directory