Technical background for coding

See additional details in the docstrings of the modules, functions, classes, and methods, and see the comments in the code.

Many parts of the code need refactoring, partly because the technical and the conceptual/statistical solutions changed a lot in earlier phases of development.

Python language

  • CogStat is written in Python (and, when needed, in R).
    • Python is used because it is a free, high-level, general-purpose language that is becoming more popular among scientists, and its statistical modules are improving.
    • When an appropriate function is not available in Python, R can be used. (At this point, all previous R parts have been removed, but future versions may include R again if a required procedure is not available in Python and we do not intend to implement it ourselves.)
  • We (mostly) follow the PEP 8 coding style.

Main Python packages used by CogStat

  • pandas - handling the data
  • statsmodels - for most of the statistical calculations
  • numpy, scipy.stats, pingouin - for some other statistical calculations
  • matplotlib - for most of the graphs
  • PyQt (Qt 5) - for the GUI

Main modules of CogStat

  • Core analysis functions
    • cogstat.py - main module, handles the data and compiles the relevant results; it calls cogstat_*.py core submodules to compile the results
    • cogstat_stat.py - creates result strings of the statistical analysis
    • cogstat_chart.py - creates graphs
    • cogstat_hyp_test.py - creates strings for the hypothesis tests and the power analyses
    • cogstat_stat_num.py - statistical calculations (as functions) that are not available in other Python modules
  • GUI
    • cogstat_gui.py - the GUI except for the dialogs; it calls cogstat.py methods
    • cogstat_dialogs.py - dialog handling for the GUI
      • ui folder - specific dialogs as Qt Designer (Qt 5) .ui files and the corresponding .py files
  • Other
    • cogstat_config.py - various settings
    • cogstat_util.py - various functions used in several modules
    • test/test_stat.py - tests for the statistical functions

R language

From version 2.5 on, CogStat code can also use R functions to run the analyses.

  • Use R functions only when a Python solution is not available.
  • When a Python solution becomes available, the R version can be removed.
  • The analyses should check whether R is available (csc.versions).
  • Required R components are listed in setup.py and requirements.txt as comments.
  • When several R solutions are available for the same task, use the one that (1) works correctly and is preferably maintained, (2) requires the smallest additional installation size (considering the dependencies and the packages that are already installed for other analyses), and (3) uses simpler code. (The criteria are listed in priority order.)
  • R should do only the core calculations. Data preparation and output formatting should be done in Python.
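
The division of labour described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not CogStat's actual code: `r_available`, `prepare_data`, `core_median`, and `format_result` are hypothetical names, and the availability check simply looks for an R bridge package (assumed here to be `rpy2`) with the standard library, whereas CogStat itself records availability in `csc.versions`. The R branch is left as a placeholder.

```python
import importlib.util
import statistics


def r_available():
    """Hypothetical check: is an R bridge (assumed here to be rpy2) importable?"""
    return importlib.util.find_spec("rpy2") is not None


def prepare_data(raw_values):
    """Data preparation stays in Python: drop missing values, convert to float."""
    return [float(v) for v in raw_values if v is not None]


def core_median(values, use_r=False):
    """Core calculation only; an R call would replace the Python branch."""
    if use_r and r_available():
        # Assumption: an R-backed implementation would go here and return a
        # plain Python number, leaving all formatting to the caller.
        raise NotImplementedError("R branch is only a placeholder in this sketch")
    # Python solution, preferred whenever one exists.
    return statistics.median(values)


def format_result(median):
    """Output formatting stays in Python as well."""
    return f"Median = {median:.2f}"


result = format_result(core_median(prepare_data([3, None, 1, 2])))
print(result)
```

Keeping the core calculation behind a single function with plain Python inputs and outputs means that an R version can later be swapped for a Python one without touching the preparation or formatting code.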

Adding new features

Adding a new feature should include:

  • Implementing the new feature in the core analysis part
  • GUI for the feature if relevant
  • Unit tests for the feature
    • Validation of the results; see the test/test_stat.py docstring
  • Documentation for the wiki and for the Jupyter notebook
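
As a rough illustration of the unit test and validation items above, the following sketch tests a made-up feature against a reference value worked out by hand. The function `population_sd` and the test class are hypothetical and are not part of CogStat; the actual validation conventions are described in the test/test_stat.py docstring.

```python
import math
import unittest


def population_sd(values):
    """Hypothetical new feature: population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


class TestPopulationSD(unittest.TestCase):
    def test_against_hand_calculated_value(self):
        # Reference value worked out by hand: the mean is 5, the squared
        # deviations sum to 32, the variance is 32 / 8 = 4, so the SD is 2.
        data = [2, 4, 4, 4, 5, 5, 7, 9]
        self.assertAlmostEqual(population_sd(data), 2.0, places=10)
```

A test like this can be run with `python -m unittest` from the directory that contains the test file.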

Docstrings

  • We (will) mostly use NumPy-style docstrings (see also the pandas docstring guide).
  • API documentation is built with pdoc.
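
For orientation, a NumPy-style docstring might look like the sketch below. The function `standardize` is a hypothetical example and does not exist in CogStat; only the section layout (Parameters, Returns, Raises, Examples) is the point.

```python
import math


def standardize(values):
    """Convert values to z-scores.

    Parameters
    ----------
    values : list of float
        Sample values; at least two values with nonzero spread are expected.

    Returns
    -------
    list of float
        The z-score of each value, computed with the sample standard
        deviation (denominator ``n - 1``).

    Raises
    ------
    ValueError
        If fewer than two values are given or all values are equal.

    Examples
    --------
    >>> standardize([1.0, 2.0, 3.0])
    [-1.0, 0.0, 1.0]
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("all values are equal")
    return [(v - mean) / sd for v in values]
```

Documentation generators that read docstrings, such as pdoc, can pick up this text for the API documentation, and the Examples section can double as a doctest.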