Technical background for coding - cogstat/cogstat GitHub Wiki

See additional details in the docstrings of the modules, functions, classes, and methods, and see the comments in the code.

Many parts of the code need refactoring, partly because the technical and the conceptual/statistical solutions changed a lot in earlier phases of development.

Python language

CogStat is written in Python (and, when needed, in R).
- Python is used because it is a free, high-level, general-purpose language that is becoming more popular among scientists, and its statistical modules are improving.
- When an appropriate function is not available in Python, R can be used. (At this point, all previous R parts have been removed, but future versions may include R again if a required procedure is not available in Python and we don't intend to implement it ourselves.)
We (mostly) follow the PEP 8 coding style.

Main Python packages used by CogStat

pandas - handling the data
statsmodels - for most of the statistical calculations
numpy, scipy.stats, pingouin - for some other statistical calculations
matplotlib - for most of the graphs
PyQt (Qt 5) - for the GUI

Main modules of CogStat

Core analysis functions
- cogstat.py - main module, handles the data and compiles the relevant results; it calls cogstat_*.py core submodules to compile the results
- cogstat_stat.py - creates result strings of the statistical analysis
- cogstat_chart.py - creates graphs
- cogstat_hyp_test.py - creates strings for the hypothesis tests and the power analyses
- cogstat_stat_num.py - statistical calculations (as functions) that are not available in other Python modules
GUI
- cogstat_gui.py - the GUI, except the dialogs, it calls cogstat.py methods
- cogstat_dialogs.py - dialog handling for the GUI
  - ui folder - specific dialogs in Qt Designer (Qt 5) .ui files and the corresponding .py files
Other
- cogstat_config.py - various settings
- cogstat_util.py - various functions used in several modules
- test/test_stat.py - tests for the statistical functions

R language

From version 2.5, CogStat (CS) code can also use R functions to run the analyses.

Use R functions only when a Python solution is not available.
When a Python solution becomes available, the R version can be removed.
The analyses should check whether R is available (csc.versions).
Required R components are listed as comments in setup.py and requirements.txt.
When several R solutions are available for the same task, use the one that (1) works correctly and is preferably maintained, (2) requires the smallest additional installation size (considering the dependencies and the packages that are already installed for other analyses), and (3) uses simpler code. (The criteria are listed in priority order.)
R should do only the core calculations. Data preparation and output formatting should be done in Python.
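
The division of labour described above could look roughly like the following sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the availability check looks for an Rscript executable on the PATH instead of reading csc.versions, and the median merely stands in for a procedure that has no Python implementation.

```python
import shutil
import subprocess


def r_available():
    """Return True if an Rscript executable is found on the PATH.

    Assumption: CogStat itself consults csc.versions; this check is a stand-in.
    """
    return shutil.which("Rscript") is not None


def median_in_r(values):
    """Core calculation only: R receives clean numbers and returns one number."""
    expression = "cat(median(c({})))".format(",".join(repr(v) for v in values))
    completed = subprocess.run(
        ["Rscript", "-e", expression],
        capture_output=True, text=True, check=True,
    )
    return float(completed.stdout)


def report_median(raw_values, r_is_available=None):
    """Hypothetical analysis: prepare data in Python, calculate in R, format in Python."""
    if r_is_available is None:
        r_is_available = r_available()

    # Data preparation in Python: drop missing values, convert to float
    values = [float(v) for v in raw_values if v is not None]
    if not values:
        return "Median: not calculated because there are no valid cases."

    # The analysis checks R before trying to use it
    if not r_is_available:
        return "Median: not calculated because R is not available."

    # Only the core calculation is delegated to R
    result = median_in_r(values)

    # Output formatting in Python
    return "Median: {:.2f}".format(result)
```

Keeping the R call this narrow means that, once a Python implementation exists, only `median_in_r` has to be replaced.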

Adding new features

Adding a new feature should include:

Implementing the new feature in the core analysis part
GUI for the feature if relevant
Unit tests for the feature
- Validation of the results, see the test/test_stat.py docstring
Documentation for the wiki and for the Jupyter notebook
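
As an illustration of the unit test and validation items above, here is a minimal sketch using the standard unittest module. The tested function is a hypothetical stand-in, not part of CogStat, and the reference value is calculated by hand here; the actual validation procedure is the one described in the test/test_stat.py docstring.

```python
import statistics
import unittest


def variance_result_text(values):
    """Hypothetical stand-in for a function that returns a result string."""
    return "Variance: {:.3f}".format(statistics.variance(values))


class VarianceTest(unittest.TestCase):
    def test_against_reference_value(self):
        data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
        # Reference value calculated by hand: the mean is 5, the sum of
        # squared deviations is 32, so the sample variance is 32 / 7 = 4.571
        self.assertEqual(variance_result_text(data), "Variance: 4.571")
```

A test like this can be run with `python -m unittest` from the folder that contains the test file.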

Docstrings

We mostly use (or will use) NumPy-style docstrings (see also the pandas guide).
API documentation is built with pdoc.
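
For orientation, a NumPy-style docstring has underlined section headers such as Parameters, Returns, and Raises. The function below is a hypothetical example written only to show the layout; it is not taken from the CogStat code.

```python
def confidence_interval_text(low, high, precision=2):
    """Format a confidence interval as a result string.

    Parameters
    ----------
    low : float
        Lower limit of the interval.
    high : float
        Upper limit of the interval.
    precision : int, optional
        Number of decimals to display (default is 2).

    Returns
    -------
    str
        The interval in the form ``[low, high]``.

    Raises
    ------
    ValueError
        If `low` is greater than `high`.
    """
    if low > high:
        raise ValueError("low must not be greater than high")
    return "[{:.{p}f}, {:.{p}f}]".format(low, high, p=precision)
```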