Considerations for the core functions - cogstat/cogstat GitHub Wiki

Main analysis pipelines

The analysis pipelines are methods of the CogStatData class in the cogstat.py module.

The preconditions of an analysis should be checked in these methods, at the beginning of the method. If the preconditions are not met, return a localizable message to the user as regular html text instead of raising an exception. (Other functions may still raise exceptions.)
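A minimal sketch of this pattern follows. The class `ExampleData`, the method `explore_variable_pair`, the messages, and the `_` translation stand-in are all hypothetical names chosen for illustration; they are not taken from cogstat.py.

```python
import html
from typing import List, Sequence


def _(message: str) -> str:
    """Stand-in for a gettext-style translation function (assumption)."""
    return message


class ExampleData:
    """Hypothetical stand-in for a CogStatData-like object."""

    def __init__(self, variables: Sequence[str]):
        self.variables = list(variables)

    def explore_variable_pair(self, x: str, y: str) -> List[str]:
        # Check the preconditions first; report problems as html text
        # in the returned result list instead of raising an exception.
        missing = [name for name in (x, y) if name not in self.variables]
        if missing:
            names = html.escape(', '.join(missing))
            return ['<p>' + _('Unknown variable(s): %s') % names + '</p>']
        if x == y:
            return ['<p>' + _('Two different variables are needed.') + '</p>']
        # Preconditions are met: the actual analysis would run here.
        pair = (html.escape(x), html.escape(y))
        return ['<p>' + _('Exploring %s and %s') % pair + '</p>']
```

Because a failed precondition produces the same kind of result list as a successful analysis, the caller can display the message without any special error handling.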

Returned results

The returned result is a list of items that can be displayed as an html document:

html strings
- The html document should be formatted with html tags and css. #72
- There are CogStat-specific "tags" starting with cs_ (e.g., <cs_h1>). These tags are translated to html tags. See the list of those tags in the cogstat_config.py module.
- Do not use \n for formatting, unless text should be separated within a block.
- Note that the Qt-based GUI may not display all html tags or formatting.
images
- In the GUI version, they will be converted to png/svg images.
pandas dataframes
- In the GUI version, they will be converted to html.
- Note that the Qt-based GUI may not display all html tags or formatting.
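The translation of the CogStat-specific tags could look roughly like the sketch below. The mapping `CS_TAGS` and the functions `translate_cs_tags` and `render_strings` are hypothetical; the real tag list is in cogstat_config.py and may differ.

```python
from typing import Dict, List

# Hypothetical mapping for illustration; the real list is in cogstat_config.py.
CS_TAGS: Dict[str, str] = {
    '<cs_h1>': '<h2>',
    '</cs_h1>': '</h2>',
    '<cs_h2>': '<h3>',
    '</cs_h2>': '</h3>',
}


def translate_cs_tags(html_text: str, tags: Dict[str, str] = CS_TAGS) -> str:
    """Replace every CogStat-specific tag with its html counterpart."""
    for cs_tag, html_tag in tags.items():
        html_text = html_text.replace(cs_tag, html_tag)
    return html_text


def render_strings(results: List[object]) -> List[object]:
    """Translate the string items; leave other items (images, dataframes) as they are."""
    return [translate_cs_tags(item) if isinstance(item, str) else item
            for item in results]
```

Non-string items are passed through unchanged here, on the assumption that the front end converts them separately.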

Variable names

(This is about the variable names in the data the user uses, not about the variable names in the code.)

(This is the goal, not the current state.) Variable names should be the strings one wants to see in the results. They may include non-letter characters, non-Latin letters, practically anything.
- To make sure that the analyses run appropriately in the background, safe variable names should be used in the analyses, but the output should show only the names the user specified in the data file.
If the same names are used for several variables, CogStat should handle this.
If the name is missing for a variable, CogStat should handle this.
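One possible way to separate safe internal names from the names shown in the output is sketched below. The function `make_safe_names`, the `var_N` naming scheme, the `unnamed` placeholder, and the numbered suffix for duplicates are all assumptions made for this example, not a description of how CogStat handles these cases.

```python
from typing import Dict, List, Optional, Set


def make_safe_names(display_names: List[Optional[str]]) -> Dict[str, str]:
    """Map generated safe names (var_0, var_1, ...) to the names shown in the output.

    Missing names get a placeholder, and repeated names get a numbered suffix,
    so that every variable can still be told apart in the results.
    """
    mapping: Dict[str, str] = {}
    used: Set[str] = set()
    for index, name in enumerate(display_names):
        label = (name or '').strip() or 'unnamed'  # placeholder is an assumption
        candidate, counter = label, 1
        while candidate in used:
            counter += 1
            candidate = f'{label} ({counter})'
        used.add(candidate)
        mapping[f'var_{index}'] = candidate
    return mapping
```

The analyses would then work only with the keys of the mapping, and the output would look up the corresponding value whenever a variable name has to be displayed.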

Data import

CogStat can import data from various file formats. This is important for (a) users who use CogStat occasionally, and (b) users who want to use tutorial datasets.

Three aspects of the data should be imported: (1) variable names, (2) measurement levels, and (3) the values.

Variable names
- When there are two (a short and a long) versions of a name (such as names and labels in SPSS), the longer one should be preferred. CogStat tries to use variable names that are intended for the output, and these names shouldn't have technical constraints.
Measurement levels
- Some file formats include this information (e.g., modified spreadsheet files and csv files, SPSS). These are the preferred file formats for working in CogStat.
- Some file formats do not include this information (e.g., Stata .dta). In this case, the measurement level of every variable is set to unknown, except for string variables, which are set to nominal.
- Some file formats have limited information about the measurement levels (e.g., an R ordered factor is an ordinal variable). In those cases, this partial information should be imported.
Values
- Some data types are not supported by CogStat, and those data types should be converted. For example, logical values and dates can be converted to strings.
- When the file format supports value labels, the labels of nominal variables should be stored so that the output is readable. At the moment, CogStat does not support non-numerical ordinal or interval variables, so for those variables the numerical values could be imported instead.
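The fallback rules for measurement levels and unsupported value types can be sketched as follows. The function names `infer_level` and `convert_value` and the level strings are hypothetical, chosen for this example; CogStat's own import code may be organized differently.

```python
import datetime
from typing import List, Optional

KNOWN_LEVELS = ('nominal', 'ordinal', 'interval')


def convert_value(value: object) -> object:
    """Convert unsupported types (logical values, dates) to strings."""
    if isinstance(value, (bool, datetime.date)):
        return str(value)
    return value


def infer_level(values: List[object], imported_level: Optional[str] = None) -> str:
    """Prefer the level stored in the file; otherwise fall back to a default.

    Without stored information, string variables become nominal and
    all other variables become unknown.
    """
    if imported_level in KNOWN_LEVELS:
        return imported_level
    non_missing = [v for v in values if v is not None]
    if non_missing and all(isinstance(v, str) for v in non_missing):
        return 'nominal'
    return 'unknown'
```

In this sketch the values would be converted first, so a column of logical values or dates ends up as strings and is then treated as nominal.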

See data import related issues here.