CsvToDat Utility - openmpp/openmpp.github.io GitHub Wiki

This topic documents the experimental OpenM++ CsvToDat utility. CsvToDat creates Modgen-compatible .dat input files for parameters provided in csv or tsv format.

Introduction

The CsvToDat utility is Modgen-specific. It is relevant only for cross-compatible models in which one or more parameters are supplied in csv or tsv format. Modgen does not support those formats; it supports only the Modgen-specific .dat format.

OpenM++ supports a number of parameter input formats described elsewhere in this wiki. For example, values for the RiskPaths parameter UnionDurationBaseline can be specified in several csv formats such as a file named UnionDurationBaseline.csv with content like

dim0,      dim1,     param_value
UO_FIRST,  "(-∞,1)", 0.0096017
UO_FIRST,  "[1,3)",  0.0199994
UO_FIRST,  "[3,5)",  0.0199994
UO_FIRST,  "[5,9)",  0.0213172
UO_FIRST,  "[9,13)", 0.0150836
UO_FIRST,  "[13,∞)", 0.0110791
UO_SECOND, "(-∞,1)", 0.0370541
UO_SECOND, "[1,3)",  0.0370541
UO_SECOND, "[3,5)",  0.012775
UO_SECOND, "[5,9)",  0.012775
UO_SECOND, "[9,13)", 0.0661157
UO_SECOND, "[13,∞)", 0.0661157
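The layout above has one row per parameter cell: the dimension columns identify the cell and the last column holds its value. As a rough illustration only (this is not how OpenM++ itself reads the file, and the name `read_parameter_csv` is hypothetical), a few of the rows can be loaded into a dictionary keyed by the two dimension labels:

```python
import csv
import io

# A shortened copy of the example content, kept inline so the sketch runs as is.
CSV_TEXT = '''dim0,      dim1,     param_value
UO_FIRST,  "(-∞,1)", 0.0096017
UO_FIRST,  "[1,3)",  0.0199994
UO_SECOND, "[13,∞)", 0.0661157
'''

def read_parameter_csv(text):
    """Return {(dim0, dim1): value} for csv text in the layout shown above."""
    # skipinitialspace drops the padding after each comma, so the quoted
    # interval labels (which themselves contain commas) are parsed as one field.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {(row["dim0"], row["dim1"]): float(row["param_value"]) for row in reader}

cells = read_parameter_csv(CSV_TEXT)
print(cells[("UO_FIRST", "[1,3)")])
```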

For a Modgen model, the parameter UnionDurationBaseline must be specified in Modgen .dat format, either in a stand-alone file with a name like Param_UnionDurationBaseline.dat or together with other parameters in a file with a name like RiskPaths.dat. In either case, the content looks something like:

parameters {
	 // Union Duration Baseline of Dissolution
	double	UnionDurationBaseline[UNION_ORDER][UNION_DURATION] = {
		0.0096017, (2) 0.0199994, 0.0213172, 0.0150836, 0.0110791, 
		(2) 0.0370541, (2) 0.012775, (2) 0.0661157, 
	};
};
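The `(2)` prefixes in the .dat content appear to be repeat counts, so the initializer holds the same 12 values as the csv file, in the same order. A minimal sketch, assuming that reading of the notation (the function name `expand_dat_values` is made up for illustration and is not part of Modgen or OpenM++):

```python
import re

def expand_dat_values(body):
    """Expand a comma-separated initializer body, honouring "(n) value" repeats."""
    values = []
    for token in body.split(","):
        token = token.strip()
        if not token:
            continue  # trailing comma or blank segment
        match = re.fullmatch(r"(?:\((\d+)\)\s*)?(\S+)", token)
        if match is None:
            raise ValueError(f"unrecognised item: {token!r}")
        count = int(match.group(1)) if match.group(1) else 1
        values.extend([float(match.group(2))] * count)
    return values

# The initializer body from the example above.
DAT_BODY = """
    0.0096017, (2) 0.0199994, 0.0213172, 0.0150836, 0.0110791,
    (2) 0.0370541, (2) 0.012775, (2) 0.0661157,
"""

values = expand_dat_values(DAT_BODY)
print(len(values), values[:3])
```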

The CsvToDat utility can be used to create, for each parameter supplied in csv or tsv format, a corresponding .dat file for the Modgen version of a model. OpenM++ ignores input files with the prefix modgen_, so CsvToDat deliberately generates .dat files with that prefix. This ensures that OpenM++ will not attempt to read two versions of the same parameter from two input files, which would result in a build error. Continuing the example, CsvToDat would create a file named modgen_UnionDurationBaseline.dat. CsvToDat will silently overwrite any existing version of a generated file named modgen_UnionDurationBaseline.dat.

CsvToDat does not actually read and convert the csv file UnionDurationBaseline.csv. Instead, CsvToDat notes that UnionDurationBaseline was specified in csv format by the presence of that file. It then obtains the values of the parameter and its metadata (dimension names) from the published OpenM++ database for the model to construct the contents of the generated file modgen_UnionDurationBaseline.dat (see above for what the generated contents look like). Thus, it is important to build the OpenM++ version of the model to create an up-to-date OpenM++ database for the model before running CsvToDat.
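Because the values come from the database rather than from the csv files, a stale database would produce stale .dat files. One possible safeguard, sketched here as a heuristic only, is to compare file modification times before running the utility. The helper name `inputs_newer_than_database` is hypothetical, and a newer timestamp does not prove the database is out of date:

```python
from pathlib import Path

def inputs_newer_than_database(scenario_folder, database):
    """List csv/tsv files modified after the model database was last written."""
    db_mtime = Path(database).stat().st_mtime
    return sorted(
        p.name
        for p in Path(scenario_folder).iterdir()
        if p.suffix.lower() in {".csv", ".tsv"} and p.stat().st_mtime > db_mtime
    )
```

A non-empty result suggests rebuilding the OpenM++ version of the model first.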

If a parameter value note is supplied for a parameter provided in .csv or .tsv format, for example a file UnionDurationBaseline.EN.md which provides an English note for the parameter values in the file UnionDurationBaseline.csv, CsvToDat will read the .md file and create a corresponding NOTE comment in the generated file modgen_UnionDurationBaseline.dat. Lists and code blocks in the markdown file will be transformed to their Modgen equivalents in the generated NOTE comment.

CsvToDat will silently replace an existing file like modgen_XYZ.dat if the file XYZ.csv exists. It will not remove an existing file modgen_ABC.dat if the corresponding file ABC.csv is absent. An 'orphan' file like modgen_ABC.dat may have been created by a previous invocation of CsvToDat when the model contained the parameter ABC, which was subsequently removed or renamed. Orphan files like modgen_ABC.dat may need to be removed manually to avoid errors when the Modgen model is run and attempts to read the parameter ABC, which no longer exists in the model.
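Orphan files can be spotted by comparing the generated modgen_*.dat files against the csv and tsv files in the same scenario folder. The sketch below is not part of CsvToDat. It assumes the parameter name is the part of the file name before the first dot (consistent with names such as AgeBaselineForm1.id.csv), and the helper name `find_orphan_dat_files` is hypothetical. It only reports candidates and deletes nothing:

```python
from pathlib import Path

PREFIX = "modgen_"

def find_orphan_dat_files(scenario_folder):
    """Return names of modgen_*.dat files with no matching csv/tsv parameter file."""
    folder = Path(scenario_folder)
    # Parameter names that currently have a csv or tsv input file.
    csv_params = {
        p.name.split(".", 1)[0]
        for p in folder.iterdir()
        if p.suffix.lower() in {".csv", ".tsv"}
    }
    orphans = []
    for dat in sorted(folder.glob(PREFIX + "*.dat")):
        param = dat.name[len(PREFIX):].split(".", 1)[0]
        if param not in csv_params:
            orphans.append(dat.name)
    return orphans
```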

Arguments and options

This subtopic describes the command line options of the CsvToDat utility. CsvToDat is experimental, and options and behaviour may change in subsequent releases.

A complete list of options and arguments is displayed by issuing the command

perl CsvToDat.pl -h

CsvToDat [-hkv] [long options...]
        -h --help               print usage message and exit
        -v --version            print version and exit
        --model STR             name of model
        --ompp_database STR     path of ompp database containing scenario
        --scenario_folder STR   path of scenario folder for generated dat
                                files
        --scenario STR          name of scenario (default is Default)
        -k --keep_tmp           keep temporary files in folder ./tmp_CsvToDat
        --verbose               verbose log output

The arguments --model, --ompp_database, and --scenario_folder are required. --model is a name, usually the same as the name of the model folder, and the other two arguments are paths. See below for an example. The -k option is optional and does not affect the operation of CsvToDat. Without it, CsvToDat tells the dbcopy utility to write metadata and parameters to a temporary folder which is deleted when CsvToDat completes. With it, dbcopy output is instead placed in a fixed subfolder named ./tmp_CsvToDat where it can be examined.

The argument --scenario is optional. The default value is the Default scenario. This option is untested.
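When CsvToDat is called from a build script, the command line can be assembled from the options listed above. The following is a minimal sketch, not an official wrapper: the function name `build_csvtodat_command` and its keyword names are hypothetical, and only options shown in the usage message are used.

```python
def build_csvtodat_command(model, ompp_database, scenario_folder,
                           scenario=None, keep_tmp=False, script="CsvToDat.pl"):
    """Assemble an argument list for the Perl version of CsvToDat."""
    cmd = [
        "perl", script,
        "--model", model,
        "--ompp_database", ompp_database,
        "--scenario_folder", scenario_folder,
    ]
    if scenario is not None:
        cmd += ["--scenario", scenario]  # optional, documented as untested
    if keep_tmp:
        cmd.append("--keep_tmp")         # keep dbcopy output in ./tmp_CsvToDat
    return cmd

cmd = build_csvtodat_command(
    "RiskPaths_csv",
    "ompp/bin/RiskPaths_csv.sqlite",
    "parameters/Default",
    script="../../Perl/CsvToDat.pl",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the model folder.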

Example

The CsvToDat utility is supplied as the Perl script CsvToDat.pl as well as the stand-alone executable CsvToDat.exe for the convenience of Windows users who may not have Perl installed. The two versions function identically. In this example the Perl script version is used.

Consider a model named RiskPaths_csv which is a clone of the RiskPaths model, but with some input parameters specified in a csv format instead of the .dat format.

A command prompt is opened, and the current working directory set to the model folder RiskPaths_csv.

The Default scenario folder RiskPaths_csv/parameters/Default contains the following files:

C:\Development\X\ompp\models\RiskPaths_csv>dir parameters\Default
 Volume in drive C is OS
 Volume Serial Number is 14E2-D15F

 Directory of C:\Development\X\ompp\models\RiskPaths_csv\parameters\Default

2021-06-14  09:06 PM    <DIR>          .
2021-06-14  09:06 PM    <DIR>          ..
2021-06-12  04:12 AM               179 AgeBaselineForm1.id.csv
2021-06-12  04:12 AM                88 AgeBaselinePreg1.value.csv
2021-06-12  04:12 AM                22 CanDie.csv
2021-06-12  04:12 AM                82 Framework.odat
2021-06-14  10:24 AM               484 RiskPaths.dat
2021-06-12  04:12 AM               426 UnionDurationBaseline.csv
               6 File(s)          1,281 bytes
               2 Dir(s)  1,729,317,085,184 bytes free

As can be seen from the file names and extensions, the 4 RiskPaths parameters AgeBaselineForm1, AgeBaselinePreg1, CanDie, and UnionDurationBaseline are specified using a csv format. The remaining RiskPaths parameters are specified in .dat format in the files RiskPaths.dat and Framework.odat.

If the Modgen version of the model were run, it would fail with an error about missing values for those 4 parameters.

Next, CsvToDat is invoked to create Modgen .dat versions of the missing parameters, as follows:

C:\Development\X\ompp\models\RiskPaths_csv>perl ../../Perl/CsvToDat.pl --model RiskPaths_csv --ompp_database ompp/bin/RiskPaths_csv.sqlite --scenario_folder parameters/Default

CsvToDat was invoked specifying the three required arguments. In this invocation, the paths to the OpenM++ database and to the scenario folder were specified relative to the current working directory, which was set to the model folder RiskPaths_csv previously. After CsvToDat completes, the contents of the scenario directory have changed, as follows:

C:\Development\X\ompp\models\RiskPaths_csv>dir parameters\Default
 Volume in drive C is OS
 Volume Serial Number is 14E2-D15F

 Directory of C:\Development\X\ompp\models\RiskPaths_csv\parameters\Default

2021-06-14  09:06 PM    <DIR>          .
2021-06-14  09:06 PM    <DIR>          ..
2021-06-12  04:12 AM               179 AgeBaselineForm1.id.csv
2021-06-12  04:12 AM                88 AgeBaselinePreg1.value.csv
2021-06-12  04:12 AM                22 CanDie.csv
2021-06-12  04:12 AM                82 Framework.odat
2021-06-17  12:08 PM               285 modgen_AgeBaselineForm1.dat
2021-06-17  12:08 PM               267 modgen_AgeBaselinePreg1.dat
2021-06-17  12:08 PM                47 modgen_CanDie.dat
2021-06-17  12:08 PM               331 modgen_UnionDurationBaseline.dat
2021-06-14  10:24 AM               484 RiskPaths.dat
2021-06-12  04:12 AM               426 UnionDurationBaseline.csv
              10 File(s)          2,211 bytes
               2 Dir(s)  1,729,318,125,568 bytes free

CsvToDat has created 4 new files with the .dat extension, one for each parameter which was specified in csv format. The Modgen version of RiskPaths_csv can now read these files and build and run without error. The OpenM++ version of RiskPaths_csv will ignore these 4 new files because their names start with modgen_, and will continue to build and run without error.
