2012 December Design - openmpp/openmpp.github.io GitHub Wiki

About this document

This roadmap and architecture document is presented from the "model developer" point of view, which implies a C++ development process; user aspects of OpenM++ are deliberately excluded. Please refer to the OpenM++ user guide pages for additional details.

What is OpenM++

OpenM++ is an open source implementation of the Modgen microsimulation tool created at Statistics Canada. It is not a copy of Modgen, but a new, functionally equivalent implementation of the publicly available Modgen specifications. OpenM++ also has important distinct features of its own, such as portability, scalability and being open source, which Modgen does not have. Extensive information on Modgen is available on the Statistics Canada web site at http://www.statcan.gc.ca/microsimulation/modgen/modgen-eng.htm.

OpenM++ Design Basics

Common OpenM++ design principles:

portability: it must work on Windows and Linux, in 32-bit and 64-bit versions
scalability: it must work on a single PC, in a cluster or in a cloud environment
open source: it is an open source product

OpenM++ is portable and scalable:

OpenM++ is designed, developed and tested to work on Windows and Linux, in 32 and 64 bits. As a result, the same model can be created and tested on a model developer's Windows PC and later run on a Linux (or Windows) HPC cluster with thousands of CPUs.

OpenM++ models are essentially highly parallelizable computational applications and fit very well in an HPC cluster environment.

The specific type of cluster environment is to be established during the first development phase. However, for the purpose of this design document, we can safely assume that the cluster environment means an MPI cluster, since many existing HPC clusters, including the Compute Canada cluster, are MPI-based.

OpenM++ is web-ready and cloud-ready:

It is important to understand that OpenM++ is intended to provide “software-as-a-service” cloud models for the research community. To simplify this roadmap, the cloud and web details of OpenM++ are omitted here. However, OpenM++ cloud capabilities are essential, and all control layer and algorithms and data layer components must be designed, developed and tested as cloud-ready.

OpenM++ Architecture

OpenM++ consists of 3 software layers:

layer 1: presentation
layer 2: control
layer 3: algorithms and data

These layers must accommodate 3 model life-cycle stages:

model design and development stage
model run stage
modeling results analysis stage

Note: Components are described below in the order of the OpenM++ layers, not in the order of development. For each component, the priority of each feature is specified as (pri1), (pri2) or (pri3); that value does NOT correspond to OpenM++ development phases.

Layer 1: OpenM++ presentation layer

Component 1.1: OpenM++ IDE

The OpenM++ IDE is a desktop GUI application to:

(pri1) edit model parameters
(pri1) view model output results
(pri1) compare parameters of two models (see Note2 below)
(pri2) edit model source file(s) with (pri3) syntax highlighting for the OpenM++ language (.ompp)
(pri2) compile from .ompp into C++ by invoking the OpenM++ compiler, capture errors and warnings
(pri2) compile and debug C++ model code by using GCC or Microsoft C++
(pri2) debug the C++ model executable
(pri2) run the model on a single PC or (pri3) submit it to an HPC cluster
(pri3) support source control system(s) integration (svn and/or git)
(pri2) provide unit testing functionality

Note1: As an alternative, the OpenM++ GUI can be split into multiple independent applications with a desktop or web UI. In any case it must provide parameter editing capabilities.

Note2: Model comparison is initially implemented as a simple tool to compare the parameters of two models. It can later be extended to support comparison of output results with sophisticated analysis; however, most likely that is going to be done as part of the OpenM++ model analysis tools and OpenM++ web solutions described below.

Component 1.2: OpenM++ output result viewers and model analysis tools

The OpenM++ presentation layer should be extendable and must support development of 3rd-party tools to view and analyze model output results. The following viewers are to be implemented first:

(pri1) Excel workbook and/or sample module(s)
(pri2) import/export into R
(pri2) basic web UI sample pages for ASP.NET
(pri3) basic web UI sample pages for PHP
(pri3) basic web UI sample pages for Java
(pri3) Excel OpenM++ add-on

Basic web UI sample pages, with the necessary server-side components, are provided as a reference point for web development and should allow users to view and edit parameters, view output results and run the model executable.

Component 1.3: OpenM++ cloud and web capabilities

OpenM++ must be cloud-ready and support “software-as-a-service” usage of the model(s). These capabilities are outside the scope of the current document. The OpenM++ basic web UI sample pages mentioned above provide a starting point for web developers. As a next step, web solutions to use OpenM++ models on the web are going to be developed:

(pri1) OpenM++ ASP.NET web solution (comparable to ModgenWeb)
(pri2) OpenM++ PHP web solution
(pri3) OpenM++ Java web solution

These web solutions (like every other part of OpenM++) must be scalable and portable and ready to be deployed on a private or public cloud platform, for example Microsoft Azure for the OpenM++ ASP.NET web solution (specific cloud platforms are to be established). Based on that, OpenM++ cloud software service capabilities can be created to allow researchers to work with their own models, including collaborative modeling, by using thin clients (i.e. web browsers).

Note: The full C++ model development cycle is not supported by the web solutions; however, it may be considered as an OpenM++ cloud feature.

Layer 2: OpenM++ controller layer

This layer should provide a set of command-line utilities, scripts and components to:

compile, debug and run OpenM++ models on single PC or in cluster environment
import, export and convert model data

Component 2.1: OpenM++ compiler

(pri1) The OpenM++ compiler produces C++ code from .ompp source code for a specific model. The .ompp source code is written by a model developer and contains declarative and procedural elements. Declarative elements include types, parameters, agents, variables, inter-agent links, agent collections, events, and cross-tabulations. Procedural elements include code to implement events and (optionally) to prepare secondary inputs and outputs. The OpenM++ compiler also produces a database of meta information on the model which is used by other OpenM++ components.

Component 2.2: OpenM++ controller for MPI cluster

OpenM++ models should run not only on a single PC but also in an HPC cluster. The OpenM++ cluster controller is a command-line utility, script or set of scripts to support the most commonly used HPC cluster environments. For the purpose of this document an MPI-compatible environment is assumed; however, other options can be considered as well. The following steps are required in order to implement this:

(pri1) organize test OpenMPI or MPICH2 cluster for CentOS 64bit
(pri1) establish development environment for Windows 32bit and 64bit
(pri1) create OpenM++ controller(s) for each cluster environment
(pri2) establish automated test procedures for OpenM++ models in cluster
(pri3) organize test OpenMPI or MPICH2 cluster for Debian or Ubuntu 64bit
(pri3) organize test MS HPC cluster for Windows 64bit

Component 2.3: Modgen compatibility convertors

These are command-line utilities to convert existing Modgen models into OpenM++ format:

(pri1) parameters .dat file(s)
(pri2) source model code .mpp file(s)

Component 2.4: OpenM++ SQL loaders

These are command-line utilities to load data from OpenM++ model data storage into well-known SQL server databases:

(pri1) loader for MS SQL Server
(pri1) loader for MySQL / MariaDB
(pri2) generic SQL99 loader
(pri3) loader for Oracle
(pri3) loader for PostgreSQL
(pri3) loader for IBM DB2
(pri3) loader for Apache Derby, H2 or HSQL
(pri3) loader for LucidDB, InfiniDB or MonetDB

Note: As an alternative solution, all or some of the above functionality can be moved into the OpenM++ data library. It is also possible to supply a few different versions of the OpenM++ data library, each targeted to a different SQL server database.

Component 2.5: OpenM++ output convertors

These are command-line utilities to convert data from OpenM++ model data storage into well-known formats:

(pri1) .csv convertor for parameters and output results
(pri2) .xml convertor for model data or user-defined subset of model data
(pri3) SDMX convertor for model data
(pri3) convertor into Statistics Canada Biobrowser database
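As an illustration of the (pri1) .csv convertor, the sketch below formats one row of values as a CSV line, quoting fields in the style of RFC 4180. The function names `csvField` and `csvLine` are hypothetical and are not part of any OpenM++ API; reading the rows from model data storage is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Quote a field only if it contains a comma, a double quote or a line break.
// Embedded double quotes are doubled, as in RFC 4180.
std::string csvField(const std::string& value) {
    if (value.find_first_of(",\"\r\n") == std::string::npos) {
        return value;
    }
    std::string quoted = "\"";
    for (char c : value) {
        if (c == '"') {
            quoted += '"';   // escape an embedded quote by doubling it
        }
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Join the fields of one row with commas (no trailing line break).
std::string csvLine(const std::vector<std::string>& fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) {
            line += ',';
        }
        line += csvField(fields[i]);
    }
    return line;
}
```

A real convertor would call something like `csvLine` once per parameter or output table row and write the result to a file.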

Layer 3: OpenM++ algorithms and data layer

This layer consists of OpenM++ common libraries and model data storage (model database).

Component 3.1: OpenM++ modeling library

The modeling library provides core functionality for the model life cycle, including agent creation / destruction, event queue management, on-the-fly cross-tabulation, and pre- and post-simulation processing. It may use the OpenM++ data and execution libraries to organize model execution and result aggregation (especially in a cluster environment), read model parameters, save model tracks and aggregate cross-tabulation results.
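To make "event queue management" more concrete, here is a minimal sketch of a time-ordered event queue built on `std::priority_queue`. The `Event` and `EventQueue` types are hypothetical illustrations, not the actual modeling library design, which this document leaves open.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical event record: `action` fires at `time` for agent `agentId`.
struct Event {
    double time;
    int agentId;
    std::function<void()> action;
};

// Comparator that puts the earliest event on top of the priority queue.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.time > b.time;
    }
};

class EventQueue {
public:
    void schedule(double time, int agentId, std::function<void()> action) {
        queue_.push(Event{time, agentId, std::move(action)});
    }

    // Execute events in time order while their time is <= endTime.
    // Returns the number of events executed.
    int runUntil(double endTime) {
        int count = 0;
        while (!queue_.empty() && queue_.top().time <= endTime) {
            Event ev = queue_.top();   // copy, because top() returns a const reference
            queue_.pop();              // pop first so the action may schedule new events
            ev.action();
            ++count;
        }
        return count;
    }

    bool empty() const { return queue_.empty(); }

private:
    std::priority_queue<Event, std::vector<Event>, LaterFirst> queue_;
};
```

Events with equal times are executed in an unspecified order in this sketch; a real library would need an explicit tie-breaking rule to keep model runs reproducible.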

Component 3.2: OpenM++ model data storage (model database)

OpenM++ data storage design should provide the ability to store model parameters and output results inside a SQL database and support model tracking functionality, which may be done through a different database, text or XML file (a subject for research during phase 1). OpenM++ data storage can be implemented in the following ways:

(pri1) inside of single embedded (file-based) SQL database
(pri2) as above plus extra database for model tracking
(pri3) model parameters and metadata inside of file-based SQL database and output results as .csv files
(pri3) inside of a SQL server database chosen by the model developer (e.g. MSSQL or Oracle)

In any case model data storage should support basic OpenM++ design principles:

portability between Linux and Windows, 64-bit and 32-bit operating systems
scalability from single PC up to HPC cluster environment

Component 3.3: OpenM++ data library

The data library(s) is a C++ library that supports model data read/write operations and hides low-level implementation details to simplify the model code and the modeling library. As (pri1) it should support a single embedded (file-based) SQL database in a portable way. However, in the future (pri3) it can consist of different implementations of data libraries for different target model storage (for example, to write directly into Oracle).

(pri2) The second part of the OpenM++ data libraries should provide access to model data from Java and .NET to allow development of model analysis tools and OpenM++ web solutions.
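One way to hide storage details from model code is an abstract interface with interchangeable implementations. The sketch below is only an assumption about how that could look: `ModelStorage` and `InMemoryStorage` are hypothetical names, and the in-memory class merely stands in for the embedded SQL database implementation, which is not shown.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical interface seen by model code and the modeling library.
class ModelStorage {
public:
    virtual ~ModelStorage() = default;
    // Read all values of a named parameter.
    virtual std::vector<double> readParameter(const std::string& name) const = 0;
    // Store the cells of a named output table.
    virtual void writeOutput(const std::string& table,
                             const std::vector<double>& cells) = 0;
};

// Stand-in implementation that keeps everything in memory.
// A real implementation would read and write a file-based SQL database instead.
class InMemoryStorage : public ModelStorage {
public:
    void setParameter(const std::string& name, const std::vector<double>& values) {
        parameters_[name] = values;
    }

    // Throws std::out_of_range if the parameter does not exist.
    std::vector<double> readParameter(const std::string& name) const override {
        return parameters_.at(name);
    }

    void writeOutput(const std::string& table,
                     const std::vector<double>& cells) override {
        outputs_[table] = cells;
    }

    // Throws std::out_of_range if the table was never written.
    const std::vector<double>& output(const std::string& table) const {
        return outputs_.at(table);
    }

private:
    std::map<std::string, std::vector<double>> parameters_;
    std::map<std::string, std::vector<double>> outputs_;
};
```

Model code written against `ModelStorage` would not need to change if a different storage implementation, for example one targeting a SQL server database, were supplied later.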

Component 3.4: OpenM++ execution library

(pri1) The execution library is a relatively thin C++ layer that simplifies scalable coding of the modeling library or, in other words, keeps the low-level details of handling the difference between single PC and cluster execution out of the modeling library. Depending on design decisions and the target cluster environment, it may not be used directly from the modeling library but rather called from the OpenM++ cluster controllers (see 2.2). In any case it should:

(pri1) provide necessary information for model initialization (e.g. number of CPUs)
(pri1) synchronize parallel model execution (e.g. wait for completion)
(pri2) support data exchange between models or between model and controller (e.g. progress report)
(pri2) simplify tracking data exchange
(pri1) organize transparent communication for output result aggregation

For the purpose of this document an MPI cluster environment is assumed; however, other options can be considered as well.
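The responsibilities listed above could be expressed as a small abstract interface, sketched below under stated assumptions. `Execution`, `SingleProcessExecution` and `casesForProcess` are hypothetical names, not a decided design. In an MPI implementation the four methods would plausibly map onto `MPI_Comm_size`, `MPI_Comm_rank`, `MPI_Barrier` and `MPI_Allreduce`; only the single PC case is implemented here, so the example has no MPI dependency.

```cpp
// Hypothetical interface between the modeling library and the execution layer.
class Execution {
public:
    virtual ~Execution() = default;
    virtual int processCount() const = 0;   // number of cooperating processes
    virtual int processRank() const = 0;    // 0-based index of this process
    virtual void waitForAll() = 0;          // synchronize parallel execution
    // Combine per-process partial results into one total.
    virtual double sumAcrossProcesses(double partial) = 0;
};

// Single PC implementation: one process, so nothing to wait for or combine.
class SingleProcessExecution : public Execution {
public:
    int processCount() const override { return 1; }
    int processRank() const override { return 0; }
    void waitForAll() override {}
    double sumAcrossProcesses(double partial) override { return partial; }
};

// Split totalCases as evenly as possible: the first (totalCases % n)
// processes each receive one extra case.
int casesForProcess(int totalCases, const Execution& exec) {
    const int n = exec.processCount();
    const int base = totalCases / n;
    const int extra = totalCases % n;
    return base + (exec.processRank() < extra ? 1 : 0);
}
```

With such an interface the modeling library can ask how much work belongs to the current process and how to aggregate results without knowing whether it runs on a single PC or in a cluster.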

(pri1) It is important to understand that the modeling library may be designed in a “single-threaded” way, and then the execution library must organize additional thread(s) for the purpose of model cluster communication, progress reporting, tracking, etc. Multithreading must be done in a portable way, and the following solutions should be considered for research during phase 1 of development:

STL and C++11 standard features for threading and synchronization (e.g. future)
glib
boost::thread and synchronization libraries
APR (Apache portable runtime)
OpenMP
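As a sketch of the first option only (C++11 standard library features), the example below keeps the model code single-threaded and lets a helper thread, started with `std::async`, observe progress through atomic variables. All names (`Progress`, `runModel`, `reportProgress`, `runWithReporter`) are hypothetical, and the "model" is a trivial loop standing in for real simulation work.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// State shared between the model thread and the reporter thread.
struct Progress {
    std::atomic<int> done{0};            // number of cases completed so far
    std::atomic<bool> finished{false};   // set once the model run is over
};

// Hypothetical single-threaded model: sums 1..cases as a stand-in for
// simulating each case, publishing progress after every case.
long long runModel(int cases, Progress& progress) {
    long long sum = 0;
    for (int i = 1; i <= cases; ++i) {
        sum += i;
        progress.done.store(i);
    }
    return sum;
}

// Reporter thread body: polls until the model has finished and returns the
// final progress value. A real reporter would send each observed value to
// the controller instead of just sleeping.
int reportProgress(const Progress& progress) {
    while (!progress.finished.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return progress.done.load();
}

// Run the model on the calling thread with a reporter on an additional thread.
long long runWithReporter(int cases, int& lastReported) {
    Progress progress;
    std::future<int> reporter =
        std::async(std::launch::async, reportProgress, std::cref(progress));
    const long long result = runModel(cases, progress);
    progress.finished.store(true);   // tell the reporter to stop
    lastReported = reporter.get();   // wait for the reporter thread to end
    return result;
}
```

The model function never touches a thread API itself; only the wrapper that the execution library would own knows about `std::async` and `std::future`. On some platforms the program must be linked with the threading library (for example `-pthread` with GCC).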

Component 3.5: OpenM++ presentation library(s)

(pri1, pri2, pri3) Presentation libraries, together with the data library, allow development of applications to view and analyze OpenM++ model output results. The priority and functionality of presentation library development are completely defined by the priority of the OpenM++ viewers and OpenM++ web solutions described in 1.2 and 1.3 above. As (pri1), .NET presentation library(s) for the Excel viewer and ASP.NET basic UI should be implemented.