Meeting December 2019 (Programming) - UCL/TLOmodel GitHub Wiki


  • Lots of notes now on the wiki!

  • Follow the process outlined in 'implementing a disease module'

    • Have a basic test file to run your code
    • Develop the model incrementally, adding complexity gradually
      • Catch problems early
      • Easier for us to help!
  • Master should be merged into your branches soon after PRs are merged

    • Notification in the Slack #programming channel
    • Prevents complicated conflicts
    • Keeps your branch up-to-date
  • Open draft PRs on GitHub

    • You can still use the collaboration tools, while indicating that the work is in progress
  • In PyCharm, always set the working directory to the root 'TLOmodel' directory

    • Paths can be relative to this

New utility functions and requests for me

  • tlo.util.transition_states

    • Takes a single DataFrame column of states and a transition probability matrix (also a DataFrame)
    • Returns a new column with transitioned states
    • Example on the wiki
  • tlo.util.nested_to_record

    • A flattened dictionary representation of a DataFrame
    • e.g.
         First Name   Region    User Name
      1   Nathaniel  Midwest      nzburke
      2   Elisabeth    South     ewfoster
      3      Briana  Midwest  bclancaster
      4     Estella     West     elpotter
      5      Lamont    South      llwoods


     {'First Name_1': 'Nathaniel',
      'First Name_2': 'Elisabeth',
      'First Name_3': 'Briana',
      'First Name_4': 'Estella',
      'First Name_5': 'Lamont',
      'Region_1': 'Midwest',
      'Region_2': 'South',
      'Region_3': 'Midwest',
      'Region_4': 'West',
      'Region_5': 'South',
      'User Name_1': 'nzburke',
      'User Name_2': 'ewfoster',
      'User Name_3': 'bclancaster',
      'User Name_4': 'elpotter',
      'User Name_5': 'llwoods'}
    • Can be used for logging
  • In disease modules (class Xyz(Module)), self.load_parameters_from_dataframe loads parameters from a resource DataFrame, updating the class's PARAMETERS
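
To make the idea behind transition_states concrete, here is a minimal, self-contained sketch of the technique. It is not the tlo.util implementation (see the wiki example for the real function). The function name transition_states_sketch, the matrix layout (columns are current states, rows are next states, each column sums to 1) and the explicit rng argument are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def transition_states_sketch(states: pd.Series,
                             prob_matrix: pd.DataFrame,
                             rng: np.random.RandomState) -> pd.Series:
    """Return a new Series in which every individual has drawn a next state.

    Assumed layout (hypothetical, for this sketch only): each column of
    prob_matrix is a current state, each row is a possible next state,
    and every column sums to 1.
    """
    new_states = states.copy()
    for current_state in prob_matrix.columns:
        # Index labels of everyone currently in this state
        in_state = states.index[states == current_state]
        if len(in_state) == 0:
            continue
        # Draw one next state per individual using this column's probabilities
        new_states.loc[in_state] = rng.choice(
            prob_matrix.index.to_numpy(),
            size=len(in_state),
            p=prob_matrix[current_state].to_numpy(),
        )
    return new_states


states = pd.Series(["a", "a", "b", "c", "c", "c"])
prob_matrix = pd.DataFrame(
    {"a": [0.0, 1.0, 0.0],   # everyone in "a" moves to "b"
     "b": [0.0, 0.5, 0.5],   # "b" stays or moves to "c" with equal chance
     "c": [0.0, 0.0, 1.0]},  # "c" is absorbing
    index=["a", "b", "c"],
)
new_states = transition_states_sketch(states, prob_matrix, np.random.RandomState(0))
```

The input column is left untouched and a new column is returned, matching the behaviour described in the notes above.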

Updates to wiki and PR checklists

  • Installation and setup guide
    • Still need a Windows version!
  • Phase 4 & 5 from the checklist for developing a disease module
  • Pre-PR checklist
    • We're figuring out the tooling on Windows!

Issues outstanding

  • Improve logging using a TLO-specific logging module

    • Handles setting up the logging of TLO
    • One-liners to configure output
      • e.g. turn off, save to file etc.
    • Deal with strange output that causes problems downstream e.g. nan
    • Enforce documentation of log lines
      LOGGING = {
          'population_by_sex': LogLine('Population alive by sex'),
          'cause_of_death': LogLine('Deaths in the last month grouped by cause')
      }
    • (TBH: A flag to indicate whether or not this output should be subject to scaling to match whole population size)
    • Includes improving the parsing of logs
      • Filtering the log lines when parsing
      • Using a faster implementation of building dataframes from log lines
  • Performance

    • Health System is the bottleneck
      • Continuing to profile and refactor
    • Over-allocating rows in the population dataframe
      • Essential that models only work on is_alive individuals!
    • The more frequent an event, the more to worry about the "work" in each call
    • We want to add a set of tools to easily profile blocks of code (using e.g. decorator)
  • More robust testing

    • CI to run tests on small and large population sizes
    • Checks on the use of is_alive
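
Two of the performance points above (a decorator for profiling blocks of code, and working only on is_alive individuals) can be sketched together. This is only an illustration of the idea, not an existing TLOmodel tool: the decorator name profiled, the timings registry and the function count_alive_by_sex are hypothetical, and the population DataFrame is a toy stand-in.

```python
import functools
import time

import pandas as pd

# Hypothetical registry of call durations, keyed by function name
timings = {}


def profiled(func):
    """Record the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)
    return wrapper


@profiled
def count_alive_by_sex(df: pd.DataFrame) -> pd.Series:
    # Select living individuals first so that rows with is_alive == False
    # never contribute to the result
    alive = df.loc[df.is_alive]
    return alive.groupby("sex").size()


# Toy population: three alive, two not
population = pd.DataFrame({
    "is_alive": [True, True, False, True, False],
    "sex": ["F", "M", "F", "F", "M"],
})
counts = count_alive_by_sex(population)
```

After a run, timings holds one list of durations per decorated function, so frequently called events that do a lot of "work" per call stand out quickly.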

Run Management system proposal

  • To ease configuring and running simulations, and processing of output

  • Prepare to run on compute clusters

  • A command-line tool to manage this: tlo

  • Have a collection of templates (or one that can be configured) that describe a "scenario"

    • e.g. tlo create-scenario my_test --template basic_scenario --some --other --options
    • would create a directory and write a scenario file therein
     # -------------------------------------------------------------
     # Name: my_test
     # Created: 10/12/2019 12:45
     # Template: basic_scenario
     # -------------------------------------------------------------
     import time
     import tlo.logging
     from tlo import Date, Simulation
     from tlo.methods.demography import Demography
     from tlo.methods.contraception import Contraception
     # -------------------------------------------------------------
     # Basic configuration
     # -------------------------------------------------------------
     start_date = Date(2010, 1, 1)
     end_date = Date(2051, 1, 1)
     initial_population_size = 100000
     resourcefilepath = './resources/'
     simulation = Simulation(start_date=start_date)
     # -------------------------------------------------------------
     # Register modules
     # -------------------------------------------------------------
     # Uncomment both import and register lines below
     # from tlo.methods.enhanced_lifestyle import Lifestyle
     # simulation.register(Lifestyle(resourcefilepath=resourcefilepath))
     from tlo.methods.depression import Depression
     simulation.register(Depression(resourcefilepath=resourcefilepath))
     # from tlo.methods.epilepsy import Epilepsy
     # simulation.register(Epilepsy(resourcefilepath=resourcefilepath))
     # -------------------------------------------------------------
     # Override parameters
     # -------------------------------------------------------------
             Demography: {
                 'fraction_of_births_male': 0.2
             },
             Depression: {
                 'init_rp_ever_depr_per_year_older_f': 0.125,
                 'prob_3m_selfharm_depr': lambda rng: rng.rand(),
                 'rr_depr_on_antidepr': lambda rng: rng.exponential(0.1)
             }
     # -------------------------------------------------------------
     # Run simulation
     # -------------------------------------------------------------
  • We then create a sample from our scenario

    • e.g. tlo create-sample my_test --some --other --options
    • Takes the above scenario and samples values where necessary (each sample is placed in a sub-directory)
              Demography: {
                  'fraction_of_births_male': 0.2
              },
              Depression: {
                  'init_rp_ever_depr_per_year_older_f': 0.125,
                  'prob_3m_selfharm_depr': 0.5187848579652606,
                  'rr_depr_on_antidepr': 0.05841701302920538
              }
    • Can create several samples
      • tlo create-sample my_test --count 100 would create 100 samples of the scenario file
  • Finally we run the sample as many times as we would like

    • tlo run-sample my_test --all - runs all the samples
    • tlo run-sample my_test --sample 15 - runs a specific sample
    • tlo run-sample my_test --all --runs 1000 - run all the samples, each 1000 times
  • The resulting set of files might look something like this:

     ├── fixed_antidepr
     ├── my_test
     │   ├──
     │   ├── sample_001
     │   │   ├──
     │   │   ├── run_0001
     │   │   │   ├── output.csv
     │   │   │   ├── output.log
     │   │   │   └── output.pickle
     │   │   ├── run_0002
     │   │   ├── ...
     │   │   └── run_1000
     │   ├── sample_002
     │   ├── ...
     │   └── sample_100
     ├── random_selfharm
     └── some_scenario
  • Could also generate the script required to submit jobs on a compute cluster

    • tlo run-sample my_test --sample 1 --runs 1000 --job-array
    • creates a shell script to submit job array to cluster
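
The step from scenario to sample described above (fixed values kept, callables such as lambda rng: ... replaced by a drawn value) could work roughly as follows. This is a sketch of the proposal, not existing tlo code: the function name sample_overrides is hypothetical, module names are plain strings so that the example runs without TLOmodel, and seeding one random number generator per sample is an assumption.

```python
import numpy as np


def sample_overrides(overrides: dict, rng: np.random.RandomState) -> dict:
    """Return a copy of overrides with every callable replaced by a drawn value."""
    return {
        module: {
            name: value(rng) if callable(value) else value
            for name, value in parameters.items()
        }
        for module, parameters in overrides.items()
    }


# Overrides as in the scenario file above; module names are strings here
# only so that the sketch is self-contained
scenario_overrides = {
    "Demography": {"fraction_of_births_male": 0.2},
    "Depression": {
        "init_rp_ever_depr_per_year_older_f": 0.125,
        "prob_3m_selfharm_depr": lambda rng: rng.rand(),
        "rr_depr_on_antidepr": lambda rng: rng.exponential(0.1),
    },
}

# Several samples of the same scenario, one seed per sample (assumed scheme)
samples = [
    sample_overrides(scenario_overrides, np.random.RandomState(seed))
    for seed in range(3)
]
```

Each entry of samples contains only concrete numbers, so it can be written to its own sub-directory and re-run as many times as needed without re-sampling.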