M LED - iiasa/RE4AFAGRI_platform GitHub Wiki

Introduction

M-LED is an electricity demand assessment platform covering all main demand sectors relevant to electricity system planning. It models these sectors at high spatial resolution and with future projections, and it also targets communities where electricity supply infrastructure is currently lacking.

Originally introduced in Falchetta et al. (2021) and written in the R scientific computing programming language, M-LED exploits many geospatial data sources and (sub)-national statistics to produce monthly estimates of electricity demand in a set of sectors across a country. The key benefit of the bottom-up methodology is that the output data can both be used at the native local level of analysis - i.e. communities and settlements (also called population clusters) - and be aggregated to produce sub-national or national estimates of trends in electricity demand.

M-LED is designed to operate at the country level: it calibrates current electricity consumption levels against recent national statistics and downscales them to the local population cluster level.

As a bottom-up assessment platform, M-LED performs calculations at the most granular level allowed by the input data. M-LED demand projections are both driver-based, i.e. determined by factors such as the projected growth in population and economic affluence, and objective (or policy) driven, i.e. aimed at ensuring that enough electricity is provided to cover needs such as powering irrigation pumps, crop processing machinery, or appliances in schools and healthcare facilities.

[Figure: framework_mled]

M-LED estimates electricity demand at the population settlement (i.e. community) level. Population settlements are polygons enclosing a settlement such as a city or a village. There are different approaches to generating population clusters for a country, and several ready-made data products of this type are available, such as those from GRID3 or from Khavari et al. (2021). Clusters include residential buildings where people live, as well as other sectors such as SMEs and healthcare and educational facilities. Clusters are then linked to the surrounding agricultural land (mostly uninhabited land devoted to agriculture) through a Voronoi polygonisation of the land area based on each population cluster centroid. This approach links the agricultural demand from cropland surrounding a population cluster to the demand estimated within the cluster itself. A maximum distance parameter is available in M-LED (default value: 5 km), beyond which the irrigation load is classified as “off-grid demand”, i.e. demand that is unlikely to be served by the main electricity supply system(s) serving the population cluster.
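The nearest-centroid rule behind a Voronoi assignment, together with the distance cap, can be sketched as follows. This is an illustrative Python sketch, not M-LED's R code: the function name, the planar kilometre coordinates, and the sample points are assumptions, and only the 5 km default comes from the text above.

```python
import math

MAX_DISTANCE_KM = 5.0  # default cap quoted in the text; all other values are illustrative


def assign_cropland(cropland_xy, centroids):
    """Assign a cropland point to its nearest cluster centroid.

    Being nearest to a centroid is equivalent to lying inside that
    centroid's Voronoi cell. Coordinates are assumed planar, in km.
    """
    nearest_id, nearest_dist = None, math.inf
    for cluster_id, (cx, cy) in centroids.items():
        dist = math.hypot(cropland_xy[0] - cx, cropland_xy[1] - cy)
        if dist < nearest_dist:
            nearest_id, nearest_dist = cluster_id, dist
    # Cropland beyond the cap is treated as off-grid (standalone) demand
    demand_class = "cluster" if nearest_dist <= MAX_DISTANCE_KM else "off-grid"
    return nearest_id, nearest_dist, demand_class


# Hypothetical centroids and cropland points
centroids = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
near = assign_cropland((3.0, 4.0), centroids)  # exactly 5 km from A
far = assign_cropland((4.0, 9.0), centroids)   # more than 5 km from both
```

Real cluster data would use projected geometries and a spatial library; the brute-force loop is only meant to make the assignment rule explicit.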

A similar approach is used for crop processing. Travel-time based catchment areas are calculated around cities (i.e. crop markets), generating polygons that describe the area located within a user-defined number of minutes of travel time from the city (considering the fastest route and means of transport; see Weiss et al. 2019 and 2017 for the underlying approach). A threshold of 180 minutes (3 hours) is used. Clusters whose centroid falls outside those urban catchment areas are not deemed suitable for a crop processing electricity load, as their distance to market likely makes crop processing and storage at those locations economically unprofitable.
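The catchment constraint amounts to a threshold filter on travel time. The Python sketch below is illustrative only (M-LED implements this in R on polygons): the function names, cluster names, travel times, and demand values are hypothetical, and only the 180-minute threshold comes from the text.

```python
CATCHMENT_THRESHOLD_MIN = 180  # 3-hour threshold quoted in the text


def crop_processing_eligible(travel_time_min, threshold=CATCHMENT_THRESHOLD_MIN):
    """Return True if a cluster centroid lies within an urban catchment area."""
    return travel_time_min <= threshold


def constrain_processing_demand(clusters):
    """Set crop processing demand to zero for clusters outside every catchment."""
    return {
        name: (demand if crop_processing_eligible(minutes) else 0.0)
        for name, (minutes, demand) in clusters.items()
    }


# Hypothetical clusters: (travel time to nearest city in minutes, demand in kWh)
clusters = {"village_1": (95, 1200.0), "village_2": (240, 800.0)}
constrained = constrain_processing_demand(clusters)
```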

With regard to time, M-LED operates at 10-year timesteps. It starts from the base year (2020), at which current electricity demand is calibrated, and reaches the target year (2060) by recursively projecting demand across timesteps. For every sector, demand is estimated at both the yearly and the monthly level, to incorporate the role of seasonality and inter-annual variations. Although M-LED outputs are harmonised to the monthly and yearly scales, demand in some sectors is first defined at a different temporal resolution. Water pumping, residential, healthcare and school demand are calculated bottom-up from hourly utilisation profiles and then aggregated to the monthly scale. Mining demand, by contrast, is defined at a lower temporal resolution and then redistributed equally among the months of the year.
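The two harmonisation directions described above can be illustrated with a short Python sketch (M-LED itself is written in R). The function names and the flat load profile are assumptions, and using a single typical-day profile for every month is a simplification made here for brevity.

```python
import calendar

BASE_YEAR, TARGET_YEAR, STEP = 2020, 2060, 10  # values quoted in the text
timesteps = list(range(BASE_YEAR, TARGET_YEAR + 1, STEP))


def monthly_from_hourly(hourly_kwh, year):
    """Aggregate a 24-value typical-day profile (kWh per hour) to 12 monthly totals."""
    daily_kwh = sum(hourly_kwh)
    return [daily_kwh * calendar.monthrange(year, m)[1] for m in range(1, 13)]


def monthly_from_yearly(yearly_kwh):
    """Redistribute a yearly total equally among the 12 months."""
    return [yearly_kwh / 12] * 12


flat_profile = [1.0] * 24                               # hypothetical 1 kWh every hour
residential = monthly_from_hourly(flat_profile, 2020)   # bottom-up sector
mining = monthly_from_yearly(1200.0)                    # yearly-defined sector
```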

Model structure

M-LED has a modular structure, disaggregated into four main types of modules:

Backend modules (libraries, working directories, and technical parameters definition)
Scenario module (specific to the country in question, also containing the specifics of the scenarios that the user wants to run)
Modelling modules (the actual code performing data and model operations to produce electricity demand estimates)
Output writing and reporting modules (writing output data and summary csvs and figures)

[Figure: framework_mled_2]

M-LED.R is the wrapper of M-LED, where the user enters the basic technical and run-specific parameters, defines the scenarios, and launches the model modules for the desired scenarios.

Here, the working directory (i.e. the path of the folder containing the cloned GitHub repository of M-LED, which includes e.g. M-LED.R) should be entered manually in line 8.

Additional manual parameters include the db_folder parameter, specifying the path where the M-LED database was downloaded from Zenodo or where it should be automatically downloaded if the download_data parameter is set to TRUE.

backend.R is the module responsible for installing and loading the libraries required to run M-LED, and for downloading and/or collecting all the input data for the specified country run. The user does not need to edit this file unless specific technical tweaking is desired. A useful tweak is editing the cl parameter, which defines the number or percentage of CPU cores to use for parallel processing. A higher share decreases the model run time, but if set too high it might cause memory issues and make M-LED fail; a safe option is to keep the default of 25% of CPU cores. Note that this is only relevant if the allowparallel parameter in the [M-LED.R] file is set to TRUE.
calculate_rates_for_baseline.R is the module responsible for calculating the projected future trends in the electricity access rate and in the power irrigated cropland share if the baseline trend is followed, i.e. if no policy target is defined. These trends are extrapolated from historical trends in the ESMAP Tracking SDG7 database (electricity access) and the FAO's AQUASTAT database (irrigation share).
scenario_.R are the main files where the user is encouraged to intervene manually. Here, all the input data subsequently called by the different M-LED modules are defined and imported; country-specific parameters for model calibration and demand estimation are defined, such as national electricity demand in different sectors; technological parameters are defined; and country-specific input data are read. The current implementation of M-LED includes scenario files for five countries:
- Zambia
- Rwanda
- Zimbabwe
- Nigeria
- Kenya

A detailed description of the input data, and of the lines of the [scenario_.R] file where such data are sourced, is given in the Input Data section of this Wiki.

Note that the scenario module is itself wrapping three additional sub-modules, all fully automated and thus requiring no user intervention:

The projector.R script, which projects the key macro drivers (population, GDP and GDP per capita, urbanisation) within each population cluster using SSP-consistent downscaled datasets and the inferred growth rates (calculated at the GADM2 administrative level boundaries).
The urbanisation_calibration.R script, which calibrates the urban status of clusters to the national urbanisation rate and its future evolution, following SSP urbanisation projections up to the target year 2060.
The rwi_to_gdp_capita.R script, which converts the Relative Wealth Index information to an Absolute Wealth Estimate following the procedures described in Hruschka et al. (2015); this estimate is in turn used to calibrate the current GDP per capita at the cluster level.
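The recursive projection of a macro driver across timesteps can be sketched as compound growth applied step by step. This Python sketch is illustrative and is not the projector.R code: the function name is hypothetical and the growth rates are placeholders, not SSP values.

```python
def project(value_base, annual_growth_by_step, step_years=10):
    """Recursively project a driver, one compound-growth step per timestep.

    annual_growth_by_step holds one annual growth rate per timestep,
    e.g. four rates to move from 2020 to 2060 in 10-year steps.
    """
    path = [value_base]
    for rate in annual_growth_by_step:
        path.append(path[-1] * (1 + rate) ** step_years)
    return path


# Hypothetical cluster population of 1000 in the base year
population_path = project(1000.0, [0.02, 0.02, 0.01, 0.01])
```

Each element of `population_path` corresponds to one timestep, starting from the base year.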

The demand_growth_weights.R module calibrates the pace at which policy objectives that are imposed in each scenario for a given target year (e.g., universal electricity access; % of irrigated cropland; % of crop yield locally processed) are reached. The evolution of demand towards these policy goals is thus non-linear and depends on the evolution of per-capita GDP in the socio-economic scenario (SSP) of reference.
The electricity_access.R module estimates the current share of people with or without electricity in each local population cluster using satellite-based nighttime light information (following the methods described in Falchetta et al. 2019).
The create_clusters_voronoi.R module creates Voronoi polygons around the centroid of each input population cluster polygon. These polygons are used to perform cropland-related spatial data extraction operations, under the assumption that the cropland driving electricity demand at each cluster (for water pumping, irrigation and the consequent crop processing) is the cropland that is closer to that cluster's centroid than to any other cluster centroid. The maximum distance at which cropland is considered a potential driver of electricity demand for a population cluster is capped by the `m_radius_buffer_cropland_distance` parameter. Cropland beyond this distance is classified as eligible for standalone water pumping and is estimated as a separate sector.
The crop_module.R module calculates the monthly and total crop-specific irrigation needs based on the WaterCrop estimates entering M-LED as input data and on the MapSPAM 2017 SSA rainfed and irrigated cropland areas and yield data, as well as constraints such as environmental flows preservation.
The pumping_module.R module then estimates the monthly power and energy requirements to pump water from either surface water bodies and rivers or from groundwater aquifers net of the crop-specific irrigation technology efficiency as well as technological assumptions and parameters of the pumping model. The module then selects whether at each cluster it is optimal to develop surface or groundwater irrigation.
The crop_module_solar_pumps.R module is a twin file of crop_module.R, except that it quantifies irrigation water demand from cropland that is more distant from each population cluster centroid than the distance determined by the m_radius_buffer_cropland_distance parameter.
The pumping_module_solar_pump.R module is also a twin file of the pumping_module.R, quantifying the monthly power and energy requirements to pump water for irrigating cropland that is more distant from each population cluster centroid than the distance determined by the m_radius_buffer_cropland_distance.
The mining_module.R module downscales the national industry demand from the mining sector (calibrated exogenously through the industry_final_demand_tot parameter) based on mining sites data and nighttime light radiance. It then matches mining sites to the closest population clusters. It projects demand growth in the sector proportional to GDP per capita growth rates.
The residential.R module estimates household electricity demand from both:
- People already benefiting from electricity access in the base year (2020) or at each future timestep, for whom future electricity consumption grows proportionally to locally projected GDP per capita growth rates mediated by an income elasticity of electricity demand coefficient that is itself proportional to the GDP per capita level.
- People who will gain electricity access at a given timestep, who are first allocated to a tier of electricity access, estimated through a statistical model based on the current estimated tiers among populations with electricity access, and are then attributed a demand based on the tier-specific, urban-rural differentiated load profiles generated from appliance baskets in the RAMP model outputs (Lombardi et al. 2019). See Falchetta et al. (2021) for a more detailed description of the RAMP model linkage into M-LED and the definition and calculation of appliance baskets and utilisation patterns in the context of electricity demand estimation.
The health_education_module.R module estimates electricity needs for achieving given appliance use standards in currently existing healthcare and educational facilities. Facilities are mapped to a tier/size in proportion to the number of beds (for healthcare facilities) or the number of pupils (for schools). Then, similarly to the residential sector, facilities are linked to a demand based on tier-specific load profiles generated from appliance baskets in the RAMP model outputs. In addition, facilities are densified in the future based on local population growth rates across timesteps and on critical population thresholds for facility uptake or densification.
The crop_processing_catchment_areas.R module creates polygons surrounding each urban centre in the country under analysis based on the `minutes_cluster` parameter. This parameter defines the area within the given number of minutes of road travel time from each city. All crop processing demand occurring within this area is considered feasible, while demand outside those areas is set to zero. In other words, crop processing electricity demand is only estimated in areas that are sufficiently close to markets where products can be sold.
The crop_processing.R module estimates electricity demand for crop processing and vegetable cold storage. Energy needs are taken from the crop processing energy needs database csv file and multiplied by current and future potential crop yields in the corresponding crop-specific harvest period(s). Crop processing demand is then constrained to occur only in clusters within the urban market catchment areas discussed above. A schematic representation of the local (crop-specific) machinery requirements is also provided.
The other_productive.R module estimates non-farm SMEs electricity demand based on the estimated residential electricity demand at each cluster and a markup range (range_smes_markup parameter) varying based on the roads density and distance to nearest city of each cluster.
The calculate_yield_growth_potential.R module calculates the potential yield growth enabled by irrigation at each timestep.
The cleaner.R module is responsible for rearranging the results column names, order, and structure, as well as for calculating the other sector, which in the base year 2020 is defined as the difference between the reported national electricity demand from the IEA and the sum of estimated demand in M-LED from all the modelled sectors.
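The residual definition of the other sector in the base year can be written out as a simple difference. This is an illustrative Python sketch, not the cleaner.R code: the function name, sector labels, and TWh values are hypothetical, and whether M-LED floors a negative residual at zero is not stated in the text, so no floor is applied here.

```python
def other_sector_demand(national_total_twh, modelled_twh):
    """Residual demand: reported national total minus the sum of modelled sectors."""
    return national_total_twh - sum(modelled_twh.values())


# Hypothetical base-year values in TWh/year
modelled = {"residential": 4.0, "mining": 5.5, "smes": 1.0}
other = other_sector_demand(12.0, modelled)
```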

Finally, M-LED goes back to the M-LED_hourly.R script, where outputs at different spatial scales are written (also serving as input data for the other soft-linked models in the RE4AFAGRI platform), and the summary csv files and results are generated for each scenario.

Input data, models soft-linking, and scenarios

As discussed above, the scenario_.R files are the main files where all the input data subsequently called by the different M-LED modules are defined and imported, and where country-specific parameters for model calibration and demand estimation are defined. Here we list the key inputs among those files and parameters, referring to line numbers of the [scenario_.R] files:

At lines 14-20, parameters defining country name, code, and socio-economic statistics are listed
At lines 22-28, parameters defining statistics on sectoral electricity demand are listed, in total kWh/year
At lines 32-33, urban and rural average household sizes are entered
At lines 36-38, planning horizon parameters are inserted
At lines 40-51, parameters defining system/problem/technological boundaries are listed
At lines 54-68, assumed yearly efficiency gains are inserted
At lines 70-98, techno-economic parameters related to water pumping and transport are inputted
At lines 101-105, techno-economic parameters related to healthcare and education sectors are reported
At lines 107-111, additional parameters defining system/problem/technological boundaries are listed
At lines 114-134, load curve assumptions and parameters are found

Then, from line 170 onwards, input data are read, including inputs from WaterCrop runs (at lines 182-186). Note that WaterCrop produces netcdf files of irrigation water requirements and yield growth potential for all African countries. These files are contained (and can be updated) in the ./MLED_database/input_folder/watercrop folder and the corresponding subfolders for each crop.

The input data read between lines 192-333 and lines 351-388 are already available for any sub-Saharan African country and thus require no user intervention, as the scenario file automatically filters the information for the country requested by the user. The only exception is line 210, where the NEST BCU shapefile is read; this needs to be updated manually (refer to the NEST Wiki page).

Input data in lines 335-348 must instead be manually updated by the user for countries different from the five native RE4AFAGRI case-study countries.

Model calibration

M-LED is designed to operate at the country level. Thus, before projecting future demand, it is calibrated to current electricity consumption levels using recent national statistics, which are then downscaled to the local population cluster level. In particular, calibration is based on the following values and statistics:

National electricity access level (ESMAP)
National electricity demand (IEA)
National electricity demand, residential (IEA)
National electricity demand, industry (IEA)
National electricity demand, other sectors (IEA)
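One simple way to picture calibration against a national statistic is a proportional rescaling of bottom-up cluster estimates so that they sum to the reported total. The Python sketch below is an assumption made for illustration, not a description of M-LED's actual R calibration routine; the function name and values are hypothetical.

```python
def calibrate_to_national(cluster_estimates, national_total):
    """Rescale cluster-level estimates so that they sum to a national statistic.

    Relative shares between clusters are preserved; only the level changes.
    """
    raw_total = sum(cluster_estimates.values())
    if raw_total == 0:
        raise ValueError("cannot calibrate: cluster estimates sum to zero")
    factor = national_total / raw_total
    return {name: value * factor for name, value in cluster_estimates.items()}


# Hypothetical residential estimates (GWh) and national residential demand (GWh)
calibrated = calibrate_to_national({"c1": 2.0, "c2": 6.0}, 16.0)
```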

The rationale behind how demand projections are made depends on the sector and typology of final users. For the residential sector, demand is divided between:

Households that are already consuming electricity in the base year (2020), whose demand follows the GDP per capita assumption of each scenario, conditioned on a flexible income elasticity of electricity demand schedule
Households that will gradually gain access to electricity after 2020, whose demand is based on representative tiers of access estimated through the RAMP appliance-based stochastic model (Lombardi et al. 2019), with tiers assigned to each population settlement based on a set of determinants
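For already-electrified households, the link between income growth and demand growth can be illustrated with a constant-elasticity form. This functional form, the function name, and all numbers are assumptions for illustration in Python; the text only states that M-LED (in R) uses a flexible elasticity schedule that depends on the GDP per capita level.

```python
def grow_demand(demand_kwh, gdp_pc_start, gdp_pc_end, elasticity):
    """Scale demand by the GDP per capita ratio raised to an income elasticity.

    An elasticity of 1 means demand grows one-for-one with income;
    values below 1 mean demand grows more slowly than income.
    """
    return demand_kwh * (gdp_pc_end / gdp_pc_start) ** elasticity


# Hypothetical household: 500 kWh/year, GDP per capita doubling over a timestep
unit_elastic = grow_demand(500.0, 1000.0, 2000.0, elasticity=1.0)
inelastic = grow_demand(500.0, 1000.0, 2000.0, elasticity=0.5)
```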

In the non-residential sectors, demand is projected as follows:

SMEs: the demand from non-farm SMEs is proportional to the demand in the residential sector at each cluster, modulated by a range factor (30%-60%) which - at each cluster - is proportional to the PCA indicator of employment rate (from DHS survey data) and roads density (from GIS road maps) of each cluster, a proxy of the labour and economic situation of a given community. Future SMEs demand then evolves at the same growth rate of residential demand at each cluster.
Mining: the demand from mining is spatially downscaled from the national electricity demand of the industry sector (IEA) onto the mining sites geodatabase (Maus et al. 2020), in proportion to the nighttime light radiance measured at those sites (Colorado School of Mines, 2022). Mining demand at each mining site, assigned to nearby population clusters, is then projected to grow at the per-capita GDP growth rate of the level-2 administrative unit in which the site is located, according to the specific SSP scenario considered in the run.
Health & education: health and education demand is projected to converge to universal electrification of all facilities by a given target year, including a densification and enlargement of existing facilities that follows the population growth projections of the level-2 administrative unit in which they are located, according to the specific SSP scenario considered in the run.
Crop processing: crop processing demand is projected to converge to a given target of crop yield throughput processing by a given target year, subject to growth in crop yields thanks to growing irrigation as well as subject to the spatial constraints imposed by the crop processing catchment areas.
Irrigation water pumping: water pumping energy demand follows pathways of irrigation water demand derived from the WaterCROP model.
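The proportional downscaling used for mining can be sketched as a weighted allocation of a national total. This Python sketch is illustrative only (M-LED implements the step in R): the function name, site names, and radiance values are hypothetical.

```python
def downscale_by_radiance(national_mining_kwh, radiance_by_site):
    """Allocate national mining demand to sites in proportion to their radiance."""
    total_radiance = sum(radiance_by_site.values())
    if total_radiance <= 0:
        raise ValueError("total radiance must be positive")
    return {
        site: national_mining_kwh * radiance / total_radiance
        for site, radiance in radiance_by_site.items()
    }


# Hypothetical sites with measured nighttime light radiance
site_demand = downscale_by_radiance(1000.0, {"site_1": 30.0, "site_2": 10.0})
```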

Urbanisation is also calibrated in the clusters through the urbanisation_calibration.R script, which calibrates the urban status of clusters to the national urbanisation rate (based on World Bank statistics) and its future evolution, following SSP urbanisation projections up to the target year 2060.

Setting up the environment

M-LED has been developed and tested in a Windows 10 environment connected to the Internet. It is written in the R scientific computing programming language.

Software requirements:

Have R (version >=4) installed on your local computer: https://cran.r-project.org/bin/windows/base/
Have a recent version of RStudio installed on your local computer: https://posit.co/download/rstudio-desktop/
Open the MLED_hourly.r file in RStudio
Here, in line 8 set the working directory (i.e. the folder path containing the cloned Github repository folder called mled)
In addition, in line 10 edit the db_folder parameter, specifying the path where the M-LED database was downloaded from Zenodo or where it should be automatically downloaded if the download_data parameter is set to TRUE.
Finally, run lines 1-75. This will automatically run the backend.R file, which will take care of installing all the required package dependencies
During this procedure (to be carried out only the first time M-LED is run), please reply "no" if asked "Install from sources?"

Download the data

The RE4AFAGRI database needed to run the platform is available at the official Zenodo repository of the RE4AFAGRI platform. The repository hosts both country-specific database files for each of the five RE4AFAGRI countries (allowing the model to be run only for the specified country) and an "all" bundle file, which includes the data needed to run the model for all five RE4AFAGRI countries.

Once downloaded, the database(s) (a zipped folder for each of the four models) should be extracted. The exact full path to the database (e.g. C:/Users/[yourusername]/Documents/RE4AFAGRI_database/...) should be passed to the model at line 10 of the MLED_hourly.R file, which defines the db_folder parameter.

Alternatively, it is possible to set the 'download_data' parameter (row 14 of MLED_hourly.r) to TRUE, which will automatically download and unzip the M-LED database from Zenodo for the country(ies) specified in the preamble of the MLED_hourly.r file. Note: after the first run and download, the parameter should be switched back to FALSE to avoid re-downloading the whole database at the next model run, unless the new run is carried out on another country, in which case the country-specific database will be downloaded for the new country.

Running the model

To run the model, run line 80. This starts running, in sequence, the scenarios specified in the MLED_hourly.r preamble (see below for more details on scenario definition). By default, this runs all the scenarios defined in the scenarios matrix, which can be customised.

Once launched, a log.txt file will automatically appear, tracking the progress of the model run and printing time stamps to make it easier to assess the running time.

Analysing the outputs

In the v2 implementation, for each scenario run, M-LED writes output data at four levels of aggregation. These serve model interlinkage purposes as part of the RE4AFAGRI modelling platform, interactive visualisation purposes in the RE4AFAGRI dashboards under development, and as static output summary files.

M-LED outputs are found in the results folder, which is automatically created inside the m-led home folder after a model run. Note that the output geopackages report monthly demand for each timestep and each sector in kWh/month, while the summary csv files report values in TWh/year. The four output levels are:

OnSSET output geopackage, containing demand for all the original population cluster, the unit of analysis of M-LED
NEST output geopackage, aggregating the population clusters results at the NEST nodes level. In particular two files are written as NEST outputs, one total and one urban/rural stratified output for each NEST node.
GADM level 2 output geopackage, aggregating the population clusters results at the second level of administrative boundaries; useful for visualisation of aggregated results in the online dashboards and for informing policymakers
Summary CSVs and figures of results aggregated at the country level, disaggregated by sector, scenario, and year.
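Because the geopackages and the summary files use different units, comparing them requires summing the twelve monthly values and converting kWh to TWh (1 TWh = 10^9 kWh). A minimal Python sketch of that conversion follows; the function name and sample values are hypothetical, and M-LED performs its own aggregation in R.

```python
KWH_PER_TWH = 1e9  # 1 TWh = 1,000,000,000 kWh


def yearly_twh(monthly_kwh):
    """Convert twelve monthly totals in kWh/month to one value in TWh/year."""
    if len(monthly_kwh) != 12:
        raise ValueError("expected 12 monthly values")
    return sum(monthly_kwh) / KWH_PER_TWH


# Hypothetical sector with a constant 250 GWh of demand every month
annual = yearly_twh([2.5e8] * 12)
```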

Runtime

The usual runtime of M-LED depends on an array of factors including:

CPU speed of the local PC
Use of the parallel processing option (see the description of the backend.r file above for details)
Size of the country modelled and complexity of the underlying input data

To provide a benchmark, running one scenario for Zambia and writing all output files takes about 1.5 hours on a Windows 11 PC with a 12th Gen Intel(R) Core(TM) i7-12700 2.10 GHz CPU and 64GB RAM, with parallel processing on 50% of the CPU cores. Note that the first run will be slower because the backend.r file will install all the required packages and dependencies, and download the input data if the download_data option is set to TRUE.

Frequent issues and FAQs

**I got the following memory error:**

Potential solution: Lower the cl parameter in the backend.R file, which defines the number or percentage of CPU cores to use for parallel processing. A higher share decreases the model run time, but if set too high it might cause memory issues and make M-LED fail; a safe option is to keep the default of 25% of CPU cores. Alternatively, set the allowparallel parameter in the [M-LED.R] file to FALSE.
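To see what a given share means on a specific machine, the number of workers implied by a percentage of CPU cores can be estimated as below. This Python sketch only illustrates the arithmetic: the function name is hypothetical, and how backend.R actually derives the worker count from cl is not shown in this page.

```python
import os


def workers_from_share(share, total_cores=None):
    """Translate a share of CPU cores (e.g. 0.25) into a worker count of at least 1."""
    total = total_cores or os.cpu_count() or 1
    return max(1, int(total * share))


default_workers = workers_from_share(0.25, total_cores=8)  # 25% default from the text
small_machine = workers_from_share(0.25, total_cores=2)    # never drops below one worker
```

Lowering the share reduces the number of simultaneous workers, and with it the peak memory used by parallel tasks.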

**I got the following missing data error:**

Potential solution: Set the 'download_data' parameter (row 14 of MLED_hourly.r) to TRUE, which will automatically download and unzip the M-LED database from Zenodo, and re-run M-LED for the country defined in the preamble of the MLED_hourly.r file. Note: after the first run and download, the parameter should be switched back to FALSE to avoid re-downloading the whole database at the next model run, unless the new run is carried out on another country, in which case the country-specific database will be downloaded for the new country.

Here is a list of FAQs about the model:

**Is the demand estimated through M-LED the LATENT (TOTAL POTENTIAL) electricity demand or the ACTUAL electricity demand?**

Depending on the value set to the parameter 'latent_d_tot', M-LED estimates the evolution of demand given current (and projected) electricity access rates (if set to 'FALSE') or, instead, the total LATENT DEMAND (the potential demand if access to electricity was already universal), if set to 'TRUE'. By default this option is set to 'TRUE', as M-LED is a latent demand estimator model.
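The effect of the switch can be pictured with a deliberately simplified Python sketch in which actual demand is latent demand scaled by the access rate. This scaling, the function name, and the numbers are assumptions for illustration; only the 'latent_d_tot' parameter name and its default come from the text, and M-LED's R implementation may differ.

```python
def reported_demand(latent_kwh, access_rate, latent_d_tot=True):
    """Return latent demand, or demand scaled to the electrified share of people.

    latent_d_tot=True  -> potential demand as if access were universal
    latent_d_tot=False -> demand given the current or projected access rate
    """
    if latent_d_tot:
        return latent_kwh
    return latent_kwh * access_rate


# Hypothetical cluster: 10,000 kWh of latent demand, 40% electricity access
latent = reported_demand(10_000.0, 0.4)
actual = reported_demand(10_000.0, 0.4, latent_d_tot=False)
```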

**How do the scenarios vary among each other and what are the key factors influencing differences between scenarios?**

Scenarios are based on a set of assumptions about the evolution of underlying drivers (GDP, population), parameters, and policies. In particular:

SSP scenarios over the evolution of population and GDP, and thus GDP per capita, which together drive residential, mining, and SME demand.
RCP scenarios over the evolution of climate change, affecting water demand and yield through WaterCROP inputs
The 'el_access_share_target', 'irrigated_cropland_share_target', and 'crop_processed_share_target' parameters, which set the simulated policy targets for the last planning year: respectively, the target share of population with electricity, the target share of rainfed cropland irrigation water demand met, and the target share of crop yield locally processed.
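A scenario set built from these dimensions can be pictured as a combination of SSP and RCP labels with a shared set of policy targets. The Python sketch below is illustrative: the scenario labels, target values, function name, and the full cross-product design are assumptions, while the three target parameter names are taken from the text. In M-LED the scenarios matrix is defined in the R wrapper file.

```python
from itertools import product

# Hypothetical scenario dimensions; labels and target values are placeholders
ssps = ["ssp2", "ssp3"]
rcps = ["rcp26", "rcp60"]
policy_targets = {
    "el_access_share_target": 1.0,
    "irrigated_cropland_share_target": 0.5,
    "crop_processed_share_target": 0.5,
}


def build_scenarios(ssps, rcps, targets):
    """Combine every SSP with every RCP and attach the same policy targets."""
    return [{"ssp": ssp, "rcp": rcp, **targets} for ssp, rcp in product(ssps, rcps)]


scenarios = build_scenarios(ssps, rcps, policy_targets)
```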

**How do I get the data to run the model for one of the five RE4AFAGRI countries?**

Set the 'download_data' parameter (row 14 of MLED_hourly.r) to TRUE, which will automatically download and unzip the M-LED database from Zenodo, and re-run M-LED. Note: after the first run and download, the parameter should be switched back to FALSE to avoid re-downloading the whole database at the next model run, unless the new run is carried out on another country, in which case the country-specific database will be downloaded for the new country.

**How do I get the data and parameters to run the model for another country (not one of the five RE4AFAGRI countries) and what procedure should I follow to design an additional scenario?**

Follow the procedures listed in the Examples and exercises page.