(Figure: container wrapping levels: Dinamica EGO → dinamica-ego-docker → LULCC docker → Apptainer container)
3 Setup and Usage
The evoland-plus HPC pipeline consists of various scripts that can be found in the src directory. For each included step of Figure 2.1, there is a subdirectory in src/steps.
The task of the evoland-plus HPC pipeline is to streamline the process so that varying the climate scenarios and other parameters can be carried out efficiently. Its main tasks are introducing parallelization through SLURM batch jobs and adding HPC compatibility. In addition, the pipeline keeps track of intermediate results, maintains a centralized configuration file, and tracks the execution of each step. Details on the individual steps are given in the following sections.
3.1 Setup
Before you set up the evoland-plus HPC pipeline, make sure a few requirements are satisfied. This section goes over hardware and software requirements and then guides you through the setup of the evoland-plus HPC repository. The following pages walk through the details of each step in the pipeline before concluding with the execution of the pipeline.
3.1.1 Requirements
We use a Linux cluster with SLURM as the scheduler. If your cluster uses a different scheduler, check whether it is compatible with the SLURM syntax, or adapt the scripts to your scheduler.
3.1.1.1 Hardware
The minimum memory and CPU requirements cannot be stated in general, as they depend on the area of interest, the input data, and the number of scenarios. A viable starting point for a country the size of Switzerland, at a resolution of 100 m, is 16 GB of memory and 4 CPUs; this holds for a few scenarios and no parallelization within the steps. When scaling up to around 1000 scenarios, we suggest at least 128 GB of memory and 16 CPUs to achieve a viable runtime. As these are estimates, it is essential to monitor runtime before scaling up.
3.1.1.2 Software
Additionally, you need to install the following software:
3.1.1.2.1 Micromamba/Conda
For some pipeline steps, we use conda environments. Conda is a package manager that helps you manage dependencies in isolated environments. We recommend using micromamba
, which does the same job as Conda, but resolves dependencies much faster, with the flexibility of miniconda
(CLI of Conda). Find the installation instructions for Micromamba here. We have added compatibility for micromamba
, mamba
and conda
, in this order of preference, but only tested with micromamba
1.
We have chosen conda-forge as the default channel for the conda environments, as it is a single source for our R, Python, and lower-level dependencies (e.g., gdal, proj). This is independent of the modules and applications provided by the HPC environment.
3.1.1.2.2 Apptainer
Running containerized applications on HPCs can be challenging. To simplify the process, we use the Apptainer (formerly Singularity) container runtime. Make sure your HPC environment supports Apptainer, and that you have the necessary permissions to run containers. If this is not the case, contact your HPC support team for help.
3.1.1.2.3 Docker
Building the LULCC container requires Docker² before converting it to the Apptainer format. The lulcc container uses the dinamica-ego-docker container (version 7.5). This step can be done on a local machine and will be explained in the LULCC step.
3.1.1.2.4 Dinamica EGO
Dinamica EGO is an environmental modeling platform used in the LULCC step. It is available on the project website. As mentioned above, it is used through the LULCC Docker image, since it is only integrated via the command line interface (CLI), not the usual graphical user interface (GUI).
3.1.1.2.5 YAML Parser yq
For the bash scripts, we use yq to parse the yaml configuration file. yq needs to be available in the PATH variable of the shell. To install the latest version³, run the following command:
# Download the yq binary and make it executable (requires write access to $bin_dir)
bin_dir=/usr/bin &&\
wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O $bin_dir/yq &&\
chmod +x $bin_dir/yq
Other installation options and binaries can be found in the repository's README. To make yq available in the PATH variable, make sure $bin_dir is part of PATH. To check that the parser is installed correctly, run yq --version in the shell.
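As a quick illustration of how the scripts read settings from the central configuration (a sketch using yq v4 syntax; the actual calls live in the pipeline scripts):
# Read a single value from the central configuration with yq
yq '.bash_variables.FUTURE_EI_OUTPUT_DIR' src/config.yml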
3.1.1.2.6 LULCC Repository
The version used for evoland-plus HPC is a reduced version of the original model, adapted for containerized execution on HPCs, and can be found on the hpc branch of the repository. Clone the repository to the HPC using git or download the repository as a zip. If you have never used git before, search online for a guide on how to clone a repository.
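For example, to fetch only the hpc branch into the path expected by the configuration below (the repository URL is a placeholder; use the actual LULCC repository URL):
# Clone only the hpc branch of the LULCC repository
git clone --branch hpc --single-branch <LULCC-repository-URL> ~/LULCC-CH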
3.1.2 evoland-plus HPC Repository
After you have set up the requirements, you can clone the evoland-plus HPC repository. This repository contains the pipeline and all necessary scripts to run it.
Before you start the pipeline, you need to configure it. These settings are centralized in the config.yml file. There are only a few mandatory changes, which we highlight here, but you can find more settings with descriptive names in the file.
src/config.yml
# Bash variables
bash_variables:
  FUTURE_EI_CONFIG_FILE: ~/evoland-plus HPC/src/config.yml
  FUTURE_EI_OUTPUT_DIR: ~/evoland-plus HPC-Output
  ...
  # LULCC HPC version
  LULCC_CH_HPC_DIR: ~/LULCC-CH
  ...
  # Overwrites $TMPDIR if not set by the system. $TMPDIR is used by Dinamica EGO
  # and conda/libmamba
  ALTERNATIVE_TMPDIR: /scratch
  ...
For each script, src/bash_common.sh is sourced to set the environment variables. First, FUTURE_EI_CONFIG_FILE needs to be set to the absolute path of this configuration file. FUTURE_EI_OUTPUT_DIR is the directory where the outputs of the pipeline are stored. As the pipeline needs several times more temporary space than the output itself, a fast and large temporary directory is crucial. If the HPC does not set the $TMPDIR variable, you can set ALTERNATIVE_TMPDIR to a different directory; it is used in the LULCC and NCP steps for temporary files. Finally, LULCC_CH_HPC_DIR is the directory where the LULCC repository cloned in the previous step is stored.
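As a minimal sketch of how this fits together (the actual src/bash_common.sh may differ in detail), a step script only needs the configuration path before sourcing the shared variables:
# Point to the central configuration, then load the shared bash variables
export FUTURE_EI_CONFIG_FILE="$HOME/evoland-plus HPC/src/config.yml"
source src/bash_common.sh   # exports FUTURE_EI_OUTPUT_DIR, LULCC_CH_HPC_DIR, ...
echo "Outputs go to: $FUTURE_EI_OUTPUT_DIR"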
src/config.yml
# Focal LULC
FocalLULCC:
  ...
# LULC check
CheckLULCC:
  ...
The FocalLULCC and CheckLULCC sections contain settings dedicated to separate steps in the pipeline and are loaded specifically in the respective scripts; we will touch on these settings in the respective steps. To see the current settings (and test yq), print the contents of config.yml as idiomatic YAML to stdout:
yq -P -oy src/config.yml
As a last general note, make sure to set the permissions of the scripts to executable. To make all bash scripts in the source directory executable, set their permissions as follows:
# possibly activate globstar: shopt -s globstar
chmod +x src/**/*.sh
The next sections will guide you through the setup of each step in the pipeline.
3.2 Steps
3.3 Land Use Land Cover Change
LULCC is a Dinamica EGO (Leite-Filho et al. 2020) model and makes use of the R (R Core Team 2022) ecosystem, including packages from the Comprehensive R Archive Network (CRAN). You can find the LULCC model, as well as an adapted version for use with evoland-plus HPC, in the LULCC repository (Black 2024), as mentioned in the setup section.
LULCC needs a variety of inputs. These are set via environment variables in the src/config.yml file; the assumption is that the src directory contains project-specific code, and hence also setup details. Here is an excerpt of the file:
src/config.yml
# Bash variables
bash_variables:
  ...
  # Model Variables - from LULCC_CH_HPC root
  LULCC_M_CLASS_AGG: Tools/LULC_class_aggregation.xlsx
  LULCC_M_SPEC: Tools/Model_specs.csv
  LULCC_M_PARAM_GRID: Tools/param-grid.xlsx
  LULCC_M_PRED_TABLE: Tools/Predictor_table.xlsx
  LULCC_M_REF_GRID: Data/Ref_grid.tif
  LULCC_M_CAL_PARAM_DIR: Data/Allocation_parameters/Calibration
  LULCC_M_SIM_PARAM_DIR: Data/Allocation_parameters/Simulation
  LULCC_M_RATE_TABLE_DIR: Data/Transition_tables/prepared_trans_tables
  LULCC_M_SIM_CONTROL_TABLE: ~/LULCC-CH/Tools/Simulation_control.csv
  LULCC_M_SPAT_INTS_TABLE: Tools/Spatial_interventions.csv
  LULCC_M_EI_INTS_TABLE: Tools/EI_interventions.csv
  LULCC_M_SCENARIO_SPEC: Tools/Scenario_specs.csv
  LULCC_M_EI_LAYER_DIR: Data/EI_intervention_layers
  LULCC_M_REMOVE_PRED_PROB_MAPS: True # remove prediction probability maps after
                                      # simulation if 1, True or TRUE
A relevant parameter to change is the LULCC_M_SIM_CONTROL_TABLE variable. This is the only absolute path, and it should point to the Simulation_control.csv file. All further paths are relative to the LULCC repository root: the files under Tools are configuration files, while the Data directory contains input and working data. For information on the remaining variables, see the LULCC repository and paper (Black 2024).
3.3.1 Simulation Control Table
Simulation_control.csv is a table that controls the scenarios to be simulated, including the data described in Table 3.1. This format extends the original format from the LULCC model.
~/LULCC-CH/Tools/Simulation_control.csv
Simulation_num.,Scenario_ID.string,Simulation_ID.string,Model_mode.string,Scenario_start.real,Scenario_end.real,Step_length.real,Parallel_TPC.string,Pop_scenario.string,Econ_scenario.string,Climate_scenario.string,Spatial_interventions.string,EI_interventions.string,Deterministic_trans.string,Completed.string,EI_ID.string
1,BAU,1,Simulation,2020,2060,5,N,Ref,Ref_Central,rcp45,Y,Y,Y,N,1
217,EINAT,217,Simulation,2020,2060,5,N,Low,Ecolo_Urban,rcp26,Y,Y,Y,N,217
433,EICUL,433,Simulation,2020,2060,5,N,Ref,Ecolo_Central,rcp26,Y,Y,Y,N,433
649,EISOC,649,Simulation,2020,2060,5,N,Ref,Combined_Urban,rcp45,Y,Y,Y,N,649
865,BAU,865,Simulation,2020,2060,5,N,Ref,Ref_Central,rcp85,Y,Y,Y,N,1
Each row describes one scenario to be simulated. This table controls which data is used to simulate the land use changes.
Table 3.1: Columns of the Simulation_control.csv file.
Column Name | Description
---|---
Simulation_num. | The number of the simulation.
Scenario_ID.string | The scenario ID.
Simulation_ID.string | The simulation ID.
Model_mode.string | The model mode.
Scenario_start.real | The start year of the scenario.
Scenario_end.real | The end year of the scenario.
Step_length.real | The length of the steps.
Parallel_TPC.string | Whether the simulation is parallelized.
Pop_scenario.string | The population scenario.
Econ_scenario.string | The economic scenario.
Climate_scenario.string | The climate scenario (e.g., rcp45, rcp26, rcp85).
Spatial_interventions.string | Whether spatial interventions are used.
EI_interventions.string | Whether EI interventions are used.
Deterministic_trans.string | Whether deterministic transitions are used.
Completed.string | Whether the simulation is completed.
EI_ID.string | The EI ID.
3.3.2 Container Setup
For a platform-independent execution of Dinamica EGO, we created a dinamica-ego-docker container. This way, the glibc version is fixed, and the container can be used independently of the host system⁵. It is used in the LULCC Docker container: our Dockerfile src/steps/10_LULCC/Dockerfile adds the necessary R packages for LULCC to the container. The Apptainer Definition File src/steps/10_LULCC/lulcc.def bootstraps the Docker container, mounts the LULCC_CH_HPC_DIR to the /model directory (the model is not shipped within the container), and translates the entry point to the Apptainer format. This includes adding the necessary environment variables, connecting the Simulation Control Table, and pointing Dinamica EGO to the correct R binary, among other details found in the Definition File. Figure 3.1 summarizes the levels of wrapping.
To load the LULCC Docker image onto your system, it can be automatically installed or built using the src/steps/10_LULCC/docker_setup.sh script, which uses variables from src/config.yml. If you have docker installed, the setup script guides you through building, pushing, or pulling the LULCC Docker container. This step can be done on a local machine. Subsequently, with apptainer installed, the LULCC Docker image can be converted to an Apptainer container. On the HPC, this latter step suffices if you use the pre-configured LULCC_DOCKER_REPO, unless you want to rebuild the container. The decisive line in the script is:
src/steps/10_LULCC/docker_setup.sh (lines 84ff)
apptainer build \
--build-arg "namespace=$namespace" --build-arg "repo=$repo" \
--build-arg "version=$version" \
"$APPTAINER_CONTAINERDIR/${repo}_${version}.sif" "$SCRIPT_DIR/lulcc.def"
Depending on your system, you might want to reconfigure the Apptainer variables:
src/config.yml
# Bash variables
bash_variables:
  ...
  # Apptainer variables for the apptainer container
  APPTAINER_CONTAINERDIR: ~/apptainer_containers
  APPTAINER_CACHEDIR: /scratch/apptainer_cache
APPTAINER_CONTAINERDIR is used to store the Apptainer containers, and APPTAINER_CACHEDIR is used when building them. If your HPC does not have a /scratch directory, you might want to change it to another temporary directory.
After all previous steps are completed, you can test the LULCC model with some test scenarios in the simulation control table; src/steps/10_LULCC/slurm_job.sh submits such a run (see the example below). Before the full, parallelized simulation can be started, read the following sections.
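Analogous to the other steps, the test run is submitted with sbatch, for example:
sbatch src/steps/10_LULCC/slurm_job.sh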
3.4 Check LULCC
To check the integrity of the LULCC output from the previous step, an intensity analysis is performed. As a first measure, a simple visual inspection of the output maps is recommended. The intensity analysis then regards the cumulative pixel-wise change in land use and land cover (LULC) classes and computes the contingency table over a time series, as a measure of change between each land use class. These changes should be in a realistic range (e.g., between \(0\%\) and \(5\%\)); otherwise, this can point to issues in the input data or the model itself.
The configuration section for this step is as follows:
src/config.yml
# LULC check
CheckLULCC:
  InputDir: # keep empty to use FUTURE_EI_OUTPUT_DIR/LULCC_CH_OUTPUT_BASE_DIR
  OutputDir: # keep empty to use FUTURE_EI_OUTPUT_DIR/CHECK_LULCC_OUTPUT_DIR
  BaseName: LULCC_intensity_analysis # Can be used to distinguish different runs
  Parallel: True
  NWorkers: 0 # 0 means use all available cores
This step uses a conda environment with raster~=3.6-26 alongside further R packages. The automatic setup script src/steps/11_CheckLULCC/11_CheckLULCC_setup.sh needs to be executed to set up the conda environment. It creates a conda environment check_lulc with the packages found in 11_checklulcc_env.yml.
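For example, run the setup script once from the repository root:
bash src/steps/11_CheckLULCC/11_CheckLULCC_setup.sh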
Running the intensity analysis is as easy as sbatching the job script slurm_job.sh.
sbatch src/steps/11_CheckLULCC/slurm_job.sh
The sbatch command submits the job to the HPC scheduler with the run options specified in the header of the job script.
src/steps/11_CheckLULCC/slurm_job.sh (lines 1-10)
#!/bin/bash
#SBATCH --job-name="11_check_lulcc"
#SBATCH -n 1 # Number of cores requested
#SBATCH --cpus-per-task=25 # Number of CPUs per task
#SBATCH --time=4:00:00 # Runtime
#SBATCH --mem-per-cpu=4G # Memory per cpu in GB (see also --mem)
#SBATCH --tmp=2G # https://scicomp.ethz.ch/wiki/Using_local_scratch
#SBATCH --output="logs/11_check_lulcc-%j.out"
#SBATCH --error="logs/11_check_lulcc-%j.err"
#SBATCH --mail-type=NONE # Mail events (NONE, BEGIN, END, FAIL, ALL)
Change these settings according to your needs and the available resources. Monitor the logs in the logs directory to check the progress of the job. If you want to specify more options, refer to the SLURM documentation or your local HPC documentation.
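For instance, with standard SLURM tools and the log pattern from the header above (the job ID is a placeholder):
squeue -u $USER                          # check whether the job is pending or running
tail -f logs/11_check_lulcc-<jobid>.out  # follow the job output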
3.5 Focal LULC
This step calculates focal statistics for the land use and land cover change (LULCC) data. The resulting focal windows are used for the N-SDM model (Black 2024). It follows a similar structure to the previous Check LULCC step: it uses its own conda environment, and this task also has a separate job script. The configuration section for this step is as follows:
src/config.yml
# Focal LULC
FocalLULCC:
  InputDir: # keep empty to use FUTURE_EI_OUTPUT_DIR/LULCC_CH_OUTPUT_BASE_DIR
  OutputDir: # keep empty to use FUTURE_EI_OUTPUT_DIR/FOCAL_OUTPUT_BASE_DIR
  BaseName: ch_lulc_agg11_future_pixel # Underscores will be split into folders
  RadiusList: [ 100, 200, 500, 1500, 3000 ]
  WindowType: circle
  FocalFunction: mean
  Overwrite: False # False -> skip if output exists, True -> overwrite
  Parallel: True
  NWorkers: 0 # 0 means use all available cores
This script recursively goes through the input directory and calculates the focal statistics for each scenario. It creates the outputs in a similar structure inside the output directory, named after the BaseName. For each scenario, the focal statistics given by WindowType and FocalFunction are calculated for each radius in RadiusList. For details, consult the docstring of the method 20_focal_statistics::simulated_lulc_to_predictors.
This step uses a conda environment with raster~=3.6-26, terra~=1.7-71 (only used for conversion), and further R packages. The conda environment focal_lulc is set up by executing the setup script src/steps/20_FocalLULC/20_FocalLULC_setup.sh.
As for the previous steps, the job script slurm_job.sh needs to be submitted to the HPC scheduler.
sbatch src/steps/20_FocalLULC/slurm_job.sh
3.6 Nature’s Contributions to People
Based on the code written for Külling et al. (2024), we automated the calculation of eight NCP. Note that our study includes more NCP than these, as some of them are characterized by the plain focal windows (Black et al. 2025).
In addition to R and CRAN packages, InVEST is used in this step via the Python module natcap.invest.
As in the previous two steps, the conda environment ncps is set up using the src/steps/40_NCPs/40_NCPs_setup.sh script.
3.6.1 NCPs
Table 2.1 lists all NCP calculated in the evoland-plus HPC project. Here, we detail the eight NCP calculated in this step. The config.yml file includes a few variables which are automatically used for the NCP calculation.
src/config.yml (54-59)
# NCP variables
NCP_PARAMS_YML: ~/evoland-plus HPC/src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
NCP_RUN_SCENARIO_ID: # Scenario ID, automatically set for each configuration
NCP_RUN_YEAR: # Year for which to run NCPs, automatically set
NCP_RUN_OUTPUT_DIR: # Output directory for NCPs, automatically set
NCP_RUN_SCRATCH_DIR: # Scratch directory for NCPs, automatically set
The more detailed configuration for each NCP is stored in the 40_NCPs_params.yml file. For parallelization purposes, each array job receives a copy of this file with the respective scenario ID and year. The bash variables NCP_RUN_* from config.yml act as placeholders.
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
# Run Params (are passed when calling the run_all_ncps.py script)
run_params:
  NCP_RUN_SCENARIO_ID:
  NCP_RUN_YEAR:
  NCP_RUN_RCP: # programmatically set in load_params.py
  NCP_RUN_INPUT_DIR:
  NCP_RUN_OUTPUT_DIR:
  NCP_RUN_SCRATCH_DIR:
  LULCC_M_EI_LAYER_DIR: # set in load_params.py (uses config.yml) # SDR
LULCC_M_EI_LAYER_DIR: # set in load_params.py (uses config.yml) # SDR
For preparation, it is indispensable to set the paths to the input data. Some of these are shared among multiple NCP, as noted in the comments. The first three layers are automatically found in the NCP_RUN_INPUT_DIR
and depend on the scenario ID and year. These three are constructed with the template that the LULCC model produces, as can be seen in the load_params.py
script.
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
# Data
data:
  # LULC - CAR, FF, HAB, NDR, POL, SDR, WY
  lulc: # automatically found in NCP_RUN_INPUT_DIR
  # Rural residential - HAB
  rur_res: # automatically found in NCP_RUN_INPUT_DIR
  # Urban residential - HAB
  urb_res: # automatically found in NCP_RUN_INPUT_DIR
  # Production regions - CAR
  prodreg: Data/PRODUCTION_REGIONS/PRODREG.shp
  # DEM - CAR, NDR
  dem: Data/DEM_mean_LV95.tif
  # DEM filled - SDR
  dem_filled: Data/DEM_mean_LV95_filled.tif
  # Watersheds - NDR, SDR, WY
  watersheds: Data/watersheds/watersheds.shp
  # Subwatersheds - WY
  sub_watersheds: Data/watersheds/Subwatersheds.shp
  # ETO - WY
  eto: Data/evapotranspiration/
  # PAWC - WY
  pawc: Data/Water_storage_capacity_100m_reclassified1.tif
  # Erodibility path - SDR
  erodibility_path: Data/Kst_LV95_ch_nib.tif
  # Erosivity path - SDR
  erosivity_path: Data/rainfall_erosivity/
  # Precipitation - WY, NDR
  yearly_precipitation: Data/yearly_prec/
  # Soil depth - WY
  depth_to_root_rest_layer: Data/rrd_100_mm_rexport.tif
  # Precipitation avgs - FF
  pavg_dir: Data/monthly_prec/
  # Temperature avgs - FF
  tavg_dir: Data/monthly_temp/
  # Soil texture - FF
  ph_raster: Data/ch_edaphic_eiv_descombes_pixel_r.tif
  # Distance to lakes - REC
  distlakes_path: Data/distlakes.tif
# Projection Settings - change for different regions
proj:
  # CRS
  crs: epsg:2056
  # Extent
  ext: [ 2480000, 2840000, 1070000, 1300000 ]
  # Resolution
  res: 100
For each NCP, the configuration is detailed in the following sections.
3.6.1.1 CAR: Regulation of climate
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
CAR:
  # 1_CAR_S_CH.R
  # 2_CAR_S_CH.py
  bp_tables_dir:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/CAR/BPTABLE/
  # 3_CAR_S_CH.R
  # output prefix
  out_prefix: tot_c_cur_
To calculate the carbon stored in biomass and soil, the CAR NCP needs biophysical tables that specify the carbon content of different land use classes. The natcap.invest model Carbon Storage and Sequestration is used for this calculation.
3.6.1.2 FF: Food and feed
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
FF:
  # 0_FF_ecocrop.R
  crops_data:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/FF/crops.txt
  ecocrop_dir: evoland-plus HPC-Output/FF_preprocessing_ecocrop/
The FF NCP calculates the crop production potential using the ecocrop package, which follows a limiting factor approach (Hackett 1991). This NCP has a data preparation step that needs to be executed once before running the parallelized NCP calculation. It is a single R script that can easily be triggered by calling src/steps/40_NCPs/NCP_models/prepare_ncps.sh; no SLURM is needed.
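For example:
bash src/steps/40_NCPs/NCP_models/prepare_ncps.sh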
3.6.1.3 HAB: Habitat creation and maintenance
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
HAB:
  # 0_thread_layers_generation.R
  # 1_HAB_S_CH.py
  half_saturation_constant: 0.075
  bp_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/HAB/BPTABLE/
  sensitivity_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/HAB/BPTABLE/hab_sensitivity.csv
  threats_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/HAB/BPTABLE/threats.csv
The HAB NCP calculates the habitat quality index with another natcap.invest model. Set the three biophysical tables accordingly.
As we had problems with how natcap.invest==3.13.0 handles its threat layer table, we had to introduce a hotfix in the source code to keep compatibility with the existing NCP configuration. When loading the threat layers, natcap.invest converts the column names to lowercase to be case-insensitive; however, the layer paths are converted to lowercase as well, and our threat layer paths are case-sensitive. To fix this, we changed the to_lower argument in the execute function of the habitat_quality.py file and set the expected column name to match our lowercase column name.
.../ncps/lib/python3.10/site-packages/natcap/invest/habitat_quality.py (line 384)
# Change from:
args['threats_table_path'], 'THREAT', to_lower=True,
# to:
args['threats_table_path'], 'threat', to_lower=False,
In later versions, the InVEST developers have changed how these tables are loaded. Compatibility with the latest version of natcap.invest can be added when adapting to its breaking changes for the further NCP. We want to note that changing the source code is bad practice and should only be considered as a last resort.
To find the corresponding natcap folder, navigate to the environment folder and locate the site-packages folder within it.
# activate the ncps environment with micromamba or conda
micromamba activate ncps
# find the site-packages folder
python -c "import site; print(site.getsitepackages())"
>>> ['.../micromamba/envs/ncps/lib/python3.10/site-packages']
In this folder, navigate further down to .../site-packages/natcap/invest/habitat_quality.py.
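Alternatively, the exact path of the installed module can be printed directly (assuming the ncps environment is active):
# Print the location of the installed habitat_quality.py
python -c "import natcap.invest.habitat_quality as hq; print(hq.__file__)"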
3.6.1.4 NDR: Nutrient Delivery Ratio
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
NDR:
  # 1_NDR_S_CH.py
  # Biophysical table
  biophysical_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/NDR/BPTABLE/ndr_bptable_ds25_futei.csv
  calc_n: true
  calc_p: true
  k_param: 2
  # Suffix for output files
  # Subsurface critical length
  subsurface_critical_length_n: 100
  # Subsurface effective retention
  subsurface_eff_n: 0.75
  # Threshold flow accumulation
  threshold_flow_accumulation: 200
The NDR NCP calculates the Nutrient Delivery Ratio. The biophysical table specifies the nutrient retention by vegetation using various variables, e.g., root depth and more detailed soil properties described in the natcap.invest documentation.
3.6.1.5 POL: Pollination and dispersal of seeds
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
POL:
  # 1_POL_S_CH.py
  # Farm vector path
  farm_vector_path: ''
  # Guild table path
  guild_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/POL/BPTABLE/guild.csv
  # Landcover biophysical table path
  landcover_biophysical_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/POL/BPTABLE/pollination_bptable_ds25_futei.csv
  # 2_POL_S_CH_aggregating.R
The POL NCP runs the natcap.invest Crop Pollination model, followed by an aggregation step in R.
3.6.1.6 REC: Recreation potential
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
REC:
  # 1_REC.R
  # lulc naturality lookup table
  lutable_nat_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/REC/BPTABLE/lutable_naturality.csv
The REC NCP returns a Recreation Potential (RP) indicator. This is a normalized aggregate of three landscape characteristics maps:
- Degree of naturalness (DN): Aggregate sum of naturalness scores for each LULC class.
- Natural protected areas (NP): Binary map with 0 = outside protected areas and 1 = inside protected areas.
- Water components (W): Inverse relative distance to lake coasts, with the highest value at the lake coast and a decreasing value for 2 km.
The output is a single map of recreation potential.
3.6.1.7 SDR: Formation, protection and decontamination of soils
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
SDR:
  # 1_SDR_S_CH.py
  # Biophysical table
  biophysical_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/SDR/BPTABLE/bptable_SDR_v2_futei.csv
  # Drainage path
  ic_0_param: 0.4
  k_param: 2
  l_max: 100
  # SDR max
  sdr_max: 0.75
  # Threshold flow accumulation
  threshold_flow_accumulation: 200
Sediment export and retention are calculated in the SDR NCP with the Sediment Delivery Ratio model from natcap.invest.
3.6.1.8 WY: Regulation of freshwater quantity, location and timing
src/steps/40_NCPs/NCP_models/40_NCPs_params.yml
WY:
  # 1_WY_S_CH.py
  # Biophysical table
  biophysical_table_path:
    evoland-plus HPC/src/steps/40_NCPs/NCP_models/WY/BPTABLE/wy_bptable_ds25_futei.csv
  # Seasonality constant
  seasonality_constant: 25
Annual Water Yield is the final NCP calculated in this step. The WY NCP calculates the hydropower potential.
3.6.2 Running the NCP calculation
Assuming the ncps environment is set up, all previous configurations are correctly set, the input data is available, and the FF NCP has been prepared using src/steps/40_NCPs/NCP_models/prepare_ncps.sh, the NCP calculation can be started.
To calculate all NCP for one scenario and year, the run_all_ncps.py script bundles the execution of all NCP. It is called via the run_all_ncps.sh shell script like so:
# Usage: bash run_all_ncps.sh <NCP_RUN_SCENARIO_ID> <NCP_RUN_YEAR> <NCP_RUN_INPUT_DIR> <NCP_RUN_OUTPUT_DIR> <NCP_RUN_SCRATCH_DIR>
bash src/steps/40_NCPs/NCP_models/run_all_ncps.sh 1 2015 /path/to/input_dir /path/to/output_dir /path/to/scratch_dir
The simplified execution of this using the HPC scheduler SLURM is done with sbatch src/steps/40_NCPs/NCP_models/slurm_job.sh. The scenario ID and year are set in the job script.
The full, parallelized execution of the evoland-plus HPC pipeline for all scenarios, with LULCC and NCP calculation, is done with the 10_40_combined_array_job.sh script and SLURM; for this, consult the following Running section.
3.7 Running
The pipeline is executed in three parts; each part is a separate Slurm job. Remember Figure 2.1 from the Structure section. The most computationally intensive steps, LULCC and NCP, are parallelized and submitted as one Slurm array job. For all of these steps, you need to have followed the previous sections to set up and configure the pipeline. This includes preparing the FF NCP using src/steps/40_NCPs/NCP_models/prepare_ncps.sh and filling the simulation control table with all the scenarios you want to run.
3.7.1 evoland-plus HPC pipeline
Land Use Simulation and NCP Estimation can be calculated separately for one scenario with the jobs src/steps/10_LULCC/slurm_job.sh and src/steps/40_NCPs/slurm_job.sh. The 10_40_combined_array_job.sh Slurm job calculates both steps for all scenarios in parallel. Each array job receives a subset of the scenarios to calculate. All scenarios are calculated in parallel with the following Slurm job:
sbatch src/steps/10_40_combined_array_job.sh
This would submit the job to the cluster and start the calculation with the default settings.
src/steps/10_40_combined_array_job.sh
#!/bin/bash
#SBATCH --job-name="10_40_combined_array"
#SBATCH -n 1 # Number of cores requested
#SBATCH --cpus-per-task=2 # Number of CPUs per task
#SBATCH --time=7-00:00:00 # Runtime in D-HH:MM:SS
#SBATCH --mem-per-cpu=2G
#SBATCH --tmp=2G
#SBATCH --output="logs/10_40_combined_array-%j.out"
#SBATCH --error="logs/10_40_combined_array-%j.err"
#SBATCH --mail-type=NONE # Mail events (NONE, BEGIN, END, FAIL, ALL)
## Array job
#SBATCH --array=1-216%12 # start-end%num_parallel
# ! step size needs to be 1
The speed-up of the combined job is achieved by running multiple scenarios in parallel; we do this because the speed-up from assigning more CPUs to one scenario is limited. Each of the 216 array jobs is assigned one task with two CPUs and 4 GB of memory. The %12 in the array specification ensures that at most 12 array jobs run in parallel; when one job finishes, the next one is started. Each array job has a time limit of 7 days.
In our case, we had 1080 scenarios to calculate, so we set the array to 216 jobs, which gives five scenarios per array job. Beforehand, we tested with only one scenario in the simulation control table, 10 GB of memory, and SBATCH --array=1-1 to check whether the job runs correctly. Running the Switzerland map at a resolution of 100 m by 100 m, the job took 6:23:06 hours to complete at a CPU efficiency of 75.29% and a memory efficiency of 75.58%. With tail -f logs/10_40_combined_array-*.out logs/10_40_combined_array-*.err it is easy to monitor the progress of the job. For explanations and more details on the sbatch options, see the Slurm documentation.
When running a large array of scenarios, the array jobs vary in the amount of memory they require and the time they take. It is a valid approach to start with a memory limit that works for the majority of scenarios. Some jobs might fail due to memory issues, but after all array jobs have finished, it is possible to rerun the failed scenarios with a higher memory limit. This works because LULCC and the NCP are only recalculated where their output files are missing, down to the level of each NCP.
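A hedged example of identifying failed array tasks with the standard SLURM accounting tool (the job ID is a placeholder):
# Inspect per-task states of the combined array job after it has finished
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS | grep -E "FAILED|OUT_OF_ME"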
To get a simple estimation on how long the job array takes, you can use cross-multiplication, starting with the time it took to calculate one scenario \(t_{\text{one}}\). With the number of scenarios \(n_{\text{all}}\) and the number of scenarios calculated in parallel \(n_{\text{parallel}}\), the time it takes to calculate all scenarios \(t_{\text{all}}\) is:
\[ t_{\text{all}} = \frac{n_{\text{all}}}{n_{\text{parallel}}} \times t_{\text{one}} \]
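For a rough illustration with the numbers above (\(n_{\text{all}} = 1080\), \(n_{\text{parallel}} = 12\), and \(t_{\text{one}} \approx 6.4\) hours from the single-scenario test):
\[ t_{\text{all}} = \frac{1080}{12} \times 6.4\ \text{h} \approx 576\ \text{h} \approx 24\ \text{days} \]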
3.7.2 Check LULCC and Focal LULC
As explained in their respective sections, the Check LULCC and Focal LULC steps can be run as soon as the LULC layers are present. Both src/steps/11_CheckLULCC/slurm_job.sh and src/steps/20_FocalLULC/slurm_job.sh are also submitted with sbatch. In contrast, these are simple jobs, and their parallelization is achieved by assigning more CPUs to the job and using R's asynchronous processing via future::plan(future::multisession).
3.7.3 Logging
There are multiple levels of logging in the pipeline. When running the Slurm jobs, the output and error logs are written to the specified files. These come from three main sources: the R scripts, the Python scripts, and the Slurm job scripts. Generally, Slurm logs are written to the file specified in the job script. For the logs of the scripts written for this pipeline, FUTURE_EI_LOG_LEVEL in the config.yml file can be set to debug, info, warning, or error. The NCP calculation uses natcap.invest, which writes detailed logs to the console. For the LULCC container, Dinamica EGO has more detailed logs of the integrated R scripts; they are written to the mounted LULCC_CH_HPC_DIR directory and do not show up in the Slurm logs. Dinamica EGO has a separate log level that can be set through the DINAMICA_EGO_CLI_LOG_LEVEL environment variable.
4 Further Steps
Upon completing the four steps, we obtain LULC layers, focal windows, and NCP. These outputs can be further analyzed and utilized for additional processes, such as species distribution modeling.
In our scenario, we have incorporated a fifth step that leverages the same repository structure and configuration as the previous steps. Its code is located in a different repository, yet the workflow remains consistent and manageable. This additional step allows for more comprehensive analysis and extends the capabilities of the initial four steps, providing a robust framework for further research and application.
1. The installed CLI is identified via bash variables in src/bash_common.sh. If none is found, an error highlights the issue.
2. The Docker version used is 24.0.7, but the container should be compatible with most versions.
3. We have used yq v4.40.3, but any version >=4.18.1 should work.
4. The versions of the R packages used with LULCC are listed in the note below.
5. For more information on the compatibility of Dinamica EGO with Linux, see the Dinamica EGO documentation.
6. The versions of the packages used in the NCP calculation are listed in the note below.