Execution

Uncertainty Quantification algorithms rely on many evaluations of a model. Since a single model can itself be a demanding parallel application, running numerous evaluations of such models can require large computing resources and is a challenging task in its own right. mUQSA takes care of this problem by automating the execution of evaluations on HPC resources, with the aim of keeping executions efficient and scalable.

Once the mUQSA scenario is defined, the workflow consisting of all model evaluations is prepared for execution with EasyVVUQ and QCG-PilotJob and submitted to an HPC cluster. Once the task is launched on the supercomputer, QCG-PilotJob mechanisms are employed to parallelize the evaluations across all available resources. For example, when an allocation consists of 48 cores and the model uses only 12 cores, 4 evaluations run in parallel.
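The arithmetic behind the 48-core example can be sketched in a few lines of Python. This is only an illustration, not part of mUQSA or QCG-PilotJob: the function name `concurrent_evaluations` is hypothetical, and the sketch assumes that cores are the only constraint and ignores node boundaries.

```python
def concurrent_evaluations(allocated_cores: int, cores_per_evaluation: int) -> int:
    """Return how many evaluations fit into an allocation at the same time.

    Hypothetical helper: assumes cores are the only limiting resource
    and ignores how the cores are distributed over nodes.
    """
    if allocated_cores <= 0 or cores_per_evaluation <= 0:
        raise ValueError("core counts must be positive")
    # Whole evaluations only: leftover cores cannot host a partial evaluation.
    return allocated_cores // cores_per_evaluation


if __name__ == "__main__":
    # The example from the text: a 48-core allocation and a 12-core model.
    print(concurrent_evaluations(48, 12))
```

With an allocation of 50 cores the result would still be 4, because the 2 remaining cores are not enough for a fifth 12-core evaluation.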

Configuration of executions

The following options are available for configuring the execution of evaluations:

| Option | Description |
|---|---|
| Calculation type | Should the model be run in serial mode or in parallel mode (e.g. using MPI)? |
| Nodes (parallel) | How many nodes are required for a single model evaluation? |
| Cores (parallel) | How many cores on a single node are required for a single model evaluation? |
| Number of parallel evaluations | How many evaluations (in particular, internally parallel evaluations) should be run at the same time? |
| Calculation time | Sets the time limit for the calculations. In the automatic option (available for selected algorithms), only the time needed for a single evaluation has to be entered, and the total calculation time is computed automatically. In the manual option, the total time limit for all calculations (all evaluations) has to be entered. |

For selected algorithms, mUQSA presents information about the required number of evaluations and an estimate of the execution times.
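The source does not state how mUQSA derives the total calculation time in the automatic option. As a rough illustration only, one plausible estimate is to run the evaluations in "waves" of parallel executions and multiply the number of waves by the single-evaluation time. The function `estimate_total_minutes` and its parameters are hypothetical, and the sketch ignores queueing and scheduling overhead.

```python
import math


def estimate_total_minutes(
    n_evaluations: int,
    single_evaluation_minutes: float,
    parallel_evaluations: int,
) -> float:
    """Estimate the wall-clock time for all evaluations.

    Assumption (not taken from mUQSA): evaluations run in waves of
    `parallel_evaluations`, and each wave takes as long as one evaluation.
    """
    if n_evaluations < 0 or parallel_evaluations <= 0:
        raise ValueError("invalid evaluation counts")
    waves = math.ceil(n_evaluations / parallel_evaluations)
    return waves * single_evaluation_minutes


if __name__ == "__main__":
    # 100 evaluations of 15 minutes each, 4 at a time: 25 waves.
    print(estimate_total_minutes(100, 15, 4))
```

In practice a safety margin on top of such an estimate is sensible when entering a manual time limit, since a job that exceeds its limit is typically terminated by the cluster scheduler.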

Notes

  • The minimal amount of resources that will be allocated is equal to one full node (all cores of one node).
  • If the number of cores for a single evaluation is between half of the cores available on a node and the total number of cores on a node, each new evaluation will be executed on a separate node.
  • If the number of cores for a single evaluation is below half of the cores available on a node, mUQSA will try to run as many evaluations on this node as possible.
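The placement rules in the notes above can be sketched as follows. This is a simplified model rather than mUQSA's actual implementation: the function names are hypothetical, the sketch covers only evaluations that fit on a single node, and the treatment of an evaluation that needs exactly half of a node (two per node) is an assumption, since the notes do not spell out that case.

```python
import math


def evaluations_per_node(cores_per_evaluation: int, cores_per_node: int) -> int:
    """Return how many evaluations share one node under the rules above."""
    if not 0 < cores_per_evaluation <= cores_per_node:
        raise ValueError("an evaluation must need between 1 and all cores of a node")
    if 2 * cores_per_evaluation > cores_per_node:
        # More than half a node per evaluation: one evaluation per node.
        return 1
    # Half a node or less (assumption for exactly half): pack as many as fit.
    return cores_per_node // cores_per_evaluation


def nodes_needed(
    parallel_evaluations: int, cores_per_evaluation: int, cores_per_node: int
) -> int:
    """Return the number of full nodes for the requested parallel evaluations."""
    per_node = evaluations_per_node(cores_per_evaluation, cores_per_node)
    # At least one full node is always allocated.
    return max(1, math.ceil(parallel_evaluations / per_node))


if __name__ == "__main__":
    print(nodes_needed(4, 12, 48))  # four 12-core evaluations on 48-core nodes
    print(nodes_needed(3, 30, 48))  # three 30-core evaluations on 48-core nodes
```

The first call packs all four evaluations onto one node, while the second needs a separate node for each evaluation because 30 cores is more than half of 48.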