Model run parallelisation

Modern computing technology offers many opportunities for process parallelisation. When model runs are parallelised by PEST, parallelisation takes place at the highest level: entire model runs are distributed, rather than the calculations within a single run. Model run parallelisation can take place:

• on a single computer;

• on a network of computers in a home or office;

• on a high performance computing cluster;

• in the cloud.

The inversion algorithms on which PEST and PEST++ are based benefit greatly from model run parallelisation. For example, model runs that are undertaken in order to fill a Jacobian (i.e. sensitivity) matrix are completely independent of each other. This is because each relies on incremental variation of a separate parameter. Similar considerations apply to ensemble-based history-matching; model runs that test the ability of different hydraulic property realisations to fit a history-matching dataset are completely independent of each other.
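The independence of Jacobian runs can be illustrated with a minimal Python sketch. It is not how PEST or PEST++ are implemented. The function `toy_model`, the fixed `increment` and the use of threads are assumptions made for illustration only; in practice each run is a separate model execution, and parameter increments are set in the PEST control file.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_model(params):
    """Hypothetical stand-in for a model run: 2 parameters in, 3 outputs out."""
    a, b = params
    return [a + b, a * b, a - 2.0 * b]


def fill_jacobian(model, params, increment=1e-6, max_workers=4):
    """Forward-difference Jacobian using one independent run per parameter."""
    base = model(params)
    # Each perturbed set varies exactly one parameter by the increment.
    perturbed_sets = []
    for j in range(len(params)):
        p = list(params)
        p[j] += increment
        perturbed_sets.append(p)
    # The runs share nothing, so they can be executed in any order, in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(model, perturbed_sets))
    # jacobian[i][j] approximates d(output i) / d(parameter j).
    return [
        [(results[j][i] - base[i]) / increment for j in range(len(params))]
        for i in range(len(base))
    ]


jacobian = fill_jacobian(toy_model, [2.0, 3.0])
```

With n adjustable parameters, this scheme needs n perturbed runs plus one base run, and none of the perturbed runs depends on the result of another.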

The time taken to undertake highly parameterised history-matching can therefore be reduced roughly in proportion to the number of parallel instances of the model that can be run simultaneously. Depending on available computing resources, it can be reduced from days to hours (or less).
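A rough, idealised estimate of the saving can be written as a few lines of Python. The figures (1000 runs of 5 minutes each, 50 agents) are invented for illustration, and the estimate assumes that all runs take the same time and that communication overheads are negligible.

```python
import math


def estimated_wall_time(n_runs, run_minutes, n_agents):
    """Idealised wall-clock time: runs are processed in batches of n_agents."""
    batches = math.ceil(n_runs / n_agents)
    return batches * run_minutes


serial_minutes = estimated_wall_time(1000, 5, 1)     # one run at a time
parallel_minutes = estimated_wall_time(1000, 5, 50)  # fifty runs at a time
```

Under these assumptions the serial case takes 5000 minutes (about three and a half days), while the parallel case takes 100 minutes.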

Parallelisation of model runs must preserve the non-intrusive interface between PEST and a model. This is achieved through the concept of a run manager and local run agents.

Each instance of the model must run in its own folder. This keeps its input and output files separate from those of other model instances. Each instance of the model is under the control of its own agent. The agent undertakes tasks that the non-intrusive PEST-to-model interface requires.

• The agent writes parameter values to model input files using local copies of template files.

• Then the agent issues a system command to run the model.

• When the model has finished running, the agent reads observations from model output files using local copies of instruction files.

How does the agent know what parameter values to give to the model? It receives them from the run manager. And what does it do with the values of observations that it reads from model output files? It sends them back to the run manager. We will look at this next. Meanwhile, here is a schematic of a single agent doing its job.

Figure: A model run agent.
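The three tasks that an agent performs can be sketched in Python as follows. This is an illustration only. It does not use the real PEST template or instruction file syntax: the `$name` markers, the "name value" output format, the file names and the tiny `model.py` script are all hypothetical stand-ins.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from string import Template


def run_agent_cycle(workdir, parameters, model_command):
    """One agent cycle: write model inputs, run the model, read its outputs."""
    workdir = Path(workdir)
    # 1. Write parameter values to the model input file via the local template.
    template = Template((workdir / "model_in.tpl").read_text())
    (workdir / "model.in").write_text(template.substitute(parameters))
    # 2. Issue a system command to run the model inside the agent's own folder.
    subprocess.run(model_command, cwd=workdir, check=True)
    # 3. Read observations from the model output file ("name value" per line).
    observations = {}
    for line in (workdir / "model.out").read_text().splitlines():
        name, value = line.split()
        observations[name] = float(value)
    return observations


# Hypothetical model: reads two parameters, writes two "observations".
MODEL_SOURCE = """\
k, s = (float(line.split()[1]) for line in open("model.in"))
with open("model.out", "w") as f:
    f.write(f"head1 {k + s}\\nhead2 {k * s}\\n")
"""

with tempfile.TemporaryDirectory() as folder:
    folder = Path(folder)
    (folder / "model.py").write_text(MODEL_SOURCE)
    (folder / "model_in.tpl").write_text("k $k\ns $s\n")
    observations = run_agent_cycle(
        folder, {"k": 2.0, "s": 0.5}, [sys.executable, "model.py"]
    )
```

The model itself is untouched by any of this. It simply reads its usual input file and writes its usual output file, which is what keeps the interface non-intrusive.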

BEOPEST, PEST_HP and other programs of the PEST_HP suite include a parallel run manager in addition to a numerical engine that performs the primary task for which the program was written (such as model calibration). So do programs of the PEST++ suite.

PEST and PEST++ run managers operate according to the same set of principles. The run manager works alongside the numerical engine which performs the primary duty of the program. However, instead of issuing system calls to run the model, the numerical engine gives parameter sets to the run manager which then distributes them to agents. When agent-supervised model runs are complete, agents return observations to the run manager, which then provides them to the numerical engine.

When execution of BEOPEST, PEST_HP or PEST++ is initiated, the accompanying run manager is simultaneously initiated. The first thing that the run manager does is open a TCP/IP port. (TCP/IP is the language of the internet. You can select the number of this port yourself.) The run manager then waits and listens.

When initiating execution of an agent, a user must inform the agent of the IP address of the computer on which the manager is running. The user must also inform the agent of the port that the manager has opened. The agent does not need to run on the same computer as the manager. In fact, the manager does not even need to know where the agent is. All that it knows is that an agent has contacted it through its listening TCP/IP port.

If the agent runs on a different computer from the run manager, then the network manager may need to grant permission for this type of agent-to-manager connection. The permission required is minimal. In general, if an agent's machine is permitted to "ping" the manager's machine, this is all that is needed. The manager does not write anything on the agent machine's disk. Nor does the agent write anything on the manager machine's disk.

Once a manager has been contacted by at least one agent, it can start to distribute model runs. Meanwhile it keeps listening for other agents to connect to it. There is no upper limit on the number of agents that can connect to a run manager.
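The listen-and-connect pattern described above can be sketched with Python's standard `socket` module. This is not the protocol used by BEOPEST, PEST_HP or PEST++. The JSON-lines message format, the `shutdown` message and the single-agent loop are assumptions chosen to keep the example short, and port 0 is used so that the operating system picks a free port for the demonstration.

```python
import json
import socket
import threading


def manager(server_socket, parameter_sets, results):
    """Wait for one agent, send it parameter sets and collect observations."""
    # The manager learns of an agent only when that agent connects.
    connection, _agent_address = server_socket.accept()
    reader = connection.makefile("r")
    writer = connection.makefile("w")
    with connection, reader, writer:
        for parameters in parameter_sets:
            writer.write(json.dumps(parameters) + "\n")
            writer.flush()
            results.append(json.loads(reader.readline()))
        # Tell the agent that it can shut down too.
        writer.write("shutdown\n")
        writer.flush()


def agent(manager_host, manager_port, model):
    """Connect to the manager, then run the model for each parameter set."""
    connection = socket.create_connection((manager_host, manager_port))
    reader = connection.makefile("r")
    writer = connection.makefile("w")
    with connection, reader, writer:
        for line in reader:
            if line.strip() == "shutdown":
                break
            observations = model(json.loads(line))
            writer.write(json.dumps(observations) + "\n")
            writer.flush()


# Demonstration on the local machine; port 0 lets the OS choose a free port.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
results = []
manager_thread = threading.Thread(
    target=manager, args=(server, [{"k": 1.0}, {"k": 2.0}], results)
)
manager_thread.start()
# Hypothetical "model": a single observation computed from a single parameter.
agent("127.0.0.1", port, lambda p: {"head": 10.0 * p["k"]})
manager_thread.join()
server.close()
```

Note that only the agent is given an address. The manager simply accepts whatever connects to its listening port, which is why agents can run anywhere that can reach it.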

The manager keeps a record of which agents run the model fastest. It gives preference to fast agents when allocating future model runs. The manager also tolerates the loss of agents. If an agent is lost prior to completion of a model run, the interrupted run is re-assigned to another agent. If a model run takes too long (for example, because the simulator's solver cannot converge for the set of parameters that the model has been allocated), then the model run is abandoned.
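The behaviour described above can be mimicked by a small bookkeeping class. The exact scheduling rules used by the PEST and PEST++ run managers are not given here, so the sketch below is a guess at the general idea rather than a description of those programs. The class name `RunScheduler`, its methods and the "lowest mean run time wins" rule are all hypothetical.

```python
from collections import deque


class RunScheduler:
    """Toy bookkeeping for run allocation, agent loss and abandoned runs."""

    def __init__(self, run_ids):
        self.pending = deque(run_ids)
        self.active = {}     # agent -> run currently assigned to it
        self.timings = {}    # agent -> list of completed run times
        self.completed = []
        self.abandoned = []

    def mean_time(self, agent):
        # Agents with no history score 0.0, so new agents are tried early.
        times = self.timings.get(agent, [])
        return sum(times) / len(times) if times else 0.0

    def assign(self, idle_agents):
        """Give the next pending run to the fastest idle agent."""
        if not self.pending or not idle_agents:
            return None
        agent = min(idle_agents, key=self.mean_time)
        run_id = self.pending.popleft()
        self.active[agent] = run_id
        return agent, run_id

    def run_finished(self, agent, elapsed):
        self.timings.setdefault(agent, []).append(elapsed)
        self.completed.append(self.active.pop(agent))

    def agent_lost(self, agent):
        # The interrupted run goes back to the front of the queue.
        run_id = self.active.pop(agent, None)
        if run_id is not None:
            self.pending.appendleft(run_id)

    def run_timed_out(self, agent):
        # A run that takes too long is abandoned, not re-assigned.
        self.abandoned.append(self.active.pop(agent))


scheduler = RunScheduler(["r1", "r2", "r3", "r4"])
scheduler.assign(["a", "b"])          # no history yet: "a" is listed first
scheduler.assign(["b"])
scheduler.run_finished("a", 10.0)
scheduler.run_finished("b", 60.0)
first = scheduler.assign(["b", "a"])  # "a" has the lower mean run time
second = scheduler.assign(["b"])
scheduler.agent_lost("b")             # r4 is re-queued
scheduler.run_timed_out("a")          # r3 is abandoned
third = scheduler.assign(["a"])       # r4 goes to a surviving agent
scheduler.run_finished("a", 11.0)
```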

Once the numerical tasks performed by BEOPEST/PEST_HP/PEST++ are complete (or if the program is halted by the user), its run manager shuts down. However, before it does so, it informs all agents that they can shut down too.

The situation is depicted in the following figure.

Figure: Parallelisation of model runs.

To run the BEOPEST manager, commence its execution using a command such as:

beopest case /h :4004

where case.pst is the name of a PEST control file. The port that the run manager opens in this case is 4004.

To start an agent, type the command:

beopest case /h manager_address:4004

Substitute the manager's IP address or hostname for manager_address in the above command. It is apparent that, when using BEOPEST, the agent is the same program as the manager. The same applies when running programs of the PEST++ suite. However, when running programs of the PEST_HP suite, the agent is a separate program. In this case, substitute the following command for the second of the above commands.

agent_hp case /h manager_address:4004

The number of the port is arbitrary. Just make sure that you do not choose a port that is being used by another program. Port 4004 is a safe bet. To see which ports are open on Windows, type the command:

netstat -aon
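As a complement to inspecting the netstat listing by eye, a candidate port can be tested from a short Python script. This is a convenience sketch and is not part of PEST. The function name `port_is_free` is hypothetical, and the check covers only the loopback address of the local machine at the moment it is run.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another program holds it.
            return False
        return True


# Demonstration: occupy a port chosen by the OS, then test it.
listener = socket.create_server(("127.0.0.1", 0))
busy_port = listener.getsockname()[1]
busy_port_reported_free = port_is_free(busy_port)
listener.close()
```

To test the port used in the commands above, call `port_is_free(4004)` before starting the manager.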

Where the manager and all agents are working on the same machine, the easiest way to start the agents is to substitute the environment variable %computername% for manager_address in the agent command. This invokes the local machine's host name. If you use the host name instead of the IP address, then the computer does not need to be connected to the internet.

It is important to note that when starting an agent, it is the manager's IP address which must feature on its command line. Agents need to know where the manager is running. The manager does not need to know where agents are running.
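Setting up several agents on one machine can be scripted. The Python sketch below copies a model folder once per agent and builds the agent command for each copy. It assumes that the folder contains everything that a run needs. The folder naming scheme and the function name `prepare_local_agents` are hypothetical, and `socket.gethostname()` is used as a stand-in for %computername%. The sketch only prepares folders and commands; it does not launch anything.

```python
import shutil
import socket
import tempfile
from pathlib import Path


def prepare_local_agents(model_folder, n_agents, case="case", port=4004,
                         program="beopest"):
    """Create one working folder per agent and build its launch command."""
    model_folder = Path(model_folder)
    manager_address = socket.gethostname()  # manager runs on this machine
    launches = []
    for i in range(1, n_agents + 1):
        # Each model instance must run in its own folder.
        agent_folder = model_folder.with_name(f"{model_folder.name}_agent{i}")
        shutil.copytree(model_folder, agent_folder)
        command = [program, case, "/h", f"{manager_address}:{port}"]
        launches.append((agent_folder, command))
    return launches


# Demonstration with a throwaway folder containing a placeholder control file.
with tempfile.TemporaryDirectory() as root:
    base = Path(root) / "model"
    base.mkdir()
    (base / "case.pst").write_text("placeholder\n")
    launches = prepare_local_agents(base, 3)
    folders_ok = all((folder / "case.pst").exists() for folder, _ in launches)
    commands = [command for _, command in launches]
```

Each agent could then be started with `subprocess.Popen(command, cwd=agent_folder)` once the manager is listening. For the PEST_HP suite, pass `program="agent_hp"`.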