Enabling High Job Throughput for Uncertainty Quantification on BG/Q

John Gyllenhaal, LLNL
Todd Gamblin, LLNL
Adam Bertsch, LLNL
Roy Musselman*, IBM

* presenting author

Traditionally, high-end supercomputers have been engineered for single, massively parallel grand-challenge calculations. Recently, demand has grown for large ensembles of small-scale problems run as part of parameter sensitivity studies. In the DOE, this work falls mainly in the area of Uncertainty Quantification (UQ). UQ users of Sequoia, LLNL’s IBM Blue Gene/Q system, need to run massive numbers of 8-node, 8-process, or even single-process jobs. This stresses large machines in ways they were not originally designed to handle. In particular, the management system is forced to track many small job allocations. Current resource managers run a process on the management node to track each active parallel job. On a system of Sequoia’s size (1.5 million cores), this can easily require tens of thousands of manager processes and exceed the limits of the Linux OS running on the front-end.

In this presentation, we describe techniques that allow UQ users to efficiently utilize all 96K Sequoia nodes. We describe hardware and software challenges we have overcome to allow resource managers to run more front-end processes. We also describe a new tool called “Cram”, which allows many concurrent and independent MPI jobs to run within a single MPI job. Together, these approaches have allowed us to successfully run millions of concurrent, small MPI jobs on the Sequoia machine, without placing undue burden on its service nodes.
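To make the packing idea concrete, the fragment below is a minimal sketch, not Cram’s actual implementation: it partitions the ranks of one large MPI job into independent 8-rank sub-jobs, each of which communicates only over its own sub-communicator standing in for that sub-job’s MPI_COMM_WORLD. The ranks_per_job value and the run_sub_job routine are hypothetical placeholders; a real packed application would run its normal code against the sub-communicator instead.

/*
 * Sketch: pack many independent MPI "jobs" into a single MPI allocation
 * by splitting MPI_COMM_WORLD into per-job sub-communicators.
 */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical per-job entry point for illustration only. */
static void run_sub_job(MPI_Comm job_comm, int job_id)
{
    int rank, size;
    MPI_Comm_rank(job_comm, &rank);
    MPI_Comm_size(job_comm, &size);
    if (rank == 0)
        printf("sub-job %d running on %d ranks\n", job_id, size);
    /* ... the sub-job communicates only over job_comm ... */
    MPI_Barrier(job_comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Assume 8 ranks per packed sub-job, as in the 8-process UQ runs
     * described above; the sub-job id doubles as the split color. */
    const int ranks_per_job = 8;
    int job_id = world_rank / ranks_per_job;

    MPI_Comm job_comm;
    MPI_Comm_split(MPI_COMM_WORLD, job_id, world_rank, &job_comm);

    run_sub_job(job_comm, job_id);

    MPI_Comm_free(&job_comm);
    MPI_Finalize();
    return 0;
}

Because the resource manager sees only the single enclosing allocation, the many packed sub-jobs impose no additional per-job tracking load on the front-end.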
