
LLNL scientists eagerly anticipate potential impact of El Capitan

As Lawrence Livermore National Laboratory eagerly awaits the arrival of its first exascale-class supercomputer, El Capitan, physicists and computer scientists running science applications on testbeds for the machine are getting a taste of what to expect.

“I’m not sure we understand exactly how much computing power [El Capitan] is going to have, because it’s such a leap from what we have now,” said Brian Ryujin, a computer scientist in the Applications, Simulations, and Quality (ASQ) division of LLNL’s Computing directorate. “I’m very interested to see what our users will do with it, because this machine is going to be just massive.”

Ryujin is one of the LLNL researchers using the third generation of Early Access Systems (EAS3) for El Capitan – Hewlett Packard Enterprise (HPE)/AMD systems with predecessor nodes to those that will make up El Capitan – to port their codes to the future exascale system. Although only a fraction of the size of El Capitan and built from earlier-generation components, the EAS3 systems rzVernal, Tenaya and Tioga currently rank among the 200 most powerful supercomputers in the world. All three contain HPE Cray EX235a accelerator blades with 3rd Gen 64-core AMD EPYC processors and AMD Instinct MI250X accelerators – nodes identical to those in Oak Ridge National Laboratory’s Frontier, which holds the top spot on the Top500 list and the title of the world’s first exascale system.

By incorporating next-generation processors – including AMD’s industry-leading MI300A Accelerated Processing Units (APUs) – and thousands more nodes than the EAS3 machines, El Capitan promises more than 15 times the peak computing power of LLNL’s current flagship supercomputer, the 125-petaflop IBM/NVIDIA Sierra, exceeding two exaFLOPs (two quintillion calculations per second) at peak.

“El Capitan has the potential to enable a more than 10x increase in problem throughput,” said Teresa Bailey, associate program director for computational physics in the Weapon Simulation and Computing program. “This will enable 3D ensembles, allowing LLNL to perform uncertainty quantification (UQ) and machine learning (ML) studies previously unimaginable.”

For months, Ryujin has been running the Ares multi-physics code on EAS3 platforms, and if code performance to date is any indication, El Capitan’s advantages over Sierra could be nothing short of astronomical.

“Having a very healthy amount of memory gives us a lot more flexibility in how we run the calculations and really opens up the possibilities for larger, more complex multi-physics problems,” Ryujin said. “What’s really exciting is that we’re going to be able to drive El Capitan much more efficiently. I expect El Capitan to be used for big multi-physics problems, in addition to the kind of day-to-day calculations that we performed on Sierra.”

LLNL Weapons and Complex Integration (WCI) computational physicist Aaron Skinner and ASQ computer scientist Tom Stitt used rzVernal to run MARBL, a multi-physics (magneto-radiation-hydrodynamics) code focused on inertial confinement fusion and pulsed-power science. As one of LLNL’s newest codes, MARBL is still under active development, with researchers adding more modeling capabilities and preparing it to run on El Capitan and other next-generation machines.

Skinner said he and other physicists run a large number and variety of “very turbulent” physics calculations on rzVernal that are extremely sensitive to very small spatial scales, so increased resolution and higher dimensionality are highly desirable. The EAS machine’s expanded memory at least doubled MARBL’s performance compared to Sierra on a per-node basis. Additionally, the ability to “oversubscribe” GPUs (assign multiple tasks to a single GPU) delivered further performance gains, according to Stitt.
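In practice, the kind of oversubscription Stitt describes is often achieved by mapping several MPI ranks to each accelerator. The sketch below is only an illustration of that general pattern, not code from MARBL or the EAS machines; the modulo rank-to-device mapping and the HIP/MPI setup are assumptions.

```cpp
// Minimal sketch (not from MARBL) of the rank-to-GPU mapping behind
// "oversubscribing" GPUs: several MPI ranks share one device, so more
// independent tasks can be in flight per GPU than there are GPUs.
#include <mpi.h>
#include <hip/hip_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int ngpus = 0;
    hipGetDeviceCount(&ngpus);

    if (ngpus > 0) {
        // With more ranks than GPUs on a node, the modulo mapping assigns
        // multiple ranks to the same device (the "oversubscription").
        int device = rank % ngpus;
        hipSetDevice(device);
        std::printf("rank %d of %d -> GPU %d (of %d)\n", rank, size, device, ngpus);

        // ... each rank would then launch its own kernels/streams on its device ...
    }

    MPI_Finalize();
    return 0;
}
```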

“The huge increase in available memory, which was a bottleneck on Sierra, and the more powerful GPUs are really exciting,” Stitt said. “Even though rzVernal is a small machine, it has eight times the memory per node, so we can run a very large experiment on a small number of nodes and get that allocation much more easily. The simulation we’re running is the highest resolution we’ve ever been able to do for this type of problem.”

Having a larger and faster Advanced Technology System (ATS) in El Capitan means that physicists, who traditionally used large sets of 1D and 2D calculations to train surrogate models, will now be able to build surrogates from large sets of 2D and 3D calculations, expanding the design space and simulating physics to a degree they haven’t been able to achieve before, the researchers said.

“If a machine comes along that allows you to get 2.5 times better resolution for practically the same cost, then you can do a lot more science with the same amount of resources. That allows physicists to do a better job at what they’re trying to do, and sometimes it opens doors that weren’t possible before,” Skinner explained.

Beyond 2D, El Capitan will also take the idea of regularly running 3D multi-physics simulations from a “pipe dream” to reality, the researchers said. When the exascale era arrives at LLNL, researchers will be able to model physics at a level of detail and realism never before possible, opening up new avenues of scientific discovery.

“As we really move into exascale, it’s no longer inconceivable that we could start making massive sets of 3D models,” Skinner said. “Physics really changes as you increase the dimensionality of models. There are physical phenomena, especially in turbulent flows, that just can’t be modeled properly in a lower-dimensional simulation; it really requires that three-dimensional aspect.”

In addition to MARBL, Skinner, Stitt and computational physicist/project lead Rob Rieben recently used 80 AMD MI250X GPUs on rzVernal to run a radiation-hydrodynamics simulation that modeled a high-energy-density experiment performed at the Omega Laser Facility. The researchers were impressed to find that the code, which was developed for Sierra, ran well on the EAS machine without any additional modifications. LLNL codes rely heavily on the RAJA Portability Suite to achieve performance portability across GPU- and CPU-based systems, a strategy that gave them confidence in the codes’ portability to El Capitan.
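The RAJA Portability Suite mentioned above lets a loop body be written once and dispatched to different back ends through an execution policy. The toy kernel below is a minimal sketch of that single-source pattern, not code from Ares or MARBL; the policy choice and memory handling are simplified assumptions.

```cpp
// Toy example of the RAJA pattern: the loop body is written once and the
// execution policy selects the back end (sequential CPU here; a GPU policy
// on machines like rzVernal or El Capitan).
#include "RAJA/RAJA.hpp"
#include <vector>
#include <cstdio>

int main() {
    const int N = 1000;
    std::vector<double> a(N, 1.0), b(N, 2.0), c(N, 0.0);
    double* pa = a.data();
    double* pb = b.data();
    double* pc = c.data();

    // Sequential CPU policy; on a GPU system the same loop body could be
    // dispatched with a device policy such as RAJA::hip_exec<256> instead
    // (which also requires device-accessible memory, e.g. via Umpire/CHAI).
    using policy = RAJA::seq_exec;

    RAJA::forall<policy>(RAJA::TypedRangeSegment<int>(0, N), [=](int i) {
        pc[i] = pa[i] + pb[i];  // simple vector add as a stand-in kernel
    });

    std::printf("c[0] = %f\n", pc[0]);
    return 0;
}
```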

With a background in astrophysics, star-forming clouds and supernovae, Skinner added that he is eager to use El Capitan, whose processors are designed to integrate with artificial intelligence (AI)- and machine learning-assisted data analytics, to combine AI with simulation, a process LLNL has dubbed “cognitive simulation.”

The technique could create more accurate and predictive surrogate models for complex multi-physics problems such as inertial confinement fusion (ICF) at the National Ignition Facility, which in 2021 set a record for fusion yield in an experiment and brought the world to the ignition threshold. In short, said Skinner, physicists will get better answers to their questions and potentially, in the case of ICF science, save millions of dollars on fabrication of fusion targets.

“What really makes me smile is getting the computer to act like something that can’t easily be experimented on in the lab, and the more computing power you can give it, the more realistically it behaves,” Skinner said. “I’m really excited about the doors this will open and the new approaches to scientific discovery that are starting to be explored and made possible by machines like El Capitan, things we couldn’t even imagine doing before. We’re entering a time where we no longer have to limit ourselves.”

Ryujin, who said he is seeing a similar doubling or better in node-to-node performance over Sierra with the Ares code on rzVernal and Tenaya, said El Capitan will allow scientists to get rapid turnaround on their modeling and simulation work. This will let scientists tackle problems that consume huge amounts of resources and run orders of magnitude more simulations at once without interfering with other tasks, opening up new possibilities for uncertainty quantification, parameter studies, design exploration and model evaluations on large sets of experiments, he added.

“The sheer size of the machine is going to be something to look forward to, both for throughput and the ability to solve huge problems,” Ryujin said. “Each generation of nodes gets significantly more powerful and more capable than the previous generation, so we’ll be able to perform the same calculations that we did before with far fewer resources. El Capitan is going to be a gigantic machine, so simulation work that was really huge before will only require a small percentage of the machine.”

Ryujin said scientists are already eagerly scoping out the scale of their El Capitan calculations and are excited by the prospect of researchers from the National Nuclear Security Administration’s three national security laboratories being able to model phenomena at resolutions they never could before.

“One of the joys of working in this space is being able to use these huge machines and cutting-edge technologies, so I think it’s also really cool to be able to build, develop for and run on these really advanced architectures,” Ryujin said. “I look forward to performing calculations that will set records; I really like hearing from our physicists that we’ve done the biggest calculation ever, because every time we do that we learn something; it spawns something new in the program and sparks new directions of inquiry.”

Bailey, who oversees the code development teams for El Capitan, said her goal is to deliver useful computing capability to the machine’s future users.

“The best part of my job is hearing when they use all of our high-performance computing capability, both machines and codes, to solve a complex problem or learn something new about the underlying physics of the systems they model,” she explained. “We are working hard now so that our users can make meaningful breakthroughs using El Capitan.”