CERN’s LHC experiments increase the use of GPUs to improve computing infrastructure
February 4, 2022 – Analyzing up to a billion proton collisions per second, or tens of thousands of highly complex lead collisions, is no easy task for a traditional computer farm. With the latest upgrades to the LHC experiments due to come into action next year, their demand for data processing capacity has increased dramatically. Since these new computing challenges might not be met using traditional central processing units (CPUs), the four big experiments are adopting graphics processing units (GPUs).
GPUs are high-performance processors that specialize in image processing and were originally designed to speed up the rendering of three-dimensional computer graphics. Their use has been studied over the past two years by the LHC experiments, the Worldwide LHC Computing Grid (WLCG) and CERN openlab. Increasing the use of GPUs in high-energy physics will improve not only the quality and size of the computing infrastructure, but also its overall energy efficiency.
“The ambitious LHC upgrade program poses a series of exciting computing challenges; GPUs can play an important role in supporting machine learning approaches to solving many of these problems,” says Enrica Porcari, Head of the CERN Computing Department. “Since 2020, CERN Computing has provided access to GPU platforms in the data center, which have proven popular for a range of applications. In addition to this, CERN openlab is conducting significant research into the use of GPUs for machine learning through collaborative R&D projects with industry, and the Scientific Computing Collaborations group is working to help port and optimize the experiments' key code.”
ALICE has pioneered the use of GPUs in its High Level Trigger (HLT) online computing farm since 2010 and is the only experiment to use them to such a large extent to date. The newly upgraded ALICE detector has over 12 billion electronic sensor elements that are read out continuously, creating a data stream of over 3.5 terabytes per second. After first-level data processing, a stream of up to 600 gigabytes per second remains. This data is analyzed online on a high-performance computer farm of 250 nodes, each equipped with eight GPUs and two 32-core CPUs. Most of the software that assembles individual particle detector signals into particle trajectories (event reconstruction) has been adapted to run on GPUs.
In particular, GPU-based online reconstruction and compression of the data from the time projection chamber, which is the largest contributor to data size, allows ALICE to further reduce the rate to a maximum of 100 gigabytes per second before writing the data to disk. Without GPUs, approximately eight times as many servers of the same type, along with other resources, would be needed to handle the online processing of lead collision data at an interaction rate of 50 kHz.
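The reduction figures quoted above can be checked with some simple arithmetic. The following Python sketch is illustrative only: it treats the quoted bounds ("over", "up to", "a maximum of") as exact values, assumes decimal units (1 terabyte = 1000 gigabytes), and uses hypothetical variable and function names that do not come from ALICE software.

```python
# Rates quoted in the article, in gigabytes per second.
# Assumption: 1 terabyte = 1000 gigabytes (decimal units), and the quoted
# bounds are treated as exact values for the purpose of this estimate.
RAW_GB_S = 3500.0          # "over 3.5 terabytes per second" from the detector
FIRST_LEVEL_GB_S = 600.0   # "up to 600 gigabytes per second" after first-level processing
TO_DISK_GB_S = 100.0       # "a maximum of 100 gigabytes per second" written to disk

# Online farm as described in the article.
NODES = 250
GPUS_PER_NODE = 8
CPUS_PER_NODE = 2
CORES_PER_CPU = 32
CPU_ONLY_SERVER_MULTIPLIER = 8  # "approximately eight times as many servers"


def reduction_factor(rate_in: float, rate_out: float) -> float:
    """Return how many times smaller rate_out is than rate_in."""
    if rate_out <= 0:
        raise ValueError("rate_out must be positive")
    return rate_in / rate_out


first_level = reduction_factor(RAW_GB_S, FIRST_LEVEL_GB_S)       # detector -> first level
online_stage = reduction_factor(FIRST_LEVEL_GB_S, TO_DISK_GB_S)  # first level -> disk
overall = reduction_factor(RAW_GB_S, TO_DISK_GB_S)               # detector -> disk

total_gpus = NODES * GPUS_PER_NODE
total_cpu_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
cpu_only_servers = NODES * CPU_ONLY_SERVER_MULTIPLIER

if __name__ == "__main__":
    print(f"First-level reduction:  ~{first_level:.1f}x")
    print(f"Online farm reduction:  ~{online_stage:.1f}x")
    print(f"Overall reduction:      ~{overall:.0f}x")
    print(f"GPUs in the farm:       {total_gpus}")
    print(f"CPU cores in the farm:  {total_cpu_cores}")
    print(f"Servers without GPUs:   ~{cpu_only_servers}")
```

Under these assumptions the data stream shrinks by roughly a factor of 35 between the detector and disk, and the 250-node farm holds 2,000 GPUs; a CPU-only farm of the same server type would need on the order of 2,000 servers.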
ALICE successfully used online reconstruction on GPUs when taking LHC pilot beam data at the end of October 2021. When there is no beam in the LHC, the online computer farm is used for offline reconstruction. In order to exploit the full potential of the GPUs, the full ALICE reconstruction software has been implemented with GPU support, and more than 80% of the reconstruction workload will be able to run on GPUs.
Beginning in 2013, LHCb researchers conducted R&D work on the use of parallel computing architectures, notably GPUs, to replace parts of the processing that traditionally took place on CPUs. This work culminated in the Allen project, a complete first-level real-time processing stage implemented entirely on GPUs, which is able to handle LHCb's data rate using only about 200 GPU cards. Allen enables LHCb to find charged particle trajectories from the very beginning of real-time processing. These are used to reduce the data rate by a factor of 30 to 60 before the detector is aligned and calibrated and a more complete, CPU-based full detector reconstruction is performed. Such a compact system also leads to substantial energy savings.
From 2022, the LHCb experiment will process 4 terabytes of data per second in real time, selecting 10 gigabytes of the most interesting LHC collisions every second for physics analysis. LHCb's approach is unique in that, instead of offloading part of the work, it will analyze the 30 million particle-bunch crossings per second on GPUs.
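A rough back-of-the-envelope calculation puts these LHCb figures in perspective. The Python sketch below is an estimate only: it assumes decimal units, treats "about 200 GPU cards" as exactly 200, and assumes the full 4 terabytes per second is spread evenly over those cards. None of the names correspond to the Allen code base.

```python
# Figures quoted in the article (decimal units assumed: 1 TB = 1e12 bytes).
INPUT_BYTES_PER_S = 4_000_000_000_000   # 4 terabytes per second processed in real time
OUTPUT_BYTES_PER_S = 10_000_000_000     # 10 gigabytes per second selected for analysis
CROSSINGS_PER_S = 30_000_000            # 30 million bunch crossings per second
GPU_CARDS = 200                         # "about 200 GPU cards" (treated as exact here)

# Overall selection: how much smaller the stored stream is than the input.
overall_reduction = INPUT_BYTES_PER_S / OUTPUT_BYTES_PER_S

# Average amount of data per bunch crossing.
bytes_per_crossing = INPUT_BYTES_PER_S / CROSSINGS_PER_S

# Assumption: the load is shared evenly across all GPU cards.
crossings_per_gpu_per_s = CROSSINGS_PER_S / GPU_CARDS
bytes_per_gpu_per_s = INPUT_BYTES_PER_S / GPU_CARDS

if __name__ == "__main__":
    print(f"Overall reduction:        {overall_reduction:.0f}x")
    print(f"Data per crossing:        ~{bytes_per_crossing / 1e3:.0f} kB")
    print(f"Crossings per GPU:        {crossings_per_gpu_per_s:,.0f} per second")
    print(f"Input per GPU:            {bytes_per_gpu_per_s / 1e9:.0f} GB/s")
```

Under these assumptions, the selection corresponds to an overall reduction of about a factor of 400, with each bunch crossing averaging roughly 133 kilobytes and each GPU card handling on the order of 150,000 crossings (about 20 gigabytes) per second.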
Together with improvements to its CPU processing, LHCb has gained nearly a factor of 20 in the energy efficiency of its detector reconstruction since 2018. LHCb researchers are now looking forward to bringing this new system into operation with the first data of 2022, and to building on it to enable the full physics potential of the upgraded LHCb detector to be realized.
CMS reconstructed LHC collision data with GPUs for the first time during the LHC pilot beams in October last year. During the first two runs of the LHC, the CMS HLT ran on a traditional computer farm comprising over 30,000 CPU cores. However, as studies for the CMS Phase 2 upgrade have shown, the use of GPUs will help keep the cost, size and power consumption of the HLT farm in check at higher LHC luminosity. In order to gain experience with a heterogeneous farm and the use of GPUs in a production environment, CMS will equip the whole HLT with GPUs from the start of Run 3: the new farm will consist of a total of 25,600 CPU cores and 400 GPUs.
The additional computing power provided by these GPUs will allow CMS not only to improve the quality of the online reconstruction, but also to expand its physics program by running online data scouting analysis at a much higher rate than before. Today, about 30% of HLT processing can be offloaded to GPUs: local calorimeter reconstruction, local pixel tracker reconstruction, and pixel-only track and vertex reconstruction. The number of algorithms that can run on GPUs will increase during Run 3, as other components are already under development.
ATLAS is engaged in a variety of R&D projects towards the use of GPUs, both in the online trigger system and more broadly across the experiment. GPUs are already used in many analyses; they are particularly useful for machine learning applications, where training can be done much faster. Outside of machine learning, ATLAS's R&D efforts have focused on improving the software infrastructure so that it can make use of GPUs or other, more exotic processors that may become available in a few years. A few complete applications, including a fast calorimeter simulation, now also run on GPUs and will provide key examples with which to test the infrastructure improvements.
“All of these developments are occurring against a backdrop of unprecedented evolution and diversification in computing hardware. The skills and techniques developed by CERN researchers while learning to make the best use of GPUs provide the ideal platform from which to master the architectures of tomorrow and use them to maximize the physics potential of current and future experiments,” says Vladimir Gligorov, who leads LHCb's Real Time Analysis project.