RA2: Orchestration
RA2 Orchestration, coordinated by Manuel Giffels (KIT), aims to significantly advance the energy-efficient operation of large-scale scientific computing infrastructures within SUSFECIT. Building on previous experience in optimizing compute resource usage, the project introduces the concept of a “breathing compute center.” This approach provides a stable baseline of pledged computing resources while dynamically scaling additional capacity up or down depending on the availability of renewable energy, electricity market conditions, and grid balancing requirements. By aligning compute operations with forecasts of solar and wind energy, the infrastructure can expand during periods of abundant green power and contract when renewable availability is limited, thereby reducing CO2 emissions and improving overall energy efficiency. Renewable energy forecasts developed in RA1 are integrated into the COBalD/TARDIS resource meta-scheduler to enable automated, energy-aware resource scheduling decisions.
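
As a rough illustration of the "breathing" idea, the sketch below computes a target core count from a fixed pledged baseline plus an opportunistic share that follows a renewable-energy forecast. It is a minimal sketch under stated assumptions, not the COBalD/TARDIS interface: the names `SiteLimits`, `target_cores`, and `green_fraction`, the linear scaling rule, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteLimits:
    """Hypothetical capacity limits of one compute site."""
    baseline_cores: int   # pledged capacity that is always provided
    max_extra_cores: int  # opportunistic capacity that may be added on top


def target_cores(limits: SiteLimits, green_fraction: float) -> int:
    """Return the core count to provision for a forecast renewable share.

    green_fraction is an assumed forecast value in [0, 1]; values outside
    that range are clamped. The baseline is never undercut.
    """
    share = min(1.0, max(0.0, green_fraction))
    return limits.baseline_cores + int(limits.max_extra_cores * share)


# Illustrative numbers only, not taken from the project.
limits = SiteLimits(baseline_cores=1000, max_extra_cores=4000)
forecast = [0.0, 0.25, 0.5, 1.0]
plan = [target_cores(limits, g) for g in forecast]
print(plan)  # [1000, 2000, 3000, 5000]
```

A real controller would also have to weigh electricity market conditions and grid balancing requirements, which this linear rule leaves out.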
To further enhance flexibility, RA2 explores checkpoint–restore technologies for containerized workloads. These allow compute jobs to be paused and resumed later or at a different site, so that the infrastructure can react to changing weather or energy conditions and make opportunistic use of renewable power across the federation. In parallel, a new resource management module regulates opportunistic resources over longer time frames, enabling clusters with fixed quarterly or annual compute quotas to distribute their available capacity more evenly while accounting for fluctuating energy supply and demand.
In addition to dynamic scaling of compute nodes, RA2 investigates the optimization of CPU power consumption through adaptive clock frequency tuning. Because CPU power draw increases with clock frequency, lowering frequencies during periods of limited green energy offers further potential for reducing CO2 emissions. Benchmark studies identify optimal operating points, and a dedicated system daemon adjusts clock frequencies based on energy forecasts. To ensure transparent and fair reporting, the resulting dynamic performance variations are integrated into the accounting framework (AUDITOR) developed in RA3.
Overall, RA2 establishes a coordinated, energy-aware orchestration framework that tightly integrates forecasting, resource management, and accounting.
Funded partners: University of Bonn, DESY, University of Freiburg, University of Göttingen, and Karlsruhe Institute of Technology.
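
The clock frequency tuning described above can be sketched as a lookup over benchmarked operating points. This is only an illustration under stated assumptions: the benchmark values are made up, and the `OperatingPoint` type, the `choose_point` function, and the 0.5 threshold are hypothetical rather than part of the project's daemon. Applying the chosen frequency to the hardware (for example through the Linux cpufreq interface) is deliberately left out.

```python
from typing import NamedTuple, Sequence


class OperatingPoint(NamedTuple):
    freq_mhz: int
    throughput: float  # benchmark score, arbitrary units
    power_w: float     # CPU power draw in watts


# Made-up benchmark results for illustration, not project measurements.
POINTS = [
    OperatingPoint(1500, 60.0, 90.0),
    OperatingPoint(2000, 78.0, 120.0),
    OperatingPoint(2500, 92.0, 165.0),
    OperatingPoint(3000, 100.0, 225.0),
]


def choose_point(points: Sequence[OperatingPoint],
                 green_fraction: float,
                 threshold: float = 0.5) -> OperatingPoint:
    """Pick an operating point for an assumed renewable share in [0, 1].

    With plenty of green power, maximise throughput; otherwise pick the
    point with the best throughput per watt.
    """
    if green_fraction >= threshold:
        return max(points, key=lambda p: p.throughput)
    return max(points, key=lambda p: p.throughput / p.power_w)


print(choose_point(POINTS, 0.8).freq_mhz)  # 3000
print(choose_point(POINTS, 0.2).freq_mhz)  # 1500
```

The two-level rule is the simplest possible policy; a daemon driven by continuous forecasts could instead interpolate between operating points.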
