Power and Thermal Management

Research Overview

Research Lead: Tajana Simunic Rosing, Assistant Professor, Computer Science and Engineering, UC San Diego

We propose to (1) develop techniques for effectively monitoring workload and application characteristics; (2) design event-driven methods that deliver optimal energy and thermal management policies and can be composed into system-wide policies; and (3) provide feedback loops for evaluating management policies by deploying sensors that monitor power consumption, temperature, and other quantities. The measurements will be correlated with the workload currently running, so that we can report to users the energy cost of their applications. Control will be local to each hardware component and will coordinate with the rest of the system hierarchy, as demonstrated in our earlier work. Adaptation to non-stationary workloads can be achieved using machine learning techniques.
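The specific learning method is not described here. As one illustration of how a manager can adapt to non-stationary workloads, the sketch below uses the Hedge (multiplicative-weights) algorithm to track which of several candidate sleep policies is currently performing best. The three fixed-timeout "expert" policies, the ski-rental-style cost model in `timeout_loss`, the learning rate `eta`, and the names `HedgeSelector` and `timeout_loss` are all hypothetical choices made for this example, not the project's actual design.

```python
import math
from typing import Dict


def timeout_loss(timeout: float, idle: float, breakeven: float) -> float:
    """Normalized loss in [0, 1) of a fixed-timeout sleep policy on one idle period.

    Assumed cost model: the device burns 1 unit/s while idle but powered on, and
    entering sleep costs `breakeven` units (transition energy plus wake-up penalty).
    """
    cost = idle if idle <= timeout else timeout + breakeven
    oracle = min(idle, breakeven)  # cost of an offline policy that knows `idle` in advance
    return 0.0 if cost == 0 else 1.0 - oracle / cost


class HedgeSelector:
    """Multiplicative-weights (Hedge) selection among candidate sleep policies."""

    def __init__(self, timeouts: Dict[str, float], eta: float = 0.5) -> None:
        self.timeouts = timeouts  # expert name -> timeout in seconds (hypothetical experts)
        self.eta = eta  # learning rate (assumed value)
        self.weights = {name: 1.0 for name in timeouts}

    def choose(self) -> str:
        # Follow the expert that currently has the largest weight.
        return max(self.weights, key=lambda name: self.weights[name])

    def probabilities(self) -> Dict[str, float]:
        total = sum(self.weights.values())
        return {name: w / total for name, w in self.weights.items()}

    def observe(self, idle: float, breakeven: float) -> None:
        # Full-information update: once an idle period ends, every expert's loss on it
        # is known, so each weight shrinks exponentially with that expert's loss.
        for name, timeout in self.timeouts.items():
            self.weights[name] *= math.exp(-self.eta * timeout_loss(timeout, idle, breakeven))


selector = HedgeSelector({"always_sleep": 0.0, "timeout_2s": 2.0, "never_sleep": math.inf})
for _ in range(10):  # phase 1: long idle periods favor sleeping immediately
    selector.observe(idle=20.0, breakeven=5.0)
phase1_choice = selector.choose()
for _ in range(10):  # phase 2: short idle periods favor staying on
    selector.observe(idle=1.0, breakeven=5.0)
phase2_choice = selector.choose()
print(phase1_choice, phase2_choice)
```

When the workload shifts from long to short idle periods, the weights move from the "sleep immediately" expert to the short-timeout expert within a few periods. Schemes of this kind aim to stay close to whichever fixed policy is best on the recent workload rather than committing to one policy up front.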

We will adapt Sun's Continuous Telemetry Harness package in the BB components of the GreenLight Instrument to provide input for power and thermal management decisions and feedback on energy, thermal, and performance costs. The package includes hardware temperature, power, and energy probes, along with the software needed to capture and display their readings. Together these yield a high-resolution, dynamic "heat and power map" of a running system and enable high-quality closed-loop thermal and power management. Our initial results on managing the thermal and power properties of multi-core processors show that temperatures can be made significantly lower and more balanced, with lower power consumption and practically no impact on performance. Using estimators such as Kalman filters (KFs), we can filter out decalibration errors in sensor readings. For more complex systems, we will use more sophisticated techniques such as the Multivariate State Estimation Technique (MSET).
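To show what Kalman filtering of sensor readings involves, here is a minimal one-dimensional sketch. It assumes that the true temperature follows a slowly varying random walk and that each reading equals that value plus Gaussian noise. It is not the Continuous Telemetry Harness API. The names `ScalarKalman` and `filter_readings`, the noise variances `q` and `r`, and the simulated 65 °C sensor are hypothetical. Modeling decalibration explicitly, for example as an extra bias state checked against a reference sensor, would extend the state vector beyond this scalar example.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScalarKalman:
    """1-D Kalman filter with an assumed random-walk model of the true temperature."""
    q: float        # process-noise variance: how far the true temperature may drift per step
    r: float        # measurement-noise variance of the sensor
    x: float = 0.0  # current state estimate (deg C)
    p: float = 1e3  # estimate variance; a large value means an uninformative initial guess

    def update(self, z: float) -> float:
        # Predict: a random walk keeps the mean and inflates the uncertainty by q.
        self.p += self.q
        # Correct: blend prediction and new reading z using the Kalman gain k.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def filter_readings(readings: Iterable[float], q: float, r: float) -> List[float]:
    """Run the filter over a sequence of readings and return the running estimates."""
    kf = ScalarKalman(q=q, r=r)
    return [kf.update(z) for z in readings]


# Simulated sensor: constant 65 deg C with Gaussian noise (standard deviation 2 deg C).
random.seed(0)
true_temp = 65.0
readings = [true_temp + random.gauss(0.0, 2.0) for _ in range(200)]
estimates = filter_readings(readings, q=1e-3, r=4.0)
```

The ratio of `q` to `r` sets the trade-off: a larger `q` lets the estimate follow genuine temperature changes faster, and a smaller `q` suppresses more sensor noise.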

Our power management algorithm adapts well to workload changes and achieves overall performance comparable to that of the best-performing management policy at any point in time, with energy savings of up to 58% for the hard disk drive (HDD) and 92% for the CPU. Going a step further, policies implemented on Sun's multi-core Niagara processor that also address thermal issues can reduce the frequency of hot spots by 35%, spatial gradients by 85%, and thermal cycles by 61% compared with policies that minimize energy alone. Furthermore, our adaptive dynamic workload scheduling policies reduce the frequency of high-magnitude thermal cycles and spatial gradients by around 50% and 90%, respectively, compared with state-of-the-art load-balancing schedulers.
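The exact definitions behind these percentages are not given here. The sketch below shows one plausible way to compute hot-spot, spatial-gradient, and thermal-cycle frequencies from a per-core temperature trace, along with the relative reduction between two policies. The thresholds (85 °C, 15 °C, 20 °C), the simplified cycle count, and every function name are illustrative assumptions, not values or code from the study.

```python
from typing import Sequence

# Illustrative thresholds (assumptions, not values taken from the study).
HOT_SPOT_C = 85.0   # a sample is a hot spot if any core exceeds this temperature
GRADIENT_C = 15.0   # a sample has a large spatial gradient if max - min across cores exceeds this
CYCLE_C = 20.0      # a swing larger than this counts as a high-magnitude thermal cycle

Trace = Sequence[Sequence[float]]  # trace[t][c] = temperature of core c at sample t (deg C)


def hot_spot_frequency(trace: Trace) -> float:
    """Fraction of samples in which at least one core is above HOT_SPOT_C."""
    return sum(max(sample) > HOT_SPOT_C for sample in trace) / len(trace)


def gradient_frequency(trace: Trace) -> float:
    """Fraction of samples whose across-core temperature spread exceeds GRADIENT_C."""
    return sum(max(sample) - min(sample) > GRADIENT_C for sample in trace) / len(trace)


def count_cycles(series: Sequence[float]) -> int:
    """Count swings between successive turning points that exceed CYCLE_C.

    Each rise or fall is counted separately (half-cycles); this is a simplification,
    not a full rainflow count.
    """
    turning = [series[0]]
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # slope changes sign: local extremum
            turning.append(cur)
    turning.append(series[-1])
    return sum(abs(b - a) > CYCLE_C for a, b in zip(turning, turning[1:]))


def total_cycles(trace: Trace) -> int:
    """Sum of high-magnitude swings over all cores."""
    n_cores = len(trace[0])
    return sum(count_cycles([sample[c] for sample in trace]) for c in range(n_cores))


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a metric relative to a baseline policy."""
    return (baseline - improved) / baseline


trace = [[70.0, 72.0], [90.0, 70.0], [65.0, 71.0], [88.0, 73.0]]
print(hot_spot_frequency(trace), gradient_frequency(trace), total_cycles(trace))
```

Under these assumed definitions, a figure such as "35% fewer hot spots" would correspond to `relative_reduction(baseline, new) == 0.35`, where `baseline` and `new` are the hot-spot frequencies measured under the energy-only and the thermally aware policy on the same workload.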

Technical White Papers

Power and Thermal Management 2011
Power and Thermal Management 2010
Power and Thermal Management 2009