Modeling Performance and Energy-Efficiency of Multi-Cores: The Cache-Aware Roofline Approach and The Intel Advisor
As architectures evolve towards more complex multi-core designs, deciding which optimizations provide the best trade-off between performance and efficiency is becoming a prominent issue. To help in this decision process, this tutorial presents a set of fundamental Cache-aware Roofline Models (CARMs), which characterize the upper bounds on performance, power, energy and energy-efficiency of contemporary parallel architectures (i.e., multi-core CPUs and GPUs). These models evaluate how key microarchitectural aspects, such as accessing different functional units or different memory hierarchy levels, affect the attainable performance, power and energy-efficiency. Recently, Intel integrated the performance CARM as a fully supported feature into its proprietary Intel Advisor software tool, and it is described as “an incredibly useful diagnosis tool (...) that developers can use to guide them (in the application optimization process), ensuring that they can squeeze the maximum performance out of their code with minimal time and effort”. The proposed models are also rigorously validated on different CPU and GPU architectures by relying on hardware counters and specifically developed performance/power monitoring tools. Experimental results show a very high accuracy of the proposed models, and their ability to provide more intuitive and useful guidelines than state-of-the-art approaches when characterizing real-world applications from standard benchmark suites.
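To make the notion of an upper bound per memory hierarchy level concrete, the following is a minimal sketch of the general roofline relation applied to each memory level, assuming arithmetic intensity is measured in flops per byte as seen by the core. The function name, the peak value and the bandwidth figures are hypothetical placeholders, not measurements from any architecture covered in the tutorial, and the sketch covers only the performance model, not the power or energy models.

```python
def carm_attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound per memory level: min(peak, intensity * bandwidth).

    arithmetic_intensity: flops per byte (assumed: bytes as seen by the core)
    peak_gflops: peak compute throughput in GFLOP/s
    bandwidth_gbs: mapping of memory level name to bandwidth in GB/s
    """
    return {
        level: min(peak_gflops, arithmetic_intensity * bandwidth)
        for level, bandwidth in bandwidth_gbs.items()
    }


# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}

# A kernel performing 0.5 flops per byte of memory traffic.
bounds = carm_attainable_gflops(0.5, PEAK_GFLOPS, BANDWIDTH_GBS)
for level, gflops in bounds.items():
    print(f"{level}: at most {gflops} GFLOP/s")
```

With these placeholder values, a kernel served from L1 or L2 is bounded by the compute peak, while the same kernel served from L3 or DRAM is bounded by memory bandwidth, which is the kind of distinction the models are meant to expose.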