Understanding sources of inefficiency in general-purpose chips

Rehan Hameed Stanford

Wajahat Qadeer

Megan Wachs

Omid Azizi

Alex Solomatnikov

Benjamin C Lee

Stephen Richardson

Christos Kozyrakis Stanford

Mark Horowitz Stanford

Communications of the ACM (CACM), 2011


Abstract

Scaling the performance of a power-limited processor requires decreasing the energy expended per instruction. This article quantifies the performance and energy overheads of a 720p HD H.264 encoder running on a general-purpose four-processor CMP system, then explores how both broadly applicable and algorithm-specific hardware customizations can eliminate those overheads. The final customized CMP matches the performance of an ASIC solution while consuming within three times its energy and occupying comparable area.
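
The opening claim can be read as a simple budget relation. This is a sketch only: the symbols are introduced here for illustration and are not taken from the article. Let P be average power, E_instr the average energy per instruction, and R the instruction throughput (instructions per second).

```latex
P = E_{\text{instr}} \cdot R
\quad\Longrightarrow\quad
R_{\max} = \frac{P_{\max}}{E_{\text{instr}}}
```

Under a fixed power budget P_max, throughput can rise only if E_instr falls, which is why the article focuses on energy overheads per instruction.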