Understanding sources of inefficiency in general-purpose chips
Communications of the ACM (CACM), 2011
Abstract
Scaling the performance of a power-limited processor requires decreasing the energy expended per instruction. This article quantifies the performance and energy overheads of a 720p HD H.264 encoder running on a general-purpose four-processor CMP system, then explores how both broadly applicable and algorithm-specific hardware customizations can eliminate those overheads. The final customized CMP matches the performance of an ASIC solution while consuming within three times the ASIC's energy and occupying comparable area.