Smart memories: A configurable processor architecture for high productivity parallel programming
Preprint, 2005
Abstract
With single-processor systems running into instruction-level parallelism (ILP) limits and fundamental VLSI constraints, multiprocessor chips provide a realistic path towards scalable performance by exploiting thread-level parallelism (TLP) and data-level parallelism (DLP) in emerging applications. Nevertheless, parallel architectures are limited by the difficulty of parallel application development. This challenge has led to the invention of new programming models that make it easier to develop parallel programs correctly and efficiently. Smart Memories is a scalable, hierarchical architecture whose modular design addresses process technology issues such as power consumption and wire latency. Its reconfigurability allows applications written in different programming models to execute with high performance. Simulations have shown that considerable speedups (2x to 10x) can be achieved over a broad range of applications, at the cost of a small power and area penalty for reconfiguration.