A resource constrained loop pipelining technique for perfectly-nested loop structures.
Aakre, Thor Davis
Zaky, Amr M.
MetadataShow full item record
This thesis presents a new technique for loop pipelining of perfectly-nested for-loop structures which is designed to optimize loop execution on VLIW machines. Previously implemented loop pipelining techniques provide limited performance because they explicitly include the constraints imposed by a loop's cyclic dependences in their loop pipelining process. Some loop pipelining techniques have also ignored the realistic constraint of finite resource availability in the creation of final pipelined execution schedules. The new approach presented in this thesis eliminates the problem of cyclic dependences by first applying a linear transformation to the nested loop index space to ensure a cycle-free innermost loop, which is then pipelined using modulo scheduling for a known set of resources. The transformation guarantees that the target machine's available resources are the only limit to the amount of exploitable fine-grained parallelism within the innermost loop. This results in pipelined execution schedules having near-optimal, Inter-Iteration Initiation Intervals (IMu) with the achievable performance being scalable with the addition of resources. Consequently. our loop pipelining method' utilizes more fine-grained parallelism than other loop pipelining techniques which directly incorporate a loop's cyclic dependences in their pipelinlng process. We also explicitly provide a procedure for creating the resultant pipelied execution schedules. In addition, we investigate the negative effect that the transformation has on data locality and the cache miss rate, as well as the use of iteration space tiling to restore data locality and cache miss rate to the levels expected from sequential loop execution.
Approved for public release; distribution is unlimited.
Showing items related by title, author, creator and subject.
Efficient scheduling of real-time compute-intensive periodic graphs on a large grain data flow multiprocessor Akin, Cem (Monterey, California: Naval Postgraduate School, 1993-03);Architectures of computer systems based on Data Flow (DF) concepts attracted great attention as an alternative to conventional sequential architectures (Von Neumann). DF architectures are capable of efficiently exploiting ...
Shukla, Shridhar B.; Little, Brian S.; Zaky, Amr (Monterey, California. Naval Postgraduate School, 1992-04); NPS-EC-92-007Efficient data-flow implementation requires fast run-time mechanisms to detect and dispatch schedulable tasks. However, the inherent non-determinism in data-flow executions and the requirement of fast, and therefore, simple ...
Keys, Richard Toney (Monterey, California. Naval Postgraduate School, 1994-09);The U.S. Navy's new multiprocessor, the AN/UYS-2 Enhanced Modular Signal Processor (EMSP) utilizes a First-Come-First-Serve (FCFS) algorithm to transfer data. This algorithm is simple to implement but provides no mechanism ...