Intelligent Speculation for Pipelined Multithreading

Submitted by matt on Thu, 03/13/2008 - 9:15pm.
03/19/2008 - 1:30pm
03/19/2008 - 2:30pm

Event: Electrical Engineering Faculty Candidate
http://www.ece.utexas.edu/seminars/series/ece-seminar-series/

Speaker: Neil Vachharajani
Princeton University

Title: "Intelligent Speculation for Pipelined Multithreading"

Date: Wednesday, March 19, 2008

Time: 1:30 pm

Place: ACES 2.302

Host: Yale Patt

ABSTRACT:

In recent years, microprocessor manufacturers have shifted their focus from
single-core to multicore processors. To avoid burdening programmers with the
responsibility of parallelizing their applications, some researchers have
advocated automatic thread extraction. Within the scientific computing domain,
automatic parallelization techniques have been successful, but in the
general-purpose computing domain few, if any, techniques have achieved
comparable success.

Despite this, recent progress hints at mechanisms to unlock parallelism from
general-purpose applications. In particular, two promising proposals exist in
the literature. The first, a group of techniques loosely classified as
thread-level speculation (TLS), attempts to adapt techniques successful in the
scientific domain, such as DOALL and DOACROSS parallelization, to the
general-purpose domain by using speculation to overcome complex control flow
and data access patterns not easily analyzed statically. The second, a
non-speculative technique called Decoupled Software Pipelining (DSWP),
partitions loops into long-running, fine-grained threads organized into a
pipeline (pipelined multithreading, or PMT). DSWP effectively extends the
reach of conventional software pipelining to codes with complex control flow
and variable-latency operations.
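
As an illustration of the pipeline organization described above, here is a
minimal sketch, assuming a two-stage split of a linked-list loop. The names
(Node, traverse, work, run_pipeline, SENTINEL) are hypothetical and do not come
from the talk; a real DSWP compiler partitions the loop automatically and uses
inter-core queues rather than Python threads. The sketch shows structure only,
not speedup.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def traverse(head, out_q):
    """Stage 1: owns the loop-carried pointer chase (node = node.next)."""
    node = head
    while node is not None:
        out_q.put(node.value)   # forward the value to the next stage
        node = node.next        # the loop-carried dependence stays in this thread
    out_q.put(SENTINEL)


def work(in_q, results):
    """Stage 2: per-element work; it never sends data back to stage 1."""
    while True:
        value = in_q.get()
        if value is SENTINEL:
            break
        results.append(value * value)


def run_pipeline(values):
    # Build the linked list that the original sequential loop would walk.
    head = None
    for v in reversed(values):
        head = Node(v, head)
    q = queue.Queue(maxsize=8)  # bounded queue decouples the two stages
    results = []
    stage1 = threading.Thread(target=traverse, args=(head, q))
    stage2 = threading.Thread(target=work, args=(q, results))
    stage1.start()
    stage2.start()
    stage1.join()
    stage2.join()
    return results
```

Because dependences flow in one direction only, the queue latency delays the
first result but is not paid again on every iteration.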

Unfortunately, both techniques suffer from key limitations. TLS techniques
either suffer from over-speculation, in an attempt to speculatively transform
a loop into a DOALL loop, or realize little parallelism in practice because
DOACROSS parallelization puts core-to-core communication latency on the
critical path. DSWP avoids these pitfalls with its pipeline organization and
decoupled execution using inter-core communication queues. However, its
non-speculative nature and the restrictions needed to ensure a pipeline
organization prevent DSWP from achieving balanced parallelism on many key
application loops.
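
The critical-path argument above can be made concrete with a simplified
back-of-the-envelope model. This model is an assumption made for illustration,
not a result from the talk: each iteration has a dependent part of length
t_seq (the loop-carried chain) and an independent part of length t_par, the
core-to-core latency is c, and DOACROSS is given enough cores that only the
dependence chain limits it. The function names are hypothetical.

```python
def doacross_time(n, t_seq, t_par, c):
    """Each iteration's dependent part waits for the previous one plus a
    core-to-core hop, so the latency c is paid on every iteration."""
    return n * t_seq + (n - 1) * c + t_par


def dswp_time(n, t_seq, t_par, c):
    """Two-stage pipeline: stage 1 runs the dependent parts back to back,
    stage 2 runs the independent parts. The latency c is paid once, and
    throughput is set by the slower (unbalanced) stage."""
    return n * max(t_seq, t_par) + c + min(t_seq, t_par)
```

In this model the DOACROSS time grows with n * c, while the DSWP time grows
with n * max(t_seq, t_par); the max term is where an unbalanced pipeline
loses parallelism.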

In this talk, I present two key contributions that advance the state of
automatic parallelization of general-purpose applications. First, I propose
extending pipelined multithreaded execution with intelligent speculation.
Rather than speculating all loop-carried dependences to transform loops into
DOALL loops, I propose speculating only key predictable dependences that
inhibit balanced, pipelined execution. I will present results from our
automatic compiler transformation, Speculative DSWP, demonstrating the
efficacy of this technique. Second, to support decoupled speculative execution,
I will describe an extension to a multicore architecture's memory subsystem
allowing it to support memory versioning. The proposed memory system resembles
those present in TLS architectures, but provides efficient execution in the
presence of large transactions, many simultaneous outstanding transactions,
and eager data forwarding between uncommitted transactions. In addition to
supporting usage patterns exhibited by speculative pipelined multithreading,
the proposed memory system facilitates existing and future speculative
threading techniques.
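
To illustrate the ideas of speculating a single predictable dependence and
buffering speculative state in memory versions, here is a toy, sequential
sketch. Everything in it is an assumption made for illustration: the
VersionedMemory class, the speculative_sum function, and the choice of the
loop-exit test as the speculated dependence are hypothetical and are not the
hardware design or compiler transformation presented in the talk.

```python
class VersionedMemory:
    """Toy memory in which each speculative iteration writes to its own version."""

    def __init__(self):
        self.committed = {}
        self.versions = {}  # iteration id -> buffered (uncommitted) writes

    def write(self, vid, addr, value):
        self.versions.setdefault(vid, {})[addr] = value

    def read(self, vid, addr):
        # Eager forwarding: return the newest write from this version or any
        # earlier uncommitted version, falling back to committed state.
        for v in sorted((k for k in self.versions if k <= vid), reverse=True):
            if addr in self.versions[v]:
                return self.versions[v][addr]
        return self.committed.get(addr)

    def commit(self, vid):
        self.committed.update(self.versions.pop(vid, {}))

    def rollback_from(self, vid):
        # Squash the misspeculated version and every later one.
        for v in [k for k in self.versions if k >= vid]:
            del self.versions[v]


def speculative_sum(items, window=4):
    """Sum items up to (not including) the first negative value.

    The loop-exit test is the speculated dependence: iterations run ahead
    assuming the loop does not exit, and are validated in order afterwards.
    """
    mem = VersionedMemory()
    mem.committed["total"] = 0
    i = 0
    while i < len(items):
        batch = range(i, min(i + window, len(items)))
        for vid in batch:  # speculative phase: assume no early exit
            mem.write(vid, "total", mem.read(vid, "total") + items[vid])
        for vid in batch:  # commit phase: check the real exit condition
            if items[vid] < 0:
                mem.rollback_from(vid)  # discard misspeculated work
                return mem.committed["total"]
            mem.commit(vid)
        i += window
    return mem.committed["total"]
```

The result matches the sequential loop because versions commit in iteration
order and a misspeculated iteration discards its own writes and all later ones.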