UT Computer Architecture Seminar: Fault Screeners

Submitted by matt on Fri, 03/21/2008 - 11:41pm.
03/27/2008 - 3:30pm
03/27/2008 - 5:00pm

Event: Computer Architecture Seminar Series
http://www.cs.utexas.edu/users/cart/arch

Speaker: Shubu Mukherjee
Intel

Title: "Fault Screeners"

Date: Thursday, March 27, 2008 **Please note that this is not the usual
seminar day.**

Time: 3:30 pm

Place: ACES 2.402

Host: Doug Burger

ABSTRACT:

Fault screeners are a new breed of fault identification technique that
can probabilistically detect whether a transient fault has affected the
state of a processor. We demonstrate that fault screeners work because of
two key characteristics. First, we show that much of the intermediate
data generated by a program inherently falls within certain consistent
bounds. Second, we observe that these bounds are often violated by the
introduction of a fault. Thus, fault screeners can identify faults by
directly watching for data inconsistencies arising in an application's
behavior.
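
As a rough illustration of the idea above (not the algorithm presented in
the talk), the sketch below assumes the simplest possible screener: it
records the range of values seen at a given program site and flags any
later value outside that range. The names RangeScreener, train,
is_suspicious, flip_bit, and the "loop_index" site are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RangeScreener:
    """Hypothetical screener: remembers the [lo, hi] range seen per site."""
    bounds: dict = field(default_factory=dict)

    def train(self, site, value):
        # Widen the recorded range for this site to include the new value.
        lo, hi = self.bounds.get(site, (value, value))
        self.bounds[site] = (min(lo, value), max(hi, value))

    def is_suspicious(self, site, value):
        # With no history for a site, the screener cannot judge the value.
        if site not in self.bounds:
            return False
        lo, hi = self.bounds[site]
        return value < lo or value > hi


def flip_bit(value, bit):
    # Model a transient fault as a single-bit flip (an assumption).
    return value ^ (1 << bit)


screener = RangeScreener()
for i in range(100):          # fault-free values at this site span 0..99
    screener.train("loop_index", i)

clean = 42
faulty_high = flip_bit(clean, 20)   # high-order flip lands far outside 0..99
faulty_low = flip_bit(clean, 0)     # low-order flip stays inside 0..99

print(screener.is_suspicious("loop_index", clean))        # False
print(screener.is_suspicious("loop_index", faulty_high))  # True
print(screener.is_suspicious("loop_index", faulty_low))   # False
```

The low-order flip goes unnoticed because the corrupted value still lies
within the learned range, which is why detection of this kind is
probabilistic rather than guaranteed.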

We present an idealized algorithm capable of identifying over 85% of
injected faults on the SpecInt suite and over 75% on average overall.
Further, in a realistic implementation on a simulated Pentium-III-like
processor, about half of the errors due to injected faults are
identified while still in speculative state. Errors detected this early
can be eliminated by a pipeline flush. In this talk, we present a
hardware-based version of this screening algorithm and show that its
implementation reduces overall performance by less than 1%.

BIOGRAPHY:

Shubu Mukherjee is a Principal Engineer and Director of Intel's SPEARS
Group (Simulation and Pathfinding of Efficient and Reliable Systems).
The SPEARS Group is responsible for spearheading architectural change
and innovation in the delivery of enterprise processors and chipsets by
building and supporting simulation and analytical models of performance,
power, and reliability. Dr. Mukherjee is widely recognized both within
and outside Intel as one of the experts on architecture design for soft
errors. He has made pioneering contributions towards the design of
Redundant Multithreading (RMT) techniques, architectural vulnerability
modeling for soft errors, creation of the Asim performance modeling
infrastructure (jointly with Dr. Joel Emer), design of the
Alpha 21364 interconnection network, and the creation of the first
shared memory prediction scheme.

Prior to joining Intel, Shubu worked at Compaq for 3 years and at Digital
Equipment Corporation for 10 days. Dr. Mukherjee received his B.Tech.
from the Indian Institute of Technology, Kanpur and his M.S. and Ph.D.
from the University of Wisconsin-Madison. He was the General Chair of
ASPLOS 2004 (Architectural Support for Programming Languages and
Operating Systems). He has co-authored over 40 external papers. He holds
8 patents and has filed over 30 more at Intel. Dr. Mukherjee's book,
"Architecture Design for Soft Errors," was recently published.

-------------------------------------------------------------------------------
The Computer Architecture Seminar Series is sponsored jointly by the
Departments of Computer Science and Electrical & Computer Engineering and is
supported by a grant from AMD.

-----------------------------------------------------------------------------

Scalable Yahoo Map of 24th & Speedway:
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=2400+Speedway&csz=Austin%...
X&Get+Map=Get+Map

Parking for off-campus visitors: We suggest that you park in the San Jacinto
parking garage (formerly PG1) at 24th & San Jacinto. Parking validation will
be available. Please contact the host for this seminar or stop by the
refreshment cart to have your parking validated.

Submap including San Jacinto Parking Garage:
http://www.utexas.edu/maps/main/areas/law.html

Submap including ACES: http://www.utexas.edu/maps/main/areas/eastmall.html

Please contact Gem Naivar at gem@cs.utexas.edu if you need any further
information to attend the seminar.