NVIDIA Chief Scientist Bill Dally gives a talk in Tsinghua Global Vision Lectures
Title: The Future of Computing is Parallel (actually, everyone knows that :-))
About Bill Dally
At Stanford University, Dally has been a Professor of Computer Science since 1997 and Chairman of the Computer Science Department since 2005. Dally and his team developed the system architecture, network architecture, signaling, routing and synchronization technology that is found in most large parallel computers today. At Caltech he designed the MOSSIM Simulation Engine and the Torus Routing chip which pioneered “wormhole” routing and virtual-channel flow control. His group at MIT built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very low overhead synchronization and communication mechanisms. He is a cofounder of Velio Communications and Stream Processors, Inc. Dally is a Fellow of the American Academy of Arts & Sciences. He is also a Fellow of the IEEE and the ACM and has received the IEEE Seymour Cray Award and the ACM Maurice Wilkes award. He has published over 200 papers, holds over 50 issued patents, and is an author of the textbooks, Digital Systems Engineering and Principles and Practices of Interconnection Networks.
SANTA CLARA, CA - JANUARY 28, 2009 - NVIDIA Corporation today announced that Bill Dally, the chairman of Stanford University’s computer science department, will join the company as Chief Scientist and Vice President of NVIDIA Research. The company also announced that longtime Chief Scientist David Kirk has been appointed “NVIDIA Fellow.”
Main Points
- Single thread performance is no longer scaling
- Performance = Parallelism
- Efficiency = Locality
- Applications have lots of both
- Machines need lots of cores (parallelism) and an exposed storage hierarchy (locality)
- A programming system must abstract this
- Reaching ExaScale requires evolving throughput computing
- Agile memory
- Energy efficient cores and communication
- Efficient parallel mechanisms
On Fault Tolerance
- Protect node memory
- Phase-Change Memory/Flash for checkpoint and scratch
- Protect logic
- Simply duplicate the logic (area will not be the limiting resource in the future)
- RAS
- ECC on all memory and links
- Self-checking and application-level checking
- Fast local checkpoints