Making a Case for Distributed File Systems at Exascales [slides]

by Dr. Ioan Raicu, Assistant Professor, Department of Computer Science, Illinois Institute of Technology

Time: 1:45pm - 2:45pm on April 27, 2011
Place: Room 101, Lindley Hall

Talk Abstract

Exascale computers will enable the unraveling of significant scientific mysteries. Predictions are that 2019 will be the year of exascales, with millions of compute nodes and billions of threads of execution. The current state-of-the-art storage in High-End Computing (HEC), a decades-old approach in which storage is segregated from compute nodes and connected by a network, will not scale with the expected exponential growth in concurrency. At exascales, basic functionality at high concurrency levels will suffer poor performance, which, combined with system mean-time-to-failure measured in hours, will lead to a performance collapse for heroic applications. Storage has the potential to be the Achilles heel of exascale systems. We propose that future HEC systems be designed with non-volatile memory on every compute node. Every compute node would actively participate in metadata and data management, leveraging many-core processors and the high bisection bandwidth of torus networks. Metadata management would be distributed, implemented as a distributed data structure tailored for HEC that supports constant-time operations by exploiting trustworthy/reliable hardware, fast network interconnects, virtually non-existent node "churn", low latencies, and scientific-computing data-access patterns. Data would be partitioned and spread over many nodes based on data-access patterns. Replication would ensure data availability, and cooperative caching would deliver high aggregate throughput. Data would be indexed by including descriptive, provenance, and system metadata on each file. A variety of data-access semantics would be supported, from POSIX-like interfaces for generality to relaxed semantics for increased scalability. This talk discusses this revolutionary new storage architecture that will make exascale computing more tractable, touching all disciplines in HEC and fueling scientific discovery and global economic development. This new architecture will also extend the knowledge base beyond HEC into commodity systems, as the fastest machines generally become mainstream systems in a matter of years.
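As a rough illustration (not taken from the talk itself), constant-time distributed metadata placement with replication can be sketched with consistent hashing: each file path hashes to a position on a ring of compute nodes, and the primary plus its successors hold the replicas. The node names, replica count, and hashing scheme below are assumptions for the sketch, not details of the proposed system.

```python
import hashlib
from bisect import bisect


class MetadataRing:
    """Toy consistent-hash ring mapping file paths to compute nodes.

    Illustrative sketch only: node naming, SHA-1 hashing, and a
    replication factor of 3 are assumptions, not the talk's design.
    """

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Place each node on the ring at the position given by its hash.
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def locate(self, path):
        """Return the nodes responsible for a file's metadata:
        the primary plus (replicas - 1) successors on the ring.
        Lookup is O(log n) here; a real system would cache the
        ring so each operation involves no central server."""
        i = bisect(self.keys, self._hash(path)) % len(self.ring)
        return [self.ring[(i + k) % len(self.ring)][1]
                for k in range(self.replicas)]


# Example: 1024 compute nodes, each also a metadata server.
ring = MetadataRing([f"node{i:04d}" for i in range(1024)])
owners = ring.locate("/project/sim/output.0001.h5")
```

Because every node can compute `locate()` independently from the shared ring, metadata operations need no centralized server, and adding or removing a node remaps only a small fraction of the keys.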

About Dr. Ioan Raicu

Dr. Ioan Raicu is an assistant professor in the Department of Computer Science (CS) at Illinois Institute of Technology (IIT), as well as a guest research faculty member in the Mathematics and Computer Science Division (MCS) at Argonne National Laboratory (ANL). He is also the founder and director of the Data-Intensive Distributed Systems Laboratory (DataSys) at IIT. He received the prestigious NSF CAREER award (2011 - 2015) for his innovative work on distributed file systems for exascale computing. He was an NSF/CRA Computing Innovation Fellow at Northwestern University in 2009 - 2010, and obtained his Ph.D. in Computer Science from the University of Chicago under the guidance of Dr. Ian Foster in March 2009. He is a 3-year award winner of the GSRP Fellowship from NASA Ames Research Center.

His research interests are in the general area of distributed systems. His work focuses on a relatively new paradigm, Many-Task Computing (MTC), which aims to bridge the gap between two predominant paradigms in distributed systems: High-Throughput Computing (HTC) and High-Performance Computing (HPC). His work has focused on defining and exploring both the theory and the practical aspects of realizing MTC across a wide range of large-scale distributed systems. He is particularly interested in resource management in large-scale distributed systems, with a focus on many-task computing, data-intensive computing, cloud computing, grid computing, and many-core computing. His work has been funded by NASA Ames Research Center, the DOE Office of Advanced Scientific Computing Research, the NSF/CRA CIFellows program, and the NSF CAREER program.