HPC Open Source Software Lacks Cohesion

Member Spotlight

Dr. Marc Snir is a parallel computing pioneer whose innovative work has advanced the elite supercomputing systems that drive scientific discovery. Director of the Mathematics and Computer Science Division at Argonne National Laboratory (ANL), Snir is also the Michael Faiman and Saburo Muroga Professor in the Department of Computer Science at the University of Illinois, Urbana-Champaign (UIUC), a department he led from 2001 to 2007.

An Argonne Distinguished Fellow, AAAS Fellow, ACM Fellow, and IEEE Fellow, Snir has published influential papers and given many presentations on computational complexity, parallel algorithms, parallel architectures, interconnection networks, parallel languages, libraries, and parallel programming environments.

What is your experience or background in HPC?
I have been working in HPC for 30 years or so. I was at IBM, where my research group contributed to several scalable parallel computing systems, including the IBM Blue Gene family of machines. I have been associated with the National Center for Supercomputing Applications (NCSA) at UIUC since 2007, where I served as the lead software architect for the Blue Waters supercomputing system.

In addition, I was one of the principal developers of the Message Passing Interface (MPI). I also play a leadership role in the DOE-led effort to develop next-generation exascale systems.

Please tell us about ANL’s mission.
ANL is basically a pure science lab with about 3,500 people, and one of its strengths is HPC. ANL seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, ANL conducts leading-edge basic and applied scientific research in virtually every scientific discipline. ANL researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems. ANL is managed by the US Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the US.

Why is ANL participating in OpenHPC?
The Department of Energy (DOE) has always been a supporter of open source software development. Open source is excellent for sharing information and expertise, and for complete transparency.

But software development lacks cohesion. Much of the software currently running on HPC platforms is open source, and that’s a good thing. The problem is that each group develops independently of the others, and the software is rarely tested comprehensively. So it’s always a large effort to make sure that the different pieces work with each other.

OpenHPC can be a good mechanism to make sure all the pieces of open source software in HPC fit well together. It’s an important initiative that can bring together the HPC open source software community. It can make sure that a full stack of HPC software is available in a useful manner to the user community.

ANL has participated in other groups, naturally, either pushing standards or making software available. Just one example is MPICH, a freely available, portable implementation of MPI, the message-passing standard for distributed-memory applications used in parallel computing. ANL has been involved with MPICH development for over two decades.

But OpenHPC has the opportunity to be all-encompassing, covering a full HPC software stack, and also to be system agnostic, so it will run on anyone’s platform, assuming the relevant vendors are involved. Today, different groups have their own releases on their own platforms. It would be much easier if there were a distro that made sure the different components are up-to-date and compatible with each other. That would also simplify application development and allow cross-platform support.

What do you do on weekends?
Well, I can tell you what I’m doing right now. I’m in a park in Chicago playing on the playground with my granddaughter!