The HPC Community Needs Standards

Member Spotlight

In November 2015, Altair announced its decision to open source PBS Professional, the workload manager and job scheduler of the PBS Works suite, which was awarded “Best HPC Software or Technology” in the 2014 HPCwire Readers' Choice Awards. The decision involves working with OpenHPC to integrate PBS Professional with the OpenHPC software stack.

We had a chance to talk with Dr. Bill Nitzberg, CTO of the PBS Works division at Altair and an internationally recognized expert in parallel and distributed computing, about the who-what-where-and-why.

Can you tell me a little bit about your experience and background in HPC?
I worked at NASA starting in 1991. I was hired to be a sysadmin in the parallel systems group, where we had an Intel iPSC/860 system with 128 nodes and a CM-2, and later an Intel Paragon and CM-5, and… My official bio is: I’ve been in the computing industry for over 25 years.

While I was working at NASA, I finished my PhD, started managing the parallel systems group, and took on a few different research projects. One of those projects was on cluster computing. Our “Whitney” project was a follow-on to Tom Sterling and Don Becker’s “Beowulf” work, with a focus on scaling it up. We also worked on the MPI standard, where I was the editor of the MPI-2 I/O specification.

One of my projects was PBS – the workload manager that I’m now the CTO for within Altair. In 2000, the core developers and I took PBS out of NASA, formed a company around it, and that’s when PBS Pro was born. That company was acquired by Altair shortly thereafter, along with the original developers, many of whom are still here.

How does building an HPC community, and a community stack, help Altair and the PBS Works customers?
It feels like we, the HPC community, have continually reinvented a lot of high-performance computing. In a few places, like MPI, we have had really good success, but, in a lot of other places, we haven’t. We keep reinventing things. When I think about where we’ve done well (and I’m including myself in the HPC community) it’s where we have created standards.

There’s been no agreement about “Here’s what you should do in order to build an HPC system.” By creating a community software stack, I think we move further along towards real standardization. We can say, “Here’s something that works really well, here are the pieces, and they work well together. You have some choices, but we’ve done a good job of making sure that the choices fit together.”

You can stand on the shoulders of people who have come before you and actually improve things as opposed to reinventing them. The stack gets us much further along. As you’re deploying a new machine you can easily deploy the stack without having to reinvent a lot of stuff.

How are you contributing to OpenHPC and why do you think this is important?
In the workload-management space there are two big camps: the public sector and the private sector. The public sector (researchers, academia, and government labs) has a really strong preference for open source. They're pretty aggressive in terms of how much risk they're willing to take, and pretty aggressive in wanting to try out new things. And so, open source is a really good fit for them.

The private sector (large enterprises) has a really strong preference for a commercial product, a tool to “get the job done.” They want indemnification, someone to talk to, and someone who's as big as they are in case there is a problem. They don't want to be dependent on a nebulous community, because there may be nobody there when something goes wrong. And PBS Works has really solidified itself as a leader in the private sector and is used by thousands of organizations.

Because of these strong preferences (for open source versus commercial software), it's been very challenging to support both sectors in a way that allows innovations and expertise to flow back and forth, and this has been a big lost opportunity for the HPC community. We've announced an open source version of PBS Pro, available this summer, to better serve the entire HPC community. It's going to have exactly the same core and all the bells and whistles and functionality that we're also going to continue to offer commercially. Our vision is to maintain a single PBS Pro that is attractive to both sectors.

By bringing things together, we will be able to pull innovations from the public sector and hand them over in a palatable way to the private sector.

And the commercial sector has all this great “enterprise stuff” that rarely makes it into research labs. We can actually bring some of that into the public sector too.

I’m really excited about our strategy with PBS Pro – both making it available as open source, and making it an option in OpenHPC. So when you get OpenHPC, you can get PBS. And if you want OpenHPC with a commercial version of PBS, you can get that, too.

What do you do on weekends? I’m assuming you have weekends.
I’ve been running a lot lately. One of the things that I did in my early days at NASA was run the San Francisco Marathon about 20 years ago. I just ran it for the second time this past year, beating my original (pretty slow) time. (I plan to try again in 20 years with a personal goal of beating my time again.)

During the summers I do a lot of hiking.

And, by the way, my tagline about “reducing my pack weight” is really an optimization problem. It feels natural working at Altair, an optimization-driven company, and working on PBS, which is all about optimization for HPC.