Building the world's biggest telescope array - with machines that don't yet exist
Turning terabytes and exabytes into galaxies at SKA
Once completed, the Square Kilometre Array (SKA) will be the biggest radio astronomy telescope in the world.
"Biggest", though, really is too mild a term for the sheer size of this project. The first phase, SKA1, will be broken up into two instruments, SKA1 MID and SKA1 LOW, based on their frequencies.
SKA1 MID alone is made up of 200 or so dishes spread over a 33,000m² area – the size of 126 tennis courts. Those radio antennas will pick up a total raw data output of 2TB per second - 62EB a year - or enough content to fill up 340,000 average-sized laptops each day.
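Those headline figures hang together on a back-of-the-envelope basis. Here is a quick sanity check – the ~0.5TB "average-sized laptop" and the standard tennis-court footprint are our assumptions for illustration, not SKA design figures:

```python
# Back-of-the-envelope check of the SKA1 MID numbers quoted above.

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

raw_rate_tb_per_s = 2                      # 2 TB/s of raw output
tb_per_day = raw_rate_tb_per_s * SECONDS_PER_DAY
eb_per_year = tb_per_day * DAYS_PER_YEAR / 1_000_000   # 1 EB = 1,000,000 TB

laptop_tb = 0.5                            # assumed laptop capacity
laptops_per_day = tb_per_day / laptop_tb

tennis_court_m2 = 23.77 * 10.97            # ITF doubles-court footprint
courts = 33_000 / tennis_court_m2

print(f"~{eb_per_year:.0f} EB/year")       # ~63 EB/year, close to the 62EB quoted
print(f"{laptops_per_day:,.0f} laptops/day")
print(f"~{int(courts)} tennis courts")
```

Running that reproduces the article's figures to within rounding: roughly 63EB a year, about 345,600 half-terabyte laptops a day, and around 126 courts.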
That will make SKA1 MID five times more sensitive than the current best instrument in the world, the Karl G Jansky Very Large Array (JVLA), with four times the resolution and sixty times the survey speed.
The project ultimately has two big objectives: one is to look for evidence of gravitational waves by observing a network of stable pulsars. The other is to look back at the period when the universe's first stars and galaxies "turned on" and started shining brightly, peeking through holes in the hydrogen gas for information about how galaxies form. Both are potentially Nobel Prize-winning affairs.
The idea for SKA was formalised in 1993 and, while construction on the SKA sites in South Africa and Western Australia won’t start until 2018, SKA architect Tim Cornwell and his team are already busy developing the IT that will power this awesomely data- and compute-heavy project.
And, they are doing so using systems the tech suppliers haven’t even built yet.
Gazing at the cosmos through the power of dreams (and servers)
“We know that according to the current manufacturers’ development paths, around about the time we need it, we’ll be able to buy the requisite compute power,” Cornwell told The Register, rather nonchalantly, during a recent interview.
“And it’ll be fairly conventional; it’ll be blade servers arranged in racks and a few of the racks will be tightly connected with loose connections to other racks in compute islands.”
The type of hardware might be conventional, but the computing power will need to be around three times that of the most powerful supercomputer of 2013 – equivalent to the processing power of around a hundred million PCs, or more than 100 petaflops of raw processing power.
For the record, the most powerful super of 2013 is, officially, the National Supercomputer Center in Guangzhou's Tianhe-2 (MilkyWay-2), an Intel Xeon E5-2692 2.2GHz cluster running 3,120,000 cores; it has topped the twice-yearly Top500 list since its debut in June 2013. The SKA will likewise employ millions of processor cores operating in parallel.
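The arithmetic behind that "three times" comparison is simple enough to check. Tianhe-2's Linpack score (Rmax) was about 33.9 petaflops on the 2013 Top500 lists; the roughly-one-gigaflop-per-PC figure is our assumption for the "hundred million PCs" equivalence:

```python
# Rough check of the "three times the most powerful supercomputer" claim.

tianhe2_pflops = 33.9          # Tianhe-2 Linpack Rmax, petaflops
ska_target_pflops = 3 * tianhe2_pflops

pc_gflops = 1.0                # assumed sustained gigaflops per ordinary PC
pcs_equivalent = ska_target_pflops * 1e6 / pc_gflops   # 1 PF = 1e6 GF

print(f"SKA target: ~{ska_target_pflops:.0f} petaflops")   # just over 100 PF
print(f"equivalent to ~{pcs_equivalent:.0e} PCs")          # on the order of 1e8
```

Three Tianhe-2s come out at just over 100 petaflops, which is where the "hundred million PCs" figure lands too.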
The Netherlands Institute for Radio Astronomy (ASTRON) and IBM are in the middle of a five-year collaboration to research the fast, low-power exascale computer systems that SKA will need.
The partnership is studying exascale computing, data transport and storage processes as well as the streaming analytics that will be needed to read, store and analyse the raw data. Some of their ideas for the beyond-state-of-the-art supercomputing include novel optical interconnect technologies and nanophotonics for large data transfers and high performance storage systems based on next-gen tape systems and new phase-change memory technologies.
Yet, we're told, building data processing centres in Perth and in Cape Town with bits that don’t exist yet isn’t the hard part – it’s the software that’s the real head-scratcher.
Writing software for something that doesn't exist. Uh, what?
“We know what we would do, but writing the software to facilitate this is quite tricky,” said Cornwell. “It’s about an 80m euro job to do that and it’s quite a substantial job. Software is always a risk so I would say that’s one of the biggest challenges.
The SKA: 126 tennis courts, if they looked like radio arrays
“The processing that we do is extremely complex. We get the information from the telescope and it tells us in a very roundabout way what the sky looks like and we have to go through that and figure out what the sky actually looks like for the images,” Cornwell explained.
“That’s a huge amount of processing to do. We’ve gotten very good at that over the past 40 years or so, but the algorithms are very complex and we’re basically going about a factor of a thousand up in processing scale from existing telescopes,” he continued. “The two telescopes that do this already are the Australian Square Kilometre Array Pathfinder [ASKAP] and the Low Frequency Array [LOFAR], and they operate at about a few hundred teraflops and we have to operate at a hundred or a few hundred petaflops – so it’s a huge jump in five years. No-one really knows exactly how to write the software to do this - we have an idea, but it has to be tested.”
Despite the challenges though, Cornwell is feeling pretty confident, because this is what global science projects are all about.
“We’ve done this before – we’ve been doing it for years,” he said.
“I worked on ASKAP and I remember when I started the programme in 2007, thinking: ‘We’re not going to be able to do that,’ but we did it. The way you do it is incrementally, step by step as the processors get better. So I think it can be done, obviously we wouldn’t be able to attract the funding and the support if we couldn’t make a convincing case that we can. But it’s not a done deal.
“We design the telescope for the computers we’re going to have when we start observing, not the ones we have now. We’re reasonably optimistic that we can do it, but there will undoubtedly be stumbles along the way. But that’s what research is about; if you want to build a world class instrument – and this is basically the best radio telescope anyone’s going to build for a while – you have to take these educated chances,” he said.
To build something like this, Cornwell and the SKA team work with the teams of radio astronomy projects like ASKAP and LOFAR and even pull inspiration from other big data science projects like the Large Hadron Collider (LHC).
“We very much looked at other big projects for multiple lines of insight. One is the LHC and the way that they do their data dissemination. They have multiple tiers of users and we’ll probably end up with something like that,” he said.
“We also looked at the Large Synoptic Survey Telescope (LSST) and we’ve also talked to LOFAR, who are part of the team developing the software for our telescope. You really can’t do something like this unless you’re connected to other state of the art projects.”
Exascale kit and vendors
SKA is also lucky because it’s quite a sexy project for vendors to be involved in. Aside from the science-benefitting-mankind bit, SKA will be the biggest data-led project in the world - when it gets off the ground.
“I’ve been to many exascale meetings in the last few years and the SKA has always been singled out as having massive data flows and massive compute and it’s that combination that is quite unique. I believe that’s why vendors of computers and cloud services are so interested in us, because we’re pushing both of those envelopes at the same time,” Cornwell said.
Although the team has their big budget to build the specific software and algorithms they need for their data, they’ll also be making use of existing software like Hadoop and Swift to help them handle their mountains of raw information. Cornwell is especially keen on getting some of that data and computing into the cloud.
“In the long term, you can see that astronomers don’t really want to be running data centres, so sometime in the next 20 years there will be a transition point to the cloud,” he reckons.
Cornwell has been talking to Amazon about the possibility of using its AWS for data storage and computing, as well as helping the team get the scientific discoveries out to other astrophysicists and astronomers.
“In the near future, we expect to announce a grants programme with Amazon and that will facilitate use of the cloud by astronomers around the world to get hands on experience,” Cornwell said.
Yet even with something like AWS, the full SKA will still dump zettabytes of raw data per year that it simply can’t afford to store. With an operations budget of 60m euro a year divided between people, power and storage, what can be retained will be governed by raw money.
“We will have to throw away all the raw data, all the stuff that we sent to the supercomputer, after we go through a lot of processing and averaging the data to produce images and those images are the things that we would keep. Even in some cases, we can’t afford to keep the whole image, we have to keep just little parts of it,” he said.
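The shape of that reduce-then-discard pipeline can be sketched in a few lines. This is a toy illustration only – the array sizes and averaging factors are invented, and real SKA processing (calibration, gridding, imaging) is far more involved:

```python
# Toy sketch of the pipeline Cornwell describes: average the raw stream down,
# keep the small products, discard the raw data. Shapes and factors are invented.
import numpy as np

rng = np.random.default_rng(0)

# Pretend raw stream: 1,000 time samples x 512 frequency channels of complex
# visibilities - the "stuff we sent to the supercomputer".
raw = rng.normal(size=(1000, 512)) + 1j * rng.normal(size=(1000, 512))

# Average in blocks of 10 time samples and 8 channels: an 80x data reduction.
t_block, f_block = 10, 8
averaged = raw.reshape(100, t_block, 64, f_block).mean(axis=(1, 3))

reduction = raw.size / averaged.size
print(f"{raw.size:,} samples -> {averaged.size:,} ({reduction:.0f}x smaller)")

# `raw` would now be thrown away; only `averaged` - and, later, the images
# made from it - would be kept.
```

At SKA scale the same principle applies to a 2TB/s stream rather than a half-megabyte array, which is why only the averaged products and images survive.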
That’s the situation for now, but Cornwell is confident that will change in the future. He still believes in Moore’s law for compute power and he reckons bandwidth will expand too.
“We re-invest on a five-, ten-year timescale. We end up putting new digital signal processing in, more bandwidth and then that normally brings you a huge new set of capabilities. One of which might be the ability to image the piece of the sky you’re looking at every millisecond. We can’t afford to do that at the moment, but it is quite conceivable that in ten years’ time, we could.”
Even without the boost in power, the SKA is set to make huge strides in astronomy. ®