Revealed: How NASA saved the Kepler space telescope from suicide
And you think you’ve had tough remote support jobs
Waking up to a phone call in the wee small hours of the morning is almost never good. It’s usually a wrong number, a drunk ex wanting to talk, or the news that someone has died.
When NASA's Kepler space telescope mission manager Charlie Sobeck was woken by ringing at 0125 on April 8, it was close to the last of those. The Kepler probe was one step from failure and NASA was rounding up its staff to perform a week-long session of multi-million-mile IT support.
The 2,320lb (1,052kg) telescope is orbiting the Sun 70 million miles from Earth, and for the past seven years has been scanning the skies for exoplanets orbiting other stars or wandering alone through the heavens.
The telescope was being reorientated for the K2 mission, which is looking for supernovas and star formations. In March, NASA commanded the telescope to shift its position for a series of shots using its 95 megapixel camera and prepared to receive data.
But on that fateful Friday last month, NASA found out that its distant instrument was stuck in emergency mode – the last stage before final shutdown. Its main computers were offline, no data was being recorded, and the spacecraft was burning fuel at a prodigious rate.
"As I recall, I received a call from the mission director at Ames, Marcie Smith, at 1:25 a.m. Friday morning. Knowing that there was a planned spacecraft contact, I expected that she would tell me that the spacecraft point was just a bit off, and we’d have to give it a nudge," Sobeck said.
"Instead I heard, 'We’re in Emergency Mode.' Within two minutes we confirmed what steps should be taken, and what resources needed to be immediately brought in, and that the flight team in Boulder had already begun the recovery actions. I headed into the office."
Kepler is run by a joint team at NASA, manufacturer Ball Aerospace, and the Laboratory for Atmospheric and Space Physics (LASP) at the University of Colorado. By the time Sobeck got into the office that night, everyone was either there or on the end of a phone line, and the team scarcely left the office for the next three days.
One step from disaster
The software running Kepler has a number of modes, including a safe mode if something goes wrong. But there’s also an emergency mode, never before activated on Kepler, that only kicks in if the ’scope is in serious trouble.
Emergency mode is when the satellite thinks all of its key instruments are down. It shuts down its primary and secondary main computers, fires up two backup computers, and its thrusters put it into a slow spin designed to keep its solar panels orientated towards the Sun for maximum power.
That spin burns up the telescope’s limited fuel supplies and means Earth can communicate with the onboard systems only intermittently. All systems and instruments are switched off, apart from the thrusters, comms gear, and the primary and secondary backup computers, which are less able than the main systems.
The first stage was to reestablish contact with the telescope using the Deep Space Network of communications antennas in the US, Spain, and Australia. Time on the powerful radio network must be booked weeks in advance, but NASA declared a "spacecraft emergency" to get immediate access.
"We do not declare a spacecraft emergency when the spacecraft merely goes into Safe Mode, or if we simply don’t know what is going on," Sobeck said. "We use the spacecraft emergency card only when we truly believe the loss of the spacecraft is imminent without it."
Over the next three days, the Deep Space Network was focused on Kepler for about 20 hours a day. Kepler was rotating once every two hours, and its communications antenna could only maintain contact with the network back on Earth for 20 minutes per rotation.
In the time between contact, NASA needed the antenna array to communicate with its other satellites, giving the Kepler team a chance to get some sleep.
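To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes, purely for illustration, that one 20-minute window recurs on every two-hour rotation and that the roughly 20 hours of daily network coverage lines up with whole rotations. The function name and parameters are hypothetical and not anything NASA uses.

```python
def contact_minutes_per_day(coverage_hours, rotation_hours, window_minutes):
    """Estimate usable contact minutes per day.

    Assumes one contact window per full rotation, and counts only
    rotations that fit entirely inside the coverage period.
    """
    full_rotations = int(coverage_hours // rotation_hours)
    return full_rotations * window_minutes


# Figures quoted in the article: about 20 hours of coverage a day,
# one rotation every two hours, 20 minutes of contact per rotation.
minutes = contact_minutes_per_day(coverage_hours=20, rotation_hours=2, window_minutes=20)
duty_cycle = 20 / (2 * 60)  # fraction of each rotation spent in contact

print(f"Contact time: about {minutes} minutes ({minutes / 60:.1f} hours) a day")
print(f"Duty cycle: {duty_cycle:.0%} of each rotation")
```

On those assumptions the team had a little over three hours of actual contact a day, or about a sixth of each rotation.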
Suddenly, a signal
The first lot of data picked up by the Deep Space Network from the probe suggested the thrusters, the main communications hardware, and the telescope’s two remaining reaction wheels were dead. The telemetry was sent by Kepler's emergency radio gear.
That sounds bad but, in fact, it was good news. It was highly unlikely that all of those systems had been damaged or shut down – only a catastrophic accident could have caused that, and that would have knocked out the communications systems completely. The more likely situation was that the data coming from the probe's individual subsystems was faulty.
The team established that emergency mode had kicked in about 30 hours before that fateful early morning call on April 8, before the telescope was due to reorientate itself. That told them the uploaded maneuvering commands hadn’t borked the system.
The next step was to get the telescope into safe mode. This brings the primary and secondary main computers back online but keeps the instruments shut down to minimize power use. Then they stopped the spin and got a constant feed of data to and from the spacecraft, which speeded things up dramatically.
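The difference between the two modes, as described in this article, can be summarised in a small lookup table. This is only an illustrative sketch in Python: the class and field names are hypothetical, the table records only what the article states, and anything the article leaves unsaid is marked as unknown rather than guessed. It is not a model of Kepler's actual flight software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    EMERGENCY = "emergency"
    SAFE = "safe"


@dataclass(frozen=True)
class ModeState:
    # True = running, False = shut down, None = not stated in the article.
    main_computers: Optional[bool]
    backup_computers: Optional[bool]
    instruments: Optional[bool]


# Hypothetical summary built only from the article's description.
MODE_STATES = {
    Mode.EMERGENCY: ModeState(main_computers=False, backup_computers=True, instruments=False),
    Mode.SAFE: ModeState(main_computers=True, backup_computers=None, instruments=False),
}


def changes(before: Mode, after: Mode) -> dict:
    """Return fields whose stated value differs between two modes,
    ignoring anything the article leaves unstated (None)."""
    old, new = MODE_STATES[before], MODE_STATES[after]
    result = {}
    for name in ("main_computers", "backup_computers", "instruments"):
        o, n = getattr(old, name), getattr(new, name)
        if o is not None and n is not None and o != n:
            result[name] = (o, n)
    return result


print(changes(Mode.EMERGENCY, Mode.SAFE))
```

Going by the article alone, the only stated change between the two modes is that the main computers come back online. The instruments stay off in both.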
After rebooting the main computers, the readings from the thrusters, wheels and instruments showed all systems were normal. This confirmed the theory of a monitoring software fault, and it appears the telescope fooled itself into emergency mode.
"We don’t yet know what spawned the problem, and we may never know, but the first effects that we’ve found were a sudden series of alarms that caused the onboard fault protection to react," Sobeck said.
"The alarms themselves seem to be erroneous. As a result, the spacecraft’s response didn’t address the real situation, only the situation that was reported. In such conditions the resulting actions can, and in this case were, detrimental rather than helpful."
The incident has now been classed as a "transitory event" – meaning it’s an unknown problem that sorted itself out. That's the opposite of a "hard failure," in which some component has blown out and can’t be recovered. But while Kepler is back online, its future is limited.
Squeezing out the last few years
Sobeck said initial fuel readings showed that the telescope’s emergency maneuvers had expended a lot of propellant. "We lost more fuel than I had hoped, but less than I had feared," he said.
Fuel pressure in the tanks was uneven, he said, so it would take months of measurements before the final fuel load could be accurately gauged. The telescope is unlikely to finish its K2 star scan before the last of its fuel is exhausted and it becomes impossible to use the instrument.
Kepler has had a damned good run since it was launched into space in 2009. It was designed to last three and a half years but held up reasonably well and has been thrifty with its thruster fuel. In 2012 one of the four reaction wheels used to keep it stable failed, and a second failed less than a year later. Even then NASA engineers found a way to keep it online.
After months of calculations, the boffins worked out that the two remaining reaction wheels could be balanced against the pressure of sunlight falling on the spacecraft’s solar panels. It’s that kind of hackery that made the K2 mission possible.
The key to success in situations like this is to keep a clear head and be methodical, Sobeck said. Gather a team who knows what needs to be done and the order in which to do it.
"I was impressed with the commitment, which everyone on the team demonstrated, and the cool, thoughtful approach that was taken," he said. "I was impressed with everyone’s ability to help when they could, and to stay out of the way when they couldn’t."
Kepler will continue to run for as long as it can, scouting for the signs of planets passing in front of stars. It's sobering to remember that until the 1990s we'd never confirmed a single planet outside of the Solar System. But Kepler has discovered hundreds, if not thousands, of planets, and maybe someday we'll find one that's just like home. ®