Tuesday, April 29, 2008

UA Researchers Create Self-Healing Computer Systems for Spacecraft

University Communications

We’ve all heard about the space missions that are DOA when NASA engineers lose touch with the spacecraft or lander. In other cases, some critical system fails and the mission is compromised.

Both are maddening scenarios because the spacecraft probably could be easily fixed if engineers could just get their hands on the hardware for a few minutes.

Ali Akoglu and his students at The University of Arizona are working on hybrid hardware/software systems that one day might use machine intelligence to allow the spacecraft to heal themselves.

Akoglu, an assistant professor in electrical and computer engineering, is using Field Programmable Gate Arrays, or FPGAs, to build these self-healing systems. FPGAs combine software and hardware to produce flexible systems that can be reconfigured at the chip level.

Because some of the hardware functions are carried out at the chip level, the software can be set up to mimic hardware. In this way, the FPGA “firmware” can be reconfigured to emulate different kinds of hardware.

Speed vs. Flexibility

Akoglu explains it this way: There are general-purpose systems, like your desktop computer, which can run a variety of applications. Unfortunately, even with 3 GHz, dual-core processors, they’re extremely slow compared with hardwired systems.

With hardwired systems, the hardware is specific to the purpose. As an example, engineers could build a very fast system that would run Microsoft Word but nothing else. It couldn’t run Excel or any other application. But it would be super fast at what it’s designed for.

“In that case, you have an extremely fast system, but it’s not adaptable,” Akoglu explained. “When new and better software comes along, you have to go back into the design cycle and start building hardware from scratch.”

“What we need is something in the middle that is the best of both worlds, and that’s what I’m trying to come up with using Field Programmable Gate Arrays,” he said.

Work on the self-healing systems began in 2006 as a project in Akoglu’s graduate-level class. His students presented a paper on the system and sparked interest from NASA, which eventually provided an $85,000 grant to pursue the work.

Akoglu and his students now are in the second phase of the project, which is called SCARS (Scalable Self-Configurable Architecture for Reusable Space Systems). The project is being carried out in collaboration with the Jet Propulsion Laboratory.

Currently, they are testing five hardware units that are linked together wirelessly. The units could represent a combination of five landers and rovers on Mars, for instance.

“When we create a test malfunction, we try to recover in two ways,” he explained. “First, the unit tries to heal itself at the node level by reprogramming the problem circuits.”

If that fails, the second step is for the unit to try to recover by employing redundant circuitry. But if the unit’s onboard resources can’t fix the problem, the network-level intelligence is alerted. In this case, another unit takes over the functions that were carried out by the broken unit.

“The second unit reconfigures itself so it can carry out both its own tasks and the critical tasks from the broken unit,” Akoglu explained.

If two units go down and can’t fix themselves, the three remaining units split up the tasks. All of this is done autonomously without human aid.
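The escalation described above can be sketched in a few lines of Python. This is only an illustration of the three-step order (reprogram, fall back to redundant circuitry, hand tasks to the network), not the SCARS implementation. The `Unit` class, its flags, the `recover` function and the round-robin split of tasks among the surviving units are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical stand-in for one lander or rover in the network."""
    name: str
    tasks: List[str]
    can_reprogram: bool = True  # assumption: node-level reprogramming would succeed
    has_spare: bool = True      # assumption: redundant circuitry is still available
    healthy: bool = True


def recover(failed: Unit, network: List[Unit]) -> str:
    """Try each recovery step in order and report which one worked."""
    # Step 1: heal at the node level by reprogramming the problem circuits.
    if failed.can_reprogram:
        return "reprogrammed"
    # Step 2: switch to redundant circuitry on the same unit.
    if failed.has_spare:
        return "redundant"
    # Step 3: alert the network; healthy units take over the tasks.
    failed.healthy = False
    survivors = [u for u in network if u.healthy and u is not failed]
    if not survivors:
        return "unrecovered"
    # Round-robin split is an assumption; the article does not say how
    # the remaining units divide the work.
    for i, task in enumerate(failed.tasks):
        survivors[i % len(survivors)].tasks.append(task)
    failed.tasks = []
    return "reassigned"
```

Calling `recover` on a unit whose two onboard steps both fail moves its tasks to the remaining healthy units, which mirrors the case in which the network-level intelligence is alerted.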

Lightning-Fast Processing

Because FPGAs can be programmed to carry out tasks simultaneously, they also can be configured to do lightning-fast processing.

“So if you’re running a loop, and it is running 10,000 times, you can replicate the loop as a processing element in the FPGA ‘n’ number of times,” Akoglu explained. “That means you have an ‘n’ times speed-up.” It’s like creating a huge multicore processor configured for a specific task.
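The arithmetic behind that claim can be worked through in a short Python sketch. It assumes an idealized case that the article does not spell out: every loop iteration is independent of the others, and all ‘n’ copies of the loop body run in lockstep with no overhead. The function names are hypothetical.

```python
def ideal_passes(iterations: int, n_elements: int) -> int:
    """Passes needed when n_elements copies of the loop body run in parallel."""
    # Ceiling division: a final partial batch still costs one full pass.
    return (iterations + n_elements - 1) // n_elements


def ideal_speedup(iterations: int, n_elements: int) -> float:
    """Speed-up over running the iterations one at a time."""
    return iterations / ideal_passes(iterations, n_elements)
```

Under these assumptions, a loop of 10,000 iterations replicated 100 times finishes in 100 passes, a 100-times speed-up. When ‘n’ does not divide the iteration count evenly, the figure falls slightly short of ‘n’.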

FPGAs traditionally have been used for prototyping circuits because their firmware can be reprogrammed. Rather than creating costly circuits in hardware, engineers can test their ideas quickly and inexpensively in FPGA firmware.

In the past five years, the amount of circuitry that can be crammed into FPGAs has increased dramatically, promoting them from simple test-beds to end products in themselves, Akoglu explained.

The Ridgetop Group, a Tucson company that specializes in diagnosing circuit faults using statistical methods, now is working with Akoglu on the self-healing systems.

“This is the next phase of our project,” Akoglu said. “Our objective is to go beyond predicting a fault to using a self-healing system to fix the predicted fault before it occurs.” This could lead to extremely stable computer systems that could operate for long periods without failure.