Software Bugs Bite Telephone And Other Firms
PITTSBURGH - AS computers run more and more businesses, companies will have to watch out for software bugs. In some critical processes, they may have to limit computer automation because software is never 100 percent bug-proof.
These are the points that software-reliability experts are making as federal investigators sift through the evidence of the computer-generated glitch that knocked out local telephone service here in Pittsburgh and other United States cities a month ago.
Last week an investigative team of the Federal Communications Commission (FCC) released an interim report on the breakdowns experienced by Pacific Telesis Group and Bell Atlantic Corporation. It found that crucial information about problems with the software and the associated signaling system was not passed on industrywide.
"Information that was available had not been shared with the companies," says James Spurlock, an FCC spokesman. "We find it troubling." The agency plans to bring together telephone industry officials to find ways to improve information sharing.
The incident shows how vulnerable critical systems are to software bugs. On June 26, a routine maintenance problem in Los Angeles triggered the software error, which proceeded to flood local call-routing computers with error messages. A few hours later, virtually the same thing occurred in Washington, D.C., knocking out some local phone service for nine hours and spreading the problem to Maryland, Virginia, and West Virginia. On July 1, Pittsburgh had a 6-1/2-hour breakdown and, the next day, a 2-1/2-hour one. Alerted to the problem, telephone engineers managed to avert a similar outage in San Francisco.
The interim report implies that the problems might have been avoided or minimized had industry officials been fully informed about similar incidents in Japan, Sweden, and, on June 10, in Los Angeles. "The industry does not seem fully informed regarding each incident," the report says. "It appears key operations people at Bell Atlantic were not aware of the failure experienced by Pacific Bell on June 10."
The glitch turned out to be three bits of software code - part of a minor modification made by DSC Communications of Plano, Texas. The company says the modification was so minor that it did not put the software code through the customary 13 weeks of testing.
In the scheme of things, these outages were relatively minor. They didn't knock out long-distance or emergency 911 service. Only some local calls were affected. Other software problems have proved far more costly. For example:
* In 1988, the USS Vincennes shot down an Iranian airliner, killing 290. The ship's computer software wasn't designed to continually update the plane's altitude, and the resulting confusion led to the shooting.
* A software glitch allowed a brand of radiation machine to administer huge doses of radiation to patients, killing four in 1986.
* Faulty software corrupted thousands of financial transactions at the Bank of New York in 1985, forcing the bank to borrow some $24 billion overnight to balance its accounts.
* When a software fault triggered a nine-hour breakdown in its long-distance service last year, AT&T lost substantial revenue. But "the real cost wasn't so much monetary," says company spokesman Paul Karoff. "It cost the company more in terms of tarnishing what had been ... 100 years of flawless service of the network."
TESTING sophisticated software is much harder than testing a mechanical machine. Such software can contain millions of lines of computer code, covering so many variables that weeks or even months of testing can't cover all the contingencies. "There is no guaranteed solution to that problem," says Peter Neumann, principal scientist at SRI's computer science laboratory in Menlo Park, Calif.
The computer industry has not taken full advantage of the fundamental mathematical and statistical analysis that is available, says Janet Dunham, director of the center for digital systems research at Research Triangle Institute in North Carolina. Nor has the industry put enough resources into testing software.