Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration, the company says.
The company blames a series of problems centered on its collection of routers in Canyon Park, Wash. -- near the company's headquarters -- for knocking out dozens of Microsoft (MSFT) properties, including hotmail.com and msn.com. The outage frustrated millions of users and acutely embarrassed a company that promises unprecedented reliability in marketing its Internet products.
"We screwed up. (Tuesday) night at around 6:30 p.m. Pacific time we made a configuration change to the routers on the DNS network," spokesman Adam Sohn said Wednesday evening.
The company said in a statement that it took nearly a day to determine what was wrong and undo the changes.
Microsoft's sites -- including microsoft.com, slate.com, expedia.com and msnbc.com -- started to work properly again at about 4:30 p.m., PST, Wednesday. Media Metrix reports that the combined properties, not including news sites, received 54 million unique visitors in December.
Technical experts blame Microsoft's design decisions for exacerbating its woes. All the affected Microsoft sites rely on just four Windows servers, located in the company's Canyon Park data center, to forward users to the right destination via the Domain Name System (DNS).
Because all four DNS servers -- which translate a name like microsoft.com into its numeric address, 207.46.230.218 -- share the same routers, all are vulnerable to the same hardware glitch or technician's error.
"Sure, small organizations have their DNS servers located together and there's nothing wrong with that," says Rich Kulawiec, a consultant with 20 years of networking experience. "But national or global organizations should, as standard operating procedure, have their DNS servers on different networks served by different ISPs and running on different operating systems -- Solaris and FreeBSD, or Linux and HPUX -- so as to minimize the threats for DoS attacks, known OS vulnerabilities, and connectivity issues."
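Kulawiec's advice about separate networks can be checked mechanically. What follows is a minimal Python sketch, not a description of any tool Microsoft or its critics used. It groups name server addresses by network prefix and flags a setup in which every server lands in the same group. The function names, the /24 prefix length and the sample addresses (taken from ranges reserved for documentation) are assumptions made for illustration, and sharing a prefix is only a rough stand-in for sharing routers.

```python
import ipaddress
from collections import defaultdict


def group_by_network(addresses, prefix_len=24):
    """Group IPv4 addresses by the network they fall in at the given prefix length."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(network)].append(addr)
    return dict(groups)


def has_single_point_of_failure(addresses, prefix_len=24):
    """True when every name server sits in the same network."""
    return len(group_by_network(addresses, prefix_len)) <= 1


# Hypothetical addresses: four servers on one network versus a spread-out set.
clustered = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
spread = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]

print(has_single_point_of_failure(clustered))
print(has_single_point_of_failure(spread))
```

A real audit would also have to look at upstream providers and physical location, which an address comparison alone cannot reveal.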
Some companies already offer supra-reliable DNS to nervous customers worried about downtime. Nominum, a Redwood City, Calif. startup, boasts it has many collections of DNS servers, each with at least two different hardware and OS platforms, and each connected to two different ISPs.
"If an entire (Nominum) site fails, the other sites around the world would continue to serve customers' domain data," the company's white paper says. Ultradns.com offers a similar service.
"The problem that Microsoft is experiencing once again illustrates the fact that even if you are a technically competent organization, your business is at significant risk without a highly reliable DNS infrastructure," said William Thomas, president and CEO of Nominum, Inc.
Making matters worse for Microsoft's frantic technicians was that they were racing against time: For efficiency's sake, ISPs, corporations and universities keep caches of the numeric IP addresses of frequently-visited sites. But caches began to expire at different times across the Internet yesterday, which meant Microsoft's properties began to fade, gradually, from public view.
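The gradual fade can be illustrated with a toy model. This Python sketch assumes each cache holds Microsoft's address until a fixed expiry time and cannot refresh it while the DNS servers are unreachable. The function name and the expiry times are invented for illustration; the article does not report actual cache lifetimes.

```python
def fraction_still_cached(expiry_hours, hours_elapsed):
    """Share of caches whose stored address has not yet expired."""
    if not expiry_hours:
        return 0.0
    alive = sum(1 for expiry in expiry_hours if expiry > hours_elapsed)
    return alive / len(expiry_hours)


# Hypothetical: hours after the outage began at which each of five caches expires.
cache_expiry_hours = [1, 3, 6, 12, 20]

for hour in (0, 4, 10, 22):
    share = fraction_still_cached(cache_expiry_hours, hour)
    print(f"hour {hour:2d}: {share:.0%} of caches can still answer")
```

With these made-up numbers, the share of caches able to answer drops step by step rather than all at once, which is the pattern the outage produced.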
"Anybody who knew anything about this and could be helpful dropped everything," Microsoft's Sohn said.
Sohn said: "We spent a lot of time diagnosing every single scenario we could think of: whether it was a denial of service attack, whether it was an (equipment malfunction). We had little mini-teams off triaging all of this stuff and eliminating scenarios as fast as we could."
Speculation about the underlying problem was rife on Wednesday, with many observers suspecting a cracker or distributed denial of service attack, which Microsoft reports did not happen. One Microsoft official even inadvisedly told the IDG news service that the Internet Corporation for Assigned Names and Numbers might be to blame.
Matt Drudge's drudgereport.com also got it wrong, saying in an article that malicious hackers left a "disjointed warning" behind in Network Solutions' whois database.
Drudge was referring to a list of hostnames such as "MICROSOFT.COM.AINT.WORTH.SHIT.KLUGE.ORG" and "MICROSOFT.COM.HACKED.BY.HACKSWARE.COM" that appear in a whois listing. But instead of a hacker's exultation, they're an off-color but technically legal use of the domain name system -- for kluge.org and hacksware.com in those examples -- and they've been in the database for months.
The failure -- the most serious in Microsoft's history -- comes as the company is launching a $200 million advertising campaign touting the reliability and performance of its "enterprise" products.
"With Windows 2000 and our family of enterprise servers, we can offer the rock solid reliability that enterprises need and depend on," Chris Atkinson, a Microsoft vice president, told CNNfn on Monday.
Microsoft CEO Steve Ballmer boasts in a recent statement that "our software and services can provide businesses of any size with unprecedented quality, speed and flexibility."
Microsoft says it doesn't think customers will shy away from its products because of the downtime. "Somebody made a configuration error," says Sohn. "That happens to a lot of people. It's unfortunate that it happened to us. But I don't think it calls into question the credibility of our products and services."
Microsoft is also betting its future on its ".NET" system, a set of building blocks such as passport.com that envisions the delivery of more services over the Internet. Its marketing literature says developers will be able to "create programs that transcend device boundaries and fully harness the connectivity of the Internet in their applications."
But when passport.com, which Microsoft properties rely on for user authentication, is unavailable -- as it was on Wednesday -- .NET starts to look more tangled than not.
"One of the great things the PC revolution bought us is a lot of stability," says Peter Wayner, a columnist and author of Free For All, a book about free software. "If one mainframe went out, a whole company could go down. If one PC fails, it's just one person affected."
"Now we're moving back to a vision of one central point of failure. There are a lot of advantages -- you don't have to send tech support out to everyone's desk -- but there's a danger in everything getting knocked out by something as simple as a DNS server disappearing," Wayner says. "They want to put everything in one central spot. If that breaks, we're hosed."
One example: Even though hotmail.com used other DNS servers and wasn't by itself affected, the site was redirected to hotmail.passport.com for user logins and therefore couldn't be reached during the downtime Wednesday.
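The Hotmail case comes down to a dependency check: a site is only as reachable as the services it hands users off to. The Python sketch below is one hedged way to express that. The `usable` function and the dictionary layout are hypothetical; only the hotmail.com to hotmail.passport.com dependency comes from the reporting above.

```python
def usable(site, depends_on, unreachable, _seen=None):
    """A site is usable only if it and everything it depends on can be reached."""
    seen = _seen if _seen is not None else set()
    if site in unreachable:
        return False
    if site in seen:  # guard against circular dependencies
        return True
    seen.add(site)
    return all(
        usable(dep, depends_on, unreachable, seen)
        for dep in depends_on.get(site, [])
    )


# Dependency taken from the article: Hotmail logins go through hotmail.passport.com.
depends_on = {"hotmail.com": ["hotmail.passport.com"]}

print(usable("hotmail.com", depends_on, unreachable=set()))
print(usable("hotmail.com", depends_on, unreachable={"hotmail.passport.com"}))
```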
The outage also showed the risks of moving services, like error messages and search functionality, to the Net.
Normally when a website can't be reached, Internet Explorer defaults to auto.search.msn.com for search tips -- but that site was also offline, which left users without their usual ability to recover from typos or erratic sites.
Although the scope and timing may be uniquely unfortunate, Microsoft isn't alone in having problems. During recent eBay outages, users could view the site's home page but not individual item listings. Microsoft's Hotmail has been offline because of security problems and, last month, because of a series of glitches.
Even though Microsoft's Web servers were still working, for the most part, only users who knew the appropriate numeric IP addresses could get in.
Even that trick had limited utility, however, since Microsoft's sites used HTML links with the site name embedded in them. For instance, microsoft.com features absolute links to "microsoft.com/jobs" and other areas, instead of a relative "/jobs" link that would allow someone using the numeric IP address to continue browsing the site.
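The difference between the two link styles can be seen with Python's standard `urllib.parse.urljoin`, which resolves a link against the page it appears on roughly the way a browser does. This is an illustrative sketch. The `http://` scheme on the absolute link is an assumption, since the article gives only "microsoft.com/jobs".

```python
from urllib.parse import urljoin

# The page a visitor reached by typing the numeric address from the article.
page_url = "http://207.46.230.218/"

# A relative link is resolved against the page's own address ...
relative_target = urljoin(page_url, "/jobs")

# ... while an absolute link (http:// scheme assumed) names the host again.
absolute_target = urljoin(page_url, "http://microsoft.com/jobs")

print(relative_target)
print(absolute_target)
```

The relative link stays on the numeric address, so it keeps working without DNS. The absolute link sends the browser back to the hostname, which needs the very lookup that was failing.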
Mail to most Microsoft addresses continued to go through, since the DNS servers were still responding to "MX" mail-direction requests.
Other affected sites included zone.com, windowsmedia.com, encarta.com and carpoint.com.
Microsoft shares closed on Wednesday at 62 15/16, up two percent.