How do you build a Web site that can handle 100 million hits a day?
That was the question facing Robert Andrews, Netscape's webmaster, when I visited him last year. Way back then, Netscape's Web site was clocking 70 million hits a day - a figure that was increasing by 10 percent every month. Staggering figures, but Andrews wasn't worried. That's because Netscape had a secret trick for dealing with the increasing load.
Engineering for Netscape-like growth is in fact one of the biggest technical problems facing webmasters today. You can load up a Sun UltraSPARC or a Silicon Graphics WebForce server with 256 Mbytes of RAM, but keep raising the demand from incoming traffic and sooner or later your Web server is going to hit its limit.
The solution, of course, is to set up multiple Web servers. If one Web server can handle 10 million hits a day, then 10 Web servers should be able to deal with 100 million, right? Unfortunately, distributing the incoming requests between those 10 Web servers isn't that simple.
Until recently, the only trick available to most webmasters was a technique called DNS round robin. To make this technique work, you simply assign multiple IP addresses to a single DNS name. Microsoft, for instance, has set up 12 different IP addresses for its host www.msn.com. Each of those addresses points to a different physical computer. When you type "www.msn.com" in your browser, your computer is supposed to pick one of Microsoft Network's 12 addresses at random and send its packets there.
The problem with DNS round robin isn't that it doesn't work, but that DNS translations are cached so that hundreds or thousands of computers all hit the same machine at the same time. One machine might end up taking 50 percent of the load.
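A rough way to see the caching problem is to simulate it. The sketch below is only an illustration: the 12 addresses are made-up documentation addresses, and the figures of 12,000 clients and three caching resolvers are assumptions, not measurements from any real site. Without caching, every client makes its own random pick; with caching, each resolver picks once and every client behind it reuses that answer.

```python
import random
from collections import Counter

# 12 hypothetical server addresses (documentation range, not real hosts)
ADDRESSES = [f"192.0.2.{i}" for i in range(1, 13)]

def uncached_hits(n_clients, rng):
    """Every client makes its own random pick from the full address list."""
    return Counter(rng.choice(ADDRESSES) for _ in range(n_clients))

def cached_hits(n_clients, n_resolvers, rng):
    """Clients share a few caching resolvers; each resolver picks one
    address once, and every client behind it reuses that cached answer."""
    cache = [rng.choice(ADDRESSES) for _ in range(n_resolvers)]
    return Counter(cache[rng.randrange(n_resolvers)] for _ in range(n_clients))

rng = random.Random(1997)            # fixed seed so runs are repeatable
spread = uncached_hits(12_000, rng)
skewed = cached_hits(12_000, 3, rng)

print("servers used without caching:", len(spread))
print("servers used with 3 caching resolvers:", len(skewed))
print("busiest server's share with caching:",
      max(skewed.values()) / sum(skewed.values()))
```

With only three caches standing between the clients and the name server, at most three of the 12 machines ever see traffic, and the busiest one takes at least a third of it.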
Another problem with DNS round robin is that it doesn't take into account the real-time state of your Web servers. If one machine is attacked by a psychotic employee with a screwdriver, you would like to have incoming connections automatically sent to the remaining Web servers. But DNS load balancing doesn't do that, because the load-balancing decision is made by the client, not by the server.
Last summer, Cisco Systems introduced a new product that solves this problem. It's called LocalDirector. The system is based on a firewall that Cisco acquired when it bought Network Translation Inc. in December 1995. The box takes each incoming TCP connection and automatically assigns it to one of your Web servers, redirecting each incoming packet to the appropriate machine.
LocalDirector operates on the principle of network address translation, which is described in detail in RFC 1631.
The original purpose of network address translation (NAT) was to solve the problem of dwindling IP addresses. It allows an entire organization to sit behind one IP address (or a few) and have the packet headers translated automatically, without having to resort to network proxies or SOCKS.
A box that performs network address translation is a kind of high-performance router that understands the TCP/IP protocol. By design, an entire organization sits behind a NAT box. The outgoing packets get rewritten so they all appear to be coming from a single IP address - the IP address of the NAT box itself. When the return packets come to the NAT box from the outside world, the NAT box looks at the packets, figures out which machine on the internal network they are destined for, rewrites the headers, and sends them on down.
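The rewriting described above can be sketched in a few lines. This is a toy model, not Cisco's implementation or the exact scheme in RFC 1631: the `Packet` and `NatBox` names are invented for illustration, the addresses are made up, and the sketch assumes the box tells internal machines apart by handing each connection its own public port number.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class NatBox:
    """Hypothetical NAT box that hides many inside hosts behind one address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}   # (inside ip, inside port) -> public port
        self.in_map = {}    # public port -> (inside ip, inside port)

    def outbound(self, pkt):
        """Rewrite the source so the packet appears to come from the NAT box."""
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return Packet(self.public_ip, self.out_map[key], pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt):
        """Look up which inside machine a reply belongs to and rewrite the
        destination. Raises KeyError for a port the box never handed out."""
        inside_ip, inside_port = self.in_map[pkt.dst_port]
        return Packet(pkt.src_ip, pkt.src_port, inside_ip, inside_port)

nat = NatBox("203.0.113.5")
request = Packet("10.0.0.7", 1025, "198.51.100.9", 80)
sent = nat.outbound(request)
# The outside server replies to whatever source address it saw.
reply = Packet(sent.dst_ip, sent.dst_port, sent.src_ip, sent.src_port)
delivered = nat.inbound(reply)
print(sent)
print(delivered)
```

The outside server only ever sees 203.0.113.5; the table inside the box is what lets the reply find its way back to 10.0.0.7.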
Shortly after NAT was invented, a few folks realized it would make a nifty firewall. Those guys started Network Translation Inc., which was acquired by Cisco a little more than a year ago.
NAT has a lot of advantages other than simply building groovy firewalls and high-performance Web sites. With NAT, you can change your organization's Internet subnet without having to go around to each computer and change its address. That's handy if your company wants to change Internet providers and you discover that you can't take your old IP address with you.
Unfortunately, NAT can cause some problems, because you might think you have one IP address, but a computer you're connecting to thinks you have a different IP address. This can break protocols like Kerberos, which build the IP address into the security state. For most users, however, NAT is a powerful idea - as evidenced by LocalDirector.
LocalDirector has four different algorithms it can use to assign incoming TCP connections to the appropriate server. The first option is true load balancing: LocalDirector actually measures the response time of each computer and sends new connections to the machine that's running the fastest. Alternatively, the machine can send new connections to the server with the fewest existing connections, to each machine in a round-robin fashion, or to each machine according to a formula you predefine. You can also tell the machine that a server is being taken out of commission, and then take it down without any interruption to the Web surfers.
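To make the four options concrete, here is a minimal sketch of how such policies might look in code. It is not LocalDirector's actual logic: the `Server` fields and function names are invented, and the "formula you predefine" is assumed here to be a simple per-server weight.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    response_ms: float = 0.0   # most recently measured response time
    connections: int = 0       # connections currently open
    weight: int = 1            # assumed stand-in for a predefined formula
    in_service: bool = True    # False once an operator pulls the machine

def available(servers):
    """Leave out machines that have been taken out of commission."""
    return [s for s in servers if s.in_service]

def fastest(servers):
    """Send the new connection to the machine responding most quickly."""
    return min(servers, key=lambda s: s.response_ms)

def least_connections(servers):
    """Send the new connection to the machine with the fewest open connections."""
    return min(servers, key=lambda s: s.connections)

def round_robin(servers):
    """Hand out machines in turn, forever."""
    return itertools.cycle(servers)

def weighted(servers):
    """Hand out each machine in proportion to its weight."""
    return itertools.cycle([s for s in servers for _ in range(s.weight)])

servers = [
    Server("a", response_ms=120, connections=4, weight=1),
    Server("b", response_ms=80, connections=9, weight=2),
    Server("c", response_ms=200, connections=1, weight=1),
]
print(fastest(servers).name)
print(least_connections(servers).name)
turn = round_robin(servers)
print([next(turn).name for _ in range(4)])
```

Taking a server down gracefully then amounts to flipping `in_service` to `False` and choosing only from `available(servers)`, so no new connections land on it.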
LocalDirector is an impressive piece of technology, and it's already being used on a number of Web sites, including Excite, Viacom, AOL, GTE, AT&T, Wal-Mart, and Charles Schwab, says Bret Cunningham, LocalDirector's product manager.
While LocalDirector solves the immediate problem with server saturation, it doesn't solve the backend problem of database management. If your Web pages are based on data that is dynamically generated from a database server, you're going to need a single DBMS that can keep up with full demand, or else you'll need to use some sort of replicated server and give up on instantaneous updates to your customers' data.
The idea for LocalDirector actually came from Robert Andrews at Netscape. Unfortunately, when Netscape was building its Web site, LocalDirector wasn't available. Instead, Netscape came up with a different solution.
Each time you click the big "N" on your copy of Netscape Navigator, the program rolls a pair of electronic dice. Then, instead of connecting to a single fixed address, the browser actually connects to wwwN.netscape.com/, where "N" is some random number. Because the randomization is done in the Web browser itself, rather than in the domain-name system, this trick gave Netscape real load balancing - and it gave the company enough breathing room to build a Web site that could keep up with the Internet's demands.
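The browser-side trick is simple enough to sketch. What follows is a guess at the shape of the idea, not Netscape's code: the function name and the server count of 20 are assumptions, and only the wwwN.netscape.com pattern comes from the description above.

```python
import random
from collections import Counter

def pick_home_host(n_servers, rng=random):
    """Roll the dice in the client: choose N at random and build the host name."""
    n = rng.randint(1, n_servers)   # randint includes both endpoints
    return f"www{n}.netscape.com"

rng = random.Random(42)             # fixed seed so the demo is repeatable
picks = Counter(pick_home_host(20, rng) for _ in range(20_000))
print(len(picks), "distinct hosts chosen")
print("busiest host's share:", max(picks.values()) / 20_000)
```

Because every click makes a fresh choice inside the browser, no DNS cache sits between the dice and the server, so the load spreads roughly evenly.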
It's a cute trick. Too bad they had to write their own Web browser to use it.