Can Caching Tame the Web?

Web caching would store popular pages closer to users, and a flurry of companies is out to popularize it. Promising less Net traffic and faster browsing, caching must anticipate usage patterns on a constantly growing and changing Web. But skeptics say even aggressive caching buys only temporary relief.

The year has already proven prolific for new Web caching technologies, designed to store copies of Web pages closer to users and rein in network traffic loads. And depending on who you listen to, the technology is either imperative to the health of the Web - or merely a limited fix for the network's ongoing, unstoppable bulge.

"We're dealing with the beginnings of the performance problem," said Kelly Herrell, vice president of marketing at CacheFlow, maker of a two-month old Web caching device for company networks and Internet service providers. "If we don't install caches, the Web will fail to work. It will get bogged down, and users will not get a response."

Vendors are promoting new and improved approaches to Web caching, most notably in the form of dedicated devices like CacheFlow's, whose hardware, operating system, and software are all built exclusively to cache Web content.

The caching of digital information has already proven successful in the design of computer motherboards, operating systems, and other relatively predictable data paths - stashing frequently used computing instructions closer to processors that need them, for example.

Web caching assumes that the model will translate neatly enough to the Web. It is motivated by the idea that hundreds of thousands of copies of the same pages traverse the network unnecessarily; caching them offloads that traffic by storing pages at ISPs and other localized networks.

But despite Web caching's impressive claims, some say it's not at all clear that it can deliver on them, and that indeed, the odds are stacked against it. "The Web is so broad, it's going to be a network sponge for years and years to come," said Steve Glassman, who has studied caching as a researcher at Digital Systems Research Center.

Glassman sees caches only buying network administrators a little more time before their bandwidth fills up and another high-speed Internet access line must be installed. "Aggressive caching might give you three to six months breathing room that you wouldn't have otherwise."

Nonetheless, the business of Web caching is well underway among the faithful, and Forrester Research predicts it will become a multibillion-dollar market by 2002.

"What's changing is the Internet is becoming more important - it is being used a lot more," said Forrester analyst Brendan Hannigan. "From a manager's perspective, delivering a good response time and a good experience for their users is important - and a cache is one way to do that."

Indeed, his firm's survey of Fortune 1000 companies found that half of participating companies were already deploying Web caches, and Forrester concluded that within two years, nearly all such companies would be doing the same.

This potential market has companies scrambling into the caching business anew, most focusing on the appliance-based approach. They include established equipment vendors like Cisco Systems (with its Cache Engine) as well as newer or smaller companies like CacheFlow (the CacheFlow 1000) and Network Appliance (the NetCache Appliance).

Inserted into the network like routers and switches, these newer caching devices stand in contrast to caching-enabled proxy software from Microsoft, Netscape, and others, which is meant to be installed on standard Web servers.

These new devices are complemented by related caching services that have also emerged. One recent alliance between WavePhore and SkyCache calls for delivering Web pages to caches via satellite.

To those running Internet-connected networks, the technology's gleaming promise lies in reduced bandwidth usage - and therefore lowered bandwidth costs - and a bonus of better browsing performance.

Keeping It Fresh

Central to caching technologies is the issue of freshness - how to keep content sitting on a cache from going stale and thus serving users a delayed version of the Web. Typically, caches have updated periodically, querying the home server of a page or an object to see if an update is necessary.
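That query is what HTTP calls a conditional GET. As a rough sketch of the idea in Python - simplified, with the proxy's own bookkeeping left out - a cache's freshness check amounts to this:

    import urllib.request
    import urllib.error

    def recheck(url, cached_last_modified):
        # Ask the page's home server whether our stored copy has changed,
        # using a conditional GET with an If-Modified-Since header.
        req = urllib.request.Request(url)
        req.add_header("If-Modified-Since", cached_last_modified)
        try:
            fresh_body = urllib.request.urlopen(req).read()
            return fresh_body            # 200 OK: page changed, store the new copy
        except urllib.error.HTTPError as err:
            if err.code == 304:          # 304 Not Modified: cached copy still good
                return None
            raise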

But with infrequent updates, caching demands a trade-off between stale content and saved bandwidth. In Europe and elsewhere - where caching setups are already commonplace because bandwidth is pricier - caching has typically required a freshness compromise.

Web caching vendors say more frequent and more intelligent updates are the answer, even though the update checks consume some bandwidth themselves. How to go about them is one area in which the technologies compete. Since checking freshness at the moment of a request can slow the delivery of cached pages, CacheFlow trumpets its "active caching" technology.

Rather than waiting for page requests to check a Web object's freshness, active caching works to determine which of the many image and text components it holds are most likely to go stale. Algorithms guess which page objects should be "pre-fetched" according to factors like how often an object has been requested, how often it has changed in the past, and the bandwidth "cost" of retrieving it.
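CacheFlow has not published the algorithm itself; purely as an illustration, a toy scoring function weighing those three factors might look like the following, with every name and weight invented for the example:

    def prefetch_priority(requests_per_day, changes_per_day, fetch_cost_kb):
        # Objects that are popular and change often are the most valuable
        # to refresh ahead of demand, discounted by how expensive they are
        # to re-fetch. The formula is illustrative, not CacheFlow's.
        return (requests_per_day * changes_per_day) / max(fetch_cost_kb, 1)

    objects = [
        ("front-page.html", 900, 24.0, 30),   # popular, changes hourly
        ("site-logo.gif",   900, 0.01, 12),   # popular but nearly static
    ]
    # Refresh the highest-scoring objects first, during idle moments.
    objects.sort(key=lambda o: prefetch_priority(*o[1:]), reverse=True)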

Forrester's Hannigan says the ultimate effectiveness of active caching remains to be seen. "We have to see how it actually works in reality and we just haven't seen that yet."

Elsewhere on the caching front, the Web's bedrock protocol itself, HTTP, is being updated in the upcoming version 1.1 to better relay caching information to networks. New features will let page authors decide which parts of a page should be cached and which shouldn't. Much further down the road, caching advocates anticipate "pushed" caching - where instead of having to update themselves, caches will receive updates automatically from servers aware of their presence.
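The lever for that author-level control in HTTP 1.1 is the new Cache-Control response header. A page happy to sit in caches for an hour might be served with headers like these (values illustrative):

    HTTP/1.1 200 OK
    Cache-Control: public, max-age=3600

while a page built for a single user can forbid storage outright:

    HTTP/1.1 200 OK
    Cache-Control: private, no-store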

Is Bigger Better?

Active or passive, the success of a cache is measured in "hit rates." The higher the percentage of page requests served by a cache - rather than the page's original server - the more successful the cache is. CacheFlow, for example, says it has tested its product and found a hit rate as high as 75 percent.
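The arithmetic behind a hit rate is simple; as a sketch, an administrator tallying a cache's access log - assuming a made-up format in which each line is flagged HIT or MISS - could compute it like so:

    def hit_rate(log_lines):
        # Fraction of requests answered by the cache rather than
        # fetched from the page's original server.
        hits = sum(1 for line in log_lines if "HIT" in line)
        return hits / len(log_lines) if log_lines else 0.0

    # Three of every four requests served locally: a 75 percent hit rate.
    print(hit_rate(["HIT", "HIT", "MISS", "HIT"]))   # prints 0.75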

While CacheFlow's claims remain to be tested by widespread use, others say sheer size is imperative to effective caching and high hit rates. Their approaches therefore entail deploying massive caches closer to the "middle" of the Web - as opposed to the periphery, where smaller Internet service providers and company intranets reside.

Mirror Image Internet announced this week its plan for massive, centrally located caches for installation at major Internet access points. Similarly, Inktomi is refocusing its business to sell Traffic Server, software meant to let backbone providers set up large-scale caches in the terabyte range to reduce network load, claiming a 40 to 50 percent reduction in adjacent network traffic.

Caching's Biggest Test

But since caching a system like the Web means having to intelligently identify the most frequently used content on a network notorious for its size, sprawl, and unpredictability, some think caching designs, even the newer ones, may have met their match in the unparalleled properties of the Internet.

"ISPs probably can reduce their network bandwidth to some extent - so it's probably a reasonable business proposition," said critic Steve Glassman. "But it's not going to fix the Net for the rest of us."

The crux of his analysis is that as the Web continues to grow, it becomes ever less likely that caches can successfully guess at its most requested and most static content.

When Glassman set up a cache at Digital's Systems Research Center, a full third of the pages it stored were requested only once. That is, of a few hundred thousand page requests made by some 7,000 users, only two-thirds were repeats - a number he still sees as too small. And even among the re-used two-thirds, he said, many of the pages may need frequent updating.

"Whatever percentage [hit rate] a cache can deliver it is not going to improve as time goes on."

The situation is further exacerbated by the Web's growing share of dynamic content - especially pages generated exclusively for a single user, such as the customized home pages offered by sites like Yahoo and Excite.

"We feel that it's important for us as a navigation site to make each and every page become more and more customized for each user," said Graham Spencer, Excite's chief technology officer. "And as we do that, the actual page content is less and less cacheable." Graham does see a place, however, for caching of more bandwidth-hogging media, such as the images that repeat themselves across multiple pages.

Some sites, specifically seeking to keep their content from growing stale in caches, go so far as to mark their pages as uncacheable, using header tags such as "Pragma: no-cache."
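On an individual page, that marking can be as simple as a meta tag in the document's head - the common idiom of the day, though HTTP 1.1's Cache-Control header is the more formal route:

    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">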

Glassman is not entirely pessimistic about caching; he acknowledges that the new technologies may help matters somewhat, but says the unwieldy nature of the Web will make progress difficult.

"There's a pretty hard upper limit [on caching] based on how big the Web is and how rapidly it changes and how broadly users wander about." That puts a premium on vendors to prove the technology before it can be justified by network equipment buyers. "There must be a very strong reason to do this."

Otherwise, he said, with the Web continuing its explosive growth, caching is "pretty much going to remain a niche solution for a niche problem."