Saturday, April 08, 2006

[t] look inside Google accelerator

I downloaded the Google accelerator (http://webaccelerator.google.com/). It is not a mainstream Google product at the moment, but it piqued my curiosity about web-page prefetching strategies. This article is based mainly on my own experience using it, as well as on my earlier reading of Google's papers on GFS and disk-access performance optimization.

The accelerator appears to use a mixed locality measure to determine the distance between the current page and each link on it. One component seems to be page rank; the other is traditional spatial locality, in the context of page layout. When the accelerator agent (in the form of a browser plug-in) detects a page load, it sorts all the links on that page by their page rank. Pages with a higher page rank tend to be prefetched at a higher priority. Spatial locality comes in when the reader's concentration seems to be on a certain block of the current page: the links around the one he chooses tend to be prefetched as well.
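
To make the idea concrete, here is a minimal sketch of such a mixed ranking in Python. It is only my guess at the general shape, not the accelerator's actual algorithm: the Link fields, the rank_weight and scale parameters, and the way the two signals are combined are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    url: str
    rank: float  # hypothetical page-rank-like score, assumed to lie in [0, 1]
    y: float     # vertical position of the link on the page, in pixels


def prefetch_order(links, focus_y, rank_weight=0.5, scale=200.0):
    """Sort links so the most promising prefetch candidates come first.

    focus_y is the vertical position the reader seems to be concentrating on,
    for example the position of the last link clicked. rank_weight and scale
    are made-up tuning knobs.
    """
    def score(link):
        # Spatial locality: 1.0 at the focus point, decaying with distance.
        closeness = 1.0 / (1.0 + abs(link.y - focus_y) / scale)
        # Blend the rank signal and the layout signal into one score.
        return rank_weight * link.rank + (1.0 - rank_weight) * closeness

    return sorted(links, key=score, reverse=True)


links = [
    Link("http://example.com/a", rank=0.9, y=1000.0),
    Link("http://example.com/b", rank=0.2, y=100.0),
    Link("http://example.com/c", rank=0.5, y=120.0),
]
for link in prefetch_order(links, focus_y=100.0):
    print(link.url)
```

With rank_weight=1.0 the ordering falls back to pure page rank, and with rank_weight=0.0 it depends only on how close a link is to the reader's focus.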

It is interesting that the accelerator tries to adapt to the user's reading pattern. I noticed this when browsing a news website whose article links are laid out row-wise: the accelerator tried to prefetch pages according to my reading gap, i.e. the vertical spacing between the news links I had read. This simple heuristic may work very well on some occasions.
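
A reading-gap heuristic like this could be sketched as follows. This is an assumption about how such a predictor might work, not something I observed inside the plug-in: the function name, the use of the median gap, and the pixel positions are all hypothetical.

```python
from statistics import median


def predict_next_link(clicked_ys, link_ys):
    """Guess the vertical position of the link the reader will open next.

    clicked_ys: vertical positions of the links already opened, in click order.
    link_ys:    vertical positions of all links on the page.
    Returns None when there is not enough history or nothing left to fetch.
    """
    if len(clicked_ys) < 2:
        return None
    # The "reading gap": the typical vertical step between consecutive clicks.
    gaps = [b - a for a, b in zip(clicked_ys, clicked_ys[1:])]
    target = clicked_ys[-1] + median(gaps)
    # Only consider links the reader has not opened yet.
    candidates = [y for y in link_ys if y not in clicked_ys]
    if not candidates:
        return None
    # Pick the unopened link closest to the extrapolated position.
    return min(candidates, key=lambda y: abs(y - target))


rows = [100, 140, 180, 220, 260, 300, 340, 380]
print(predict_next_link([100, 180, 260], rows))
```

In this example the reader opens every other row, so the sketch extrapolates the same step and picks the row two below the last click.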

There is of course a lot of potential to improve the 'smartness' of the agent, such as using a Markov chain over the page browsing history, or discovering frequent sequences. But the problem is bounded by the fact that web pages differ greatly from each other in their intra-page link structure, and that is my guess as to why the accelerator is not useful enough to be a mainstream Google product.
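
As an illustration of the Markov chain idea, here is a minimal first-order sketch in Python. It is a suggestion of what such an improvement could look like, not a feature of the accelerator; the MarkovPrefetcher class and the page names are hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """First-order Markov model over the pages a user visits."""

    def __init__(self):
        # transitions[page][next_page] = how often next_page followed page
        self.transitions = defaultdict(Counter)

    def observe(self, history):
        """Record every consecutive pair of pages in a browsing history."""
        for current, following in zip(history, history[1:]):
            self.transitions[current][following] += 1

    def predict(self, page, k=2):
        """Return up to k (next_page, probability) pairs, most likely first."""
        counts = self.transitions.get(page)
        if not counts:
            return []
        total = sum(counts.values())
        return [(nxt, n / total) for nxt, n in counts.most_common(k)]


model = MarkovPrefetcher()
model.observe(["home", "news", "sports", "home", "news", "tech",
               "home", "news", "sports"])
for page, prob in model.predict("news"):
    print(page, round(prob, 2))
```

The pages returned by predict would be the prefetch candidates. A model like this learns per-user navigation habits rather than page layout, so it sidesteps some of the variation in intra-page link structure, at the cost of needing enough history to be useful.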