I further notice that for an "accelerated" page, browsing it in a Google-Accelerator-enabled browser always triggers a subsequent visit to that page from Google. What is hard to judge is whether the Google visit occurs before or after the page loads in the browser. Let me hypothesize what is happening under each possibility:
1) The Google visit occurs before the page loads: it could be that the accelerator sends the URL to Google, then some backend software agent retrieves the page to be visited from its origin server and computes the difference between the retrieved copy and the copy cached by Google. By using a digest algorithm, this comparison should be rather fast.
Then, if the software agent determines that the page has not been updated since it was last cached, it returns some flag to the accelerator agent embedded in the user's browser, so the accelerator agent can safely use a locally cached copy (I sketch this digest check after the list below).
Recall that the assumption of Google Accelerator is that the user is on an internet connection much slower than Google's.
2) The Google visit occurs after the page loads: this is much safer than the previous option, because the user always gets the up-to-date copy of the webpage. After the user gets the page, the accelerator notifies Google to re-retrieve the page to make sure it is keeping the most recent copy. The accelerator could then record how many times the user visited the page and how many times the page later turned out not to have been updated. If the (times_not_updated / times_visited) ratio is rather high, it could assume that the page is static and supply it directly to the user the next time the page is requested; I sketch this bookkeeping below as well.
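Here is a minimal sketch of the digest check the first hypothesis implies. The function names, the use of MD5, and the direct fetch are all my own assumptions for illustration; nothing about Google's backend is visible from the outside.

```python
import hashlib
import urllib.request

def page_digest(url: str) -> str:
    """Fetch a page and return a fixed-size digest of its body."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    return hashlib.md5(body).hexdigest()

def is_page_unchanged(url: str, cached_digest: str) -> bool:
    # Comparing fixed-size digests is much cheaper than diffing the
    # full page contents, which is why this check can be fast even
    # for large pages.
    return page_digest(url) == cached_digest
```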
Of course, there could be a hybrid solution between the above two. Actually, I think a hybrid approach is the most probable.
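And a toy version of the bookkeeping the second hypothesis would need; the PageStats class, its method names, and the 0.9 threshold are pure inventions of mine:

```python
class PageStats:
    """Per-URL record of how often a page turned out to be unchanged."""

    def __init__(self):
        self.times_visited = 0
        self.times_not_updated = 0

    def record_visit(self, was_unchanged: bool):
        self.times_visited += 1
        if was_unchanged:
            self.times_not_updated += 1

    def looks_static(self, threshold: float = 0.9) -> bool:
        # If the page is almost never updated between visits, assume it
        # is static and serve the cached copy immediately next time.
        if self.times_visited == 0:
            return False
        return self.times_not_updated / self.times_visited >= threshold
```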
Wednesday, April 26, 2006
Saturday, April 08, 2006
[t] look inside Google accelerator
I downloaded Google Accelerator (http://webaccelerator.google.com/). It is not a mainstream Google product at this moment, but it attracted my curiosity about web-page prefetching strategies. This article is mainly based on my usage experience, as well as on previous reading about GFS and Google's papers on disk-access performance optimization.
The accelerator uses a mixed locality measure to determine the distance of each link on a page from the current page. One component seems to be PageRank, and the other is traditional spatial locality, in the context of the page layout. When the accelerator agent (in the form of a browser plug-in) detects a page load, it sorts all the links on that page by their PageRank; links with higher PageRank tend to be prefetched at a higher priority. The use of spatial locality is that when the reader's attention seems to be on a certain block of the current page, the links around the one he chooses tend to be prefetched.
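If my guess is right, the priority computation might look roughly like this. The blend weights and the inverse-distance locality score are placeholders I made up, not anything observed:

```python
def prefetch_order(links, clicked_pos, w_rank=0.7, w_locality=0.3):
    """Order a page's links by a blend of PageRank and spatial locality.

    links: list of (url, pagerank, (x, y)) tuples, where (x, y) is the
    link's position in the rendered page layout.
    clicked_pos: (x, y) of the link the user just followed.
    """
    def score(link):
        url, rank, (x, y) = link
        cx, cy = clicked_pos
        # Links laid out near the one the user chose get a locality bonus.
        dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
        return w_rank * rank + w_locality / (1.0 + dist)

    return [url for url, _, _ in sorted(links, key=score, reverse=True)]
```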
It is interesting that the accelerator tries to adapt to the user's reading pattern. I noticed this while browsing a news website whose article links are laid out row-wise: the accelerator tried to prefetch pages according to my reading gap, i.e. the vertical spacing between the news articles I read. This simple heuristic may work very well on some occasions.
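A naive reconstruction of that reading-gap heuristic (my guess at the logic, not anything extracted from the plug-in):

```python
def predict_next_link(read_positions, candidate_links):
    """Extrapolate the user's vertical reading gap to pick the next link.

    read_positions: y-coordinates of links already read, in reading order.
    candidate_links: list of (url, y) for the unread links on the page.
    """
    if len(read_positions) < 2 or not candidate_links:
        return None
    # Average vertical gap between consecutively read links.
    gaps = [b - a for a, b in zip(read_positions, read_positions[1:])]
    expected_y = read_positions[-1] + sum(gaps) / len(gaps)
    # Prefetch the unread link closest to where the pattern predicts.
    return min(candidate_links, key=lambda link: abs(link[1] - expected_y))[0]
```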
There is of course a lot of potential to improve the 'smartness' of the agent, such as using a Markov chain over the page-browse history, or discovering frequent sequences. But the problem is bounded by the fact that webpages differ greatly from one another in their intra-page link structure, and that is my guess as to why the accelerator is not useful enough to be a mainstream Google product.
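To make the Markov-chain idea concrete, the simplest first-order version could look like this; a real implementation would need decay, pruning, and probably per-site models:

```python
from collections import defaultdict

class BrowseMarkov:
    """First-order Markov model over the user's page-visit history."""

    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))

    def observe(self, prev_url: str, next_url: str):
        self.transitions[prev_url][next_url] += 1

    def top_predictions(self, current_url: str, k: int = 3):
        # The most frequently followed next pages become prefetch candidates.
        counts = self.transitions[current_url]
        return sorted(counts, key=counts.get, reverse=True)[:k]
```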