Two years ago, I explored the possibility of using Google page counts to measure similarity between concepts. The results are contained in this article. Although the results were not as strong as I had hoped, they may serve as a starting point for further exploration.
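As a rough illustration of the general idea, here is a minimal sketch of one way page counts could be turned into a similarity score. It assumes a Jaccard-style ratio (pages containing both terms divided by pages containing either), which is not necessarily the measure used in the article. The function names and the page counts in `page_counts` are hypothetical, invented purely for illustration; no real search engine is queried.

```python
def jaccard_similarity(count_a, count_b, count_both):
    """Jaccard-style similarity from three page counts.

    count_a, count_b: pages containing each term on its own
    count_both: pages containing both terms
    """
    union = count_a + count_b - count_both
    if union <= 0:
        return 0.0
    return count_both / union


# Hypothetical page counts (made up for this example, not real search results).
page_counts = {
    ("cat",): 1000,
    ("dog",): 1200,
    ("carburetor",): 300,
    ("cat", "dog"): 400,
    ("cat", "carburetor"): 10,
}


def similarity(term_a, term_b, counts):
    """Look up the counts for two terms and score their similarity."""
    both = counts.get((term_a, term_b), counts.get((term_b, term_a), 0))
    return jaccard_similarity(counts[(term_a,)], counts[(term_b,)], both)


if __name__ == "__main__":
    print(similarity("cat", "dog", page_counts))
    print(similarity("cat", "carburetor", page_counts))
```

With these invented numbers, "cat" and "dog" score higher than "cat" and "carburetor" because a larger share of their pages overlap. Real page counts are noisy estimates, so any such score should be treated with caution.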
June 2, 2007 at 2:24 pm |
[...] attempts to Classify Concepts by search engine page Counts. [...]
June 3, 2007 at 1:18 am |
Interesting work! It reminds me a lot of Latent Semantic Analysis (LSA), which similarly calculates similarity between terms from how often they co-occur in documents. The algorithm that LSA uses is a little more complicated and, I think, might address some of the weaknesses pointed out in the paper. It might be neat to try combining the two approaches, maybe by using Google to find a more “interesting” document set to feed to LSA.