Jan 6, 2009

Specify an algorithm to find all Googlewhacks in the Google index.
A Googlewhack is a two-word Google search query that returns exactly one result.

At university I learned that a search engine index is a list of words, each mapped to the IDs of the web documents (docIds) in which it appears:
the: 42, 532, 2342   <-- the word "the" appears in docs #42, #532, #2342
car: 23, 345, 35345
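
To make the structure concrete, here is a minimal sketch of how such an index could be built. The function name build_inverted_index and the toy corpus are my own assumptions for illustration; a real search engine index is far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of docIds of documents containing it.

    `docs` is a hypothetical {docId: text} mapping, not a real crawl.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Sorted lists are what make a linear-time merge of two entries possible.
    return {word: sorted(ids) for word, ids in index.items()}


# Toy corpus (assumed for illustration): docId -> document text
docs = {
    23: "red car",
    42: "the red house",
    532: "the car",
}
index = build_inverted_index(docs)
```

Keeping every docId list sorted is the design choice that matters here, because it lets two lists be intersected in a single pass.
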
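For each word w1 in the index:
    For each other word w2 (taking each pair only once):
        Get the docId lists of w1 and w2 from the inverted index
        Merge the two sorted lists to find the common docIds
        If exactly one docId is common, (w1, w2) is a Googlewhack
Running time: O(n^3) in the worst case, taking n as a bound on both the number of words and the length of a docId list (n^2 pairs, O(n) per merge).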
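
The idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the index is a plain dict of sorted docId lists, a two-word query is treated as "both words appear in the document", and the names common_doc_ids and find_googlewhacks as well as the word "zebra" and its docIds are made up for the example.

```python
from itertools import combinations


def common_doc_ids(a, b, limit=2):
    """Merge two sorted docId lists and collect shared IDs.

    Stops as soon as `limit` shared IDs are found, since two or more
    common documents already rule out a Googlewhack.
    """
    i = j = 0
    common = []
    while i < len(a) and j < len(b) and len(common) < limit:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def find_googlewhacks(index):
    """Return (word1, word2, docId) for every pair sharing exactly one document."""
    whacks = []
    for w1, w2 in combinations(sorted(index), 2):  # each unordered pair once
        common = common_doc_ids(index[w1], index[w2])
        if len(common) == 1:
            whacks.append((w1, w2, common[0]))
    return whacks


# Toy index: "the" and "car" are from the example above, "zebra" is hypothetical.
toy_index = {
    "the": [42, 532, 2342],
    "car": [23, 345, 35345],
    "zebra": [345, 2342],
}
whacks = find_googlewhacks(toy_index)
```

Stopping the merge at the second common docId does not change the worst case, but it avoids scanning long lists for frequent words once a pair is already disqualified.
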
