The reasons your search results are not the same as mine
I was talking with my father-in-law recently when he expressed frustration that the results he gets from Google for a particular search term are not the same as the ones someone else gets. This led to an interesting discussion about why determinism is not a goal (or even desirable) in a search engine.
It’s worth starting with a quick recap of how Google determines the most relevant search results for a given query, beginning with the most basic assumption.
Relevance is a function of how many people feel something is relevant. If you have a web page that a hundred other people link to, this might indicate it is interesting. Let’s say you work in an office with 5 other people: Bob, Sally, Joe, Judy and Simon. You want to know where to go for lunch. Bob, Sally and Joe say go around the corner to McDonald’s. Judy and Simon say go to Fernandon’s Brasserie. Which is the right answer?
The simplistic approach is to go with the option that has the most people advocating it. In its simplest form this is how PageRank works. But what if Judy has a wildly successful food blog? In that case you might give the opinion of an expert more weight. And, in fact, this is what Google does: links are weighted based on the perceived value of their source. This is one of the protections against Google-bombing, as it makes it harder to manipulate search rankings through sheer number of links. The need to process this information at scale has motivated Google to develop its astonishing infrastructure over the last 20 years.
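To make the lunch example concrete, here is a minimal Python sketch of a plain head count next to a weighted one. The `tally` function and the weight given to Judy are my own assumptions for illustration; this is not Google's algorithm. In PageRank as publicly described, a source's weight is itself derived from who links to it, which this sketch skips by hard-coding the weight.

```python
from collections import defaultdict


def tally(votes, weights=None):
    """Return (winner, totals) for a list of (voter, choice) pairs.

    With no weights every vote counts as 1.0. With weights, a voter's
    vote counts for weights[voter], defaulting to 1.0 if not listed.
    """
    totals = defaultdict(float)
    for voter, choice in votes:
        totals[choice] += 1.0 if weights is None else weights.get(voter, 1.0)
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


votes = [
    ("Bob", "McDonald's"),
    ("Sally", "McDonald's"),
    ("Joe", "McDonald's"),
    ("Judy", "Fernandon's Brasserie"),
    ("Simon", "Fernandon's Brasserie"),
]

# Hypothetical authority weight: Judy's food blog makes her vote count triple.
weights = {"Judy": 3.0}

simple_winner, simple_totals = tally(votes)
weighted_winner, weighted_totals = tally(votes, weights)

print(simple_winner, simple_totals)      # McDonald's wins 3.0 to 2.0
print(weighted_winner, weighted_totals)  # Fernandon's Brasserie wins 4.0 to 3.0
```

The same three votes that win the head count lose once a single trusted voter is weighted more heavily, which is also why piling up links from low-value sources does less than you might expect.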
Now what if you really love junk food? Your taste is not the same as Judy’s. For you the most relevant answer might actually be McDonald’s. Now we start to see why different search results might be desirable. There is no single perfect answer, and each person’s idea of what is relevant is subjective. In fact there is a hierarchy of relevance to consider here:
- Default: if you know nothing about a person, what would be the most relevant thing? This would imply some degree of determinism (not perfect, and we’ll get to that later).
- Implied: if you know some attributes of a person, what would be the most relevant thing? Knowing the country they are in and the language they speak already lets you separate groups of people usefully.
- Personalised: you know the person and their tastes and can tailor your results well to them.
This leads to some side effects around relevance, where the internet turns into an echo chamber for what you already believe. This can be exploited by people wishing to manipulate public opinion (worth another article). Even so, to be as useful as possible, your search results should be personal to you.
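The three tiers of relevance above can be sketched as a single scoring function that adds boosts as more becomes known about the searcher. Everything here is hypothetical: the `Searcher` and `Result` classes, the `.example` sites, the base scores and the boost sizes are assumptions chosen to make the effect visible, not a description of how Google scores anything.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Searcher:
    country: Optional[str] = None
    language: Optional[str] = None
    liked_topics: set = field(default_factory=set)


@dataclass
class Result:
    url: str
    base_score: float  # what you would show someone you know nothing about
    country: Optional[str] = None
    language: Optional[str] = None
    topics: set = field(default_factory=set)


def relevance_tier(user):
    """Name the most specific tier the available information supports."""
    if user.liked_topics:
        return "personalised"
    if user.country or user.language:
        return "implied"
    return "default"


def score(result, user):
    s = result.base_score                       # default tier
    if user.country and result.country == user.country:
        s += 1.0                                # implied tier: same country
    if user.language and result.language == user.language:
        s += 1.0                                # implied tier: same language
    s += 2.0 * len(result.topics & user.liked_topics)  # personalised tier
    return s


def rank(results, user):
    ordered = sorted(results, key=lambda r: score(r, user), reverse=True)
    return [r.url for r in ordered]


RESULTS = [
    Result("brasserie.example", 4.0, topics={"fine dining"}),
    Result("bistro.example", 3.5, country="FR", language="fr"),
    Result("burgers.example", 3.0, topics={"junk food"}),
]

print(rank(RESULTS, Searcher()))
# ['brasserie.example', 'bistro.example', 'burgers.example']
print(rank(RESULTS, Searcher(country="FR", language="fr")))
# ['bistro.example', 'brasserie.example', 'burgers.example']
print(rank(RESULTS, Searcher(liked_topics={"junk food"})))
# ['burgers.example', 'brasserie.example', 'bistro.example']
```

Three people typing the same query get three different orderings, and each ordering is the more useful one for the person who receives it.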
Another thing worth considering is the volume of data human beings are generating on the internet. Information is changing all the time. In the olden days of Google a new search index might be deployed every month, and this was a big event carrying a lot of risk. It would need to be deployed to many datacenters. So the act of changing the index would naturally change your results, and different datacenters might respond differently. This isn’t usually a problem, because successive requests are rarely served by different datacenters. Nowadays there is data that can be updated within minutes, if not seconds. This means the corpus you’re searching against is constantly changing. Add to this that the search infrastructure (and possibly the algorithm) for Google is updated weekly and that there are likely to be thousands of experiments running at any point in time. This allows Google to keep up a relentless pace of improvement that’s hard for anyone else to replicate.
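A rough sketch of how those two moving parts, index versions and experiments, can make the same query return different answers. The datacenter names, index contents and the `new-ranking` experiment are all invented for illustration, and hashing a user ID into a bucket is one common way to split users into experiment groups, not necessarily what Google does.

```python
import hashlib

# Hypothetical snapshots: dc-west already has a newer index than dc-east.
INDEX_BY_DATACENTER = {
    "dc-east": {
        "version": 41,
        "docs": {"lunch": ["brasserie.example", "burgers.example"]},
    },
    "dc-west": {
        "version": 42,
        "docs": {"lunch": ["brasserie.example", "noodles.example", "burgers.example"]},
    },
}


def experiment_bucket(user_id, experiment, buckets=100):
    """Deterministically map a user to a bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def search(query, user_id, datacenter):
    """Return (index_version, results) as seen by this user at this datacenter."""
    index = INDEX_BY_DATACENTER[datacenter]
    results = list(index["docs"].get(query, []))
    # Hypothetical experiment: users in the lower half of the buckets get a
    # different ordering (reversed here, purely to make the effect visible).
    if experiment_bucket(user_id, "new-ranking") < 50:
        results.reverse()
    return index["version"], results


print(search("lunch", "alice", "dc-east"))
print(search("lunch", "alice", "dc-west"))
print(search("lunch", "bob", "dc-west"))
```

Even in this toy version the answer depends on three things besides the query: which index version the datacenter holds, which experiment group the user falls into, and when the request was made relative to the last index update.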
You can see from this that determinism is not only not a goal but would actually hinder the development and utility of a search engine. Yes, it might be nice to simply share a search term, but it’s easy enough to share links, and it’s better to have something that’s increasingly useful to you, the end user.
Note: I’ve deliberately given a simplified account of how this works because I don’t want to inform people looking to game the system. Google being useful benefits everyone.