Testing the ability for critical thought
One of the types of interview I most enjoyed giving at Google was the design interview. This interview is meant to test the ability of the candidate to break down a large problem and tackle part of it. It’s one of the biggest differentiators when hiring at the more senior levels (from Staff upwards). Part of the reason for this is that it gives the interviewer the opportunity to test the candidate’s ability to think critically about something. In my opinion, this is a skill that our education system does not do a good job of cultivating. For the interviewer it’s also an opportunity to learn something, as it’s impossible to be an expert in all the elements which might crop up in the discussion — the candidate is likely to surface things which will improve your understanding as well.
As it’s well known that this is a Google question (so I’m not giving away any secrets), I’m going to use my favourite question as an example of how this may play out. I’ll preface this by saying this is a discussion of the factors around tackling the question rather than necessarily how it is or should be implemented. The question is: design Google Suggest. Google Suggest (also called auto-complete) is the list of search suggestions you get when you are typing a query into Google search (see below).
If the candidate has never come across the functionality before I might demonstrate it or talk them through it. The question is — how would they go about building it? Below are things that are worth discussing as part of this.
Here’s where most software engineers go wrong. You know some algorithms or data structures and you immediately attempt to map one onto the problem. Looking at the example above you might think a trie (or prefix tree) is the way to go here. Now that you have your solution, you are trapped trying to fit all future requirements onto it. At this point expect to be asked something like “can we fit that in memory?” or “how do you generate your trie?”. Untangling the data after you’ve decided on a design is not a good way to go about designing systems at scale.
Instead — the candidates you really want to hire want to explore the problem space some more. This means understanding what’s happening and getting data to understand what might or might not be possible.
First of all — a valid question is: why should we build this? What problem are we trying to solve, and how would we verify that it’s solving that problem? This is going to result in more requests coming into Google. Possibly a lot more. So it needs to be worth our time to build it. Building a complex system when you don’t know what problem you’re solving isn’t smart. Senior engineers can do a passable impression of thinking like a product manager. The cheapest time to fail is before we’ve started coding or designing.
Second of all — attaching anything to Google search should be ringing alarm bells. Google serves a lot of search queries. I would expect a sane candidate to ask how many queries at this point, as this will materially impact aspects of their design. If asked I would give a figure of 10 billion a day (it’s not the real number but big enough to break most trivial implementations). Having a data point is a thread to pull at to validate what you’re doing. Critical thinking is about data.
Next it might be handy to sketch out the components in play here. This will help you consider the interfaces between components. The good thing is that at the beginning there are only 3 you care about: the client interface, the network, and the server at the other end. A good discussion at this point is around the fact that we probably don’t want to send one request across the network for every keypress. People type at different speeds (trying to pair program with someone who can’t touch type is agonising, and watching my dad type a single word feels like the moments of my life are evaporating in front of me). The turnaround time for a query is going to be important here too — there’s no point doing auto-complete if it takes 10 seconds to get a response back. This gives the candidate the first opportunity to demonstrate that most essential of critical thinking skills — the back of the envelope calculation.
An accomplished touch typist might hit 80 words per minute. Assume 6 characters per word and you’ve got 8 keypresses per second (if you ignore punctuation). That means that to keep up with an accomplished typist your round trip time is going to need to be 125ms, including time to process the request at the backend. For anyone who has built internet systems this is already feeling unlikely. Compare this with a less accomplished typist doing maybe 30 or 40 keypresses per minute. The functionality might be more useful for them, so you would be happy sending one request per keypress.
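The arithmetic above is simple enough to sketch out (the 80 words per minute and 6 characters per word figures are the made-up ones from the discussion):

```python
# Back-of-envelope: request budget needed to keep up with a fast typist.
WORDS_PER_MINUTE = 80
CHARS_PER_WORD = 6  # ignoring punctuation

keypresses_per_second = WORDS_PER_MINUTE * CHARS_PER_WORD / 60
budget_ms = 1000 / keypresses_per_second  # round trip plus backend time

print(keypresses_per_second)  # 8.0 keypresses per second
print(budget_ms)              # 125.0 ms to turn each request around
```

That 125ms has to cover network round trip and backend processing, which is the number that should make anyone who has shipped internet systems nervous.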
We now have more data and we’ve demonstrated the ability to think with numbers. We can now derive some useful conclusions, namely that it doesn’t make sense to send one request per keypress. Now we can start thinking about batching keypresses within a certain window, or even waiting for a pause in typing before sending the query for the faster typist. It’s also an opportunity for the candidate to demonstrate an appreciation of network latency and round trip times as they formulate the next part: the conditions under which the client might send a request to the server. If you have someone who is biased towards the front end, this conversation can be gloriously detailed and you won’t touch on much more of the backend other than to sketch out the interface. It’s also a chance to talk about how to run experiments, as the good candidates will recognise that they will need to explore the relationship between keypress and query to get the best user interface. Good engineers know they need to test their assumptions.
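To make the pause-before-sending idea concrete, here is a minimal debounce sketch, written in Python for brevity (a real client would do this in the browser); the class name and the 150ms threshold are purely illustrative, and the right threshold is exactly what the experiments mentioned above would determine:

```python
import time

class Debouncer:
    """Only send a suggest request after a pause in typing (illustrative sketch)."""
    def __init__(self, pause_ms=150):
        self.pause_s = pause_ms / 1000
        self.last_keypress = None
        self.buffer = ""

    def on_keypress(self, char, now=None):
        # Record the keypress and when it happened.
        now = now if now is not None else time.monotonic()
        self.last_keypress = now
        self.buffer += char

    def should_send(self, now=None):
        # Fire only once the typist has paused long enough.
        now = now if now is not None else time.monotonic()
        return (self.last_keypress is not None
                and now - self.last_keypress >= self.pause_s)

d = Debouncer(pause_ms=150)
d.on_keypress("f", now=0.0)
d.on_keypress("a", now=0.05)    # 50ms later: still typing
print(d.should_send(now=0.10))  # False: only 50ms since the last key
print(d.should_send(now=0.25))  # True: a 200ms pause, so send "fa"
```

A slow typist pauses after almost every keypress, so they naturally get one request per keypress; the fast typist gets batched automatically.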
The interface between the client and the server is pretty simple. The client sends a partial query and the server sends back a list of possible queries. Nothing to see here. It’s an opportunity for someone to overcomplicate their design with an insane API but not much else.
Now we get onto the backend. The most obvious question you should be asking at this point is: how do we figure out what goes in the results returned to the client? Our earlier pontification on keypresses and round trip time should tell us that we’re going to want to pregenerate these if we’re to stand a chance of serving them in a useful timeframe, particularly given the size of the possible search space (Google). If we’ve had a discussion about what success looks like, this becomes easier (see how the threads all come together?). A simple definition of success would be that more people are able to find something interesting (ideally what they were looking for). You could think of this as saying “if this feature is useful then people will click on the completions and then find the search results useful”. Google has a lot of useful metrics for what makes a useful search result (or an improvement to the search results page) which I won’t go into in detail here. But put simply, if it’s useful people will navigate from it and not come back for a while to do another search. If the overall results page is better, then more of this will happen across all navigation points (remember that organic search isn’t the only game in town here).
So we’re looking for relevance or interestingness. Relevance is more useful because it represents shortening the user’s path to what they were trying to achieve. Interestingness is useful from a discovery perspective, or for giving lazy journalists something to write articles for Buzzfeed about.
At this point you’ll notice I’m not even worrying about the algorithm or datastructure I’m going to use (though the fact I’m planning to precompute should give you a clue as to where this is likely to be heading). There’s a standing joke within Google that all problems can be answered by turning them into lookup tables (aka hashtables). There’s a lot of truth in this if you want to operate at planetary scale.
How might we think about relevance? Well, popularity is a good start. What are the popular queries that match the stem? (Note there are other use cases now where the partial query is not a stem, but let’s keep it simple for now.) I can just go through the day’s queries, and anything with more than a certain number of occurrences is a candidate. Now we can refer back to one of the earlier numbers I talked about — 10 billion search queries a day. You were probably expecting me to reduce this down to about 100K queries a second (115K) with another back of the envelope calculation. And it may indeed come to this if I start trying to figure out how many machines I need. But right now I still have a data problem, which can be summed up as: can I run this in memory? (And if so, over how many machines will I need to shard it?) You could assume that the worst case is 10 billion unique queries every day, so the upper bound for your number of query stems is the set of common stems of 10 billion queries. This is still a LOT.
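Going through the day’s queries is conceptually just a frequency count with a threshold. A minimal sketch, with an invented log and an invented cut-off:

```python
from collections import Counter

# Scan a day's query log and keep anything at or above a popularity
# threshold as a candidate for suggestions.
def candidate_queries(query_log, min_count=3):
    counts = Counter(query_log)
    return {q: n for q, n in counts.items() if n >= min_count}

log = ["facebook", "facebook", "facebook", "weather", "weather",
       "taylor swift", "taylor swift", "taylor swift", "obscure query"]
print(candidate_queries(log))  # {'facebook': 3, 'taylor swift': 3}
```

At real volumes this would be a distributed batch job over sharded logs rather than one Counter, but the shape of the computation is the same.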
It’s possible to simplify this, however. A large proportion of requests to Google are things like “facebook” or “twitter”. These are called navigational queries: the user doesn’t understand URLs and simply types the name of the place they want to go and, hey presto, the first result is what they were after. Let’s assume these account for half of Google’s traffic (again, a made-up number). Now we only have to worry about 5 billion unique queries. This is a good reduction without any real work. Most queries will be single or double words. The average working vocabulary of a person is 20,000 words and the average passive vocabulary is 40,000 words. We could guesstimate 20,000 squared, or 400 million, pairs of words as a reasonable upper bound. The reality is that it will be much sparser than this, but even this reduces your upper bound to 400 million: a 96% reduction in the size of your search space, and the problem is starting to become tractable. The key point here is to understand your data. Brute force only works at scale once you’ve made it possible. The paradox is that if you start with brute force then you have to be a lot cleverer in your solution. We want dumb solutions because they scale and can be understood.
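The word-pair guesstimate is worth checking with a couple of lines (the vocabulary figure is from the discussion, the 10 billion from the earlier made-up number):

```python
# Guesstimate of the two-word search space.
WORKING_VOCAB = 20_000
DAILY_QUERIES = 10_000_000_000  # the made-up 10 billion a day

pairs = WORKING_VOCAB ** 2
print(pairs)                        # 400000000 possible word pairs
print(1 - pairs / DAILY_QUERIES)    # 0.96: the 96% reduction in search space
```

This is the whole point of back-of-envelope work: two multiplications turn an intractable-sounding number into something you can plan hardware around.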
There are credible alternatives here (there’s no single true path). For example you might do it at the word level and compute the most likely next word. You might just put all your partial search stems into a hashtable with the answers attached.
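The stems-in-a-hashtable alternative can be sketched in a few lines: an offline job walks the popular queries and, for every prefix, records the top-k completions by count. The queries and counts here are invented:

```python
from collections import defaultdict

# Offline job: precompute, for every stem, the top-k completions
# ranked by query popularity. Lookups at serving time are then O(1).
def build_suggest_table(query_counts, k=3):
    table = defaultdict(list)
    for query, count in query_counts.items():
        for i in range(1, len(query) + 1):
            table[query[:i]].append((count, query))
    return {stem: [q for _, q in sorted(entries, reverse=True)[:k]]
            for stem, entries in table.items()}

counts = {"facebook": 900, "face cream": 40, "factbook": 15}
table = build_suggest_table(counts)
print(table["fac"])   # ['facebook', 'face cream', 'factbook']
print(table["face"])  # ['facebook', 'face cream']
```

This is the Google lookup-table joke made literal: spend the cleverness offline so the serving path is a dumb, fast dictionary read.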
Someone observant might also notice at this point that there are many different languages spoken in the world in many different places (not just US English in the Pacific time zone). We might want to shard our backend by country, language or even region. This is also a handy user feature, because popularity within your country is likely to yield better results. A very dirty finger-in-the-air estimate would suggest a per-region corpus might get down to 50 million queries. Of those, the number occurring more than (say) 5 times gets even smaller; maybe even low millions. This becomes a much more tractable problem.
And we’ve just made a case for multiple backends to request the information from. Once we have more than one, we might start thinking of other cases — for example the user’s search history. It sure would be handy if it could save me time by surfacing previous or popular queries I’d made. I might want a more contemporaneous store for news items (when Taylor Swift starts trending, more people will be searching for her) with even fewer possibilities. This is now pushing me towards being able to send lots of async requests and make my final response to the client out of the best of those I get back. It’s also making me question how often I have to update my precomputed suggestions.
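The fan-out might look something like this sketch using Python’s asyncio: fire requests at all backends in parallel, wait up to a deadline, and build the response from whatever came back in time. The backend names, latencies and 100ms deadline are all invented:

```python
import asyncio

# Stand-in for an RPC to one suggest backend.
async def query_backend(name, partial, delay_s):
    await asyncio.sleep(delay_s)
    return [f"{partial}... (from {name})"]

async def suggest(partial, timeout_s=0.1):
    # Fan out to every backend in parallel.
    tasks = [
        asyncio.create_task(query_backend("popular", partial, 0.01)),
        asyncio.create_task(query_backend("history", partial, 0.02)),
        asyncio.create_task(query_backend("news", partial, 0.5)),  # too slow
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:  # drop anything that missed the deadline
        task.cancel()
    suggestions = []
    for task in done:
        suggestions.extend(task.result())
    return suggestions

results = asyncio.run(suggest("tay"))
print(len(results))  # 2: the slow news backend missed the 100ms deadline
```

The important property is that one slow backend degrades the response rather than delaying it, which is what the round-trip budget demands.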
And now we get to design the system, having first explored the problem space and refined our use cases. Having thought about the user and what might be useful to them, we have a set of requirements we can build something for. And it turns out this is kind of the easy bit. I want fast lookups, so sharded in-memory tables. I want offline processing. I want to be able to fire off RPC requests in parallel. I can make sane(ish) estimates of the number of machines I might want to use, based on my memory footprint and volume of traffic.
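Those sane(ish) estimates take only a few lines. Every input here is an assumption for illustration (entry size, usable RAM, per-machine serving capacity), not a real Google number:

```python
# Rough sizing under made-up assumptions: 400M stems, ~100 bytes per
# table entry, 64 GiB usable RAM per server, ~115K queries/sec total,
# 10K queries/sec serving capacity per machine.
STEMS = 400_000_000
BYTES_PER_ENTRY = 100
SERVER_RAM = 64 * 2**30
QPS = 115_000
QPS_PER_MACHINE = 10_000

table_gib = STEMS * BYTES_PER_ENTRY / 2**30
replicas = -(-QPS // QPS_PER_MACHINE)  # ceiling division

print(round(table_gib, 1))  # 37.3 GiB: the table fits on a single machine
print(replicas)             # 12 replicas to soak up the traffic
```

Under these assumptions the table fits in one machine’s memory and the machine count is driven by traffic, not data size; change the per-entry cost or go unsharded by region and the answer flips, which is why the earlier data exploration matters.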
Assuming we’ve got this far in the time available (which isn’t all that likely), we are now free to start talking about how we operate the system at scale. What happens when we’re updating a backend? How do we run experiments? How do we monitor what we have? What do our fallback cases look like?
The key thing with critical thinking is to explore the problem space first. Think about it with data. Think about why you’re building this. Think about it from the user perspective. Think about the practical implications of what happens when you try to build this system at the likely scale or on top of an existing system. The up front exploration means that you start building something useful that stands a chance of working. If you immediately reach for an algorithm you’re always playing catch up. Software engineering is more than trying to find a suitable pattern — it’s about understanding problems so you can solve them elegantly. It’s a skill that you (hopefully) learn as you go through your career if you’re surrounded by people who’ve been through it before and are willing to invest in asking you the right questions. It’s also what separates the really great engineers from their counterparts.
I’ve written out a thought process for how I give and assess this interview. It’s not definitive for how to solve the problem. In fact, re-reading it I can think of lots of different ways I could have gone. The joy of design interviews is that there is no single good answer. But elements of what I’ve discussed above will exist in any good answer. The thing to remember is to start by understanding the problem. If you jump straight into solutions, you’re going to have a bad time.