Paul Christiano is a 4th-year mathematics student at MIT with an emphasis on theoretical computer science and is a past summer engineering intern at Quixey. Paul was awarded “Best Research Paper” in the 2011 ACM Symposium on Theory of Computing (STOC).
In order to provide relevant search results, Quixey needs to integrate many different sources of evidence — not only each app’s name and description, but content from all over the web that refers to specific apps. Aggregating huge quantities of information into a single judgment is a notoriously difficult problem, and modern machine learning offers many approaches.
When you need to incorporate just a few pieces of information, there’s a mathematical version of “brute force” that works quite well, based on Bayes’ Rule:

If you’re not familiar with mathematical formalism and probability theory, this might look a little opaque. Let’s go through a simple example to show intuitively how this sort of mathematical reasoning might work.
Suppose I have no prior knowledge about an app other than a blog that described this app as “challenging.” What is the probability that the app is a reasonable result for the query “games”?
Say I know that about 10% of apps are reasonable results for the query “games.” I also know that a blog describes an app as “challenging” with probability 50% if the app is a game. Further, I know that if an app isn’t a game, then a blog would only describe it as “challenging” with about 5% probability.
So let’s do the math… I know that 10% of apps are games, and 50% of those are described as “challenging”. This means that 5% of all apps are “games which are described as challenging.”
I know that 90% of apps are not games, and 5% of those are described as challenging. Therefore, 4.5% of all apps are “non-game apps which are described as challenging.”
In total, 9.5% of all apps are described as “challenging”, while 5% of all apps are “games which are described as challenging,” so 5% / 9.5% = 53% of all apps described as challenging are games. Knowing nothing else, when I see an app described as “challenging,” I conclude it has a 53% chance of being a game.
Processing a Mountain of Evidence
Fortunately, Quixey doesn’t have to guess whether an app is a good result for a query after only looking at one piece of evidence. We look at a huge volume of evidence before making our decision. But having that much data introduces new problems.
Perhaps the most pressing question is: How do we get all the knowledge necessary to do our analysis? It may be the case that 50% of all games are described as challenging by blogs, but how do we know that unless we already know which apps are games?
The solution is for human programmers to write a description, in general terms, of what sort of evidence is relevant and how. Then an inference algorithm is responsible for filling in all the details. This is done by applying Bayes’ Rule to the structure of the human-defined model, instead of simply to the apps themselves.
Here’s an example of this approach. Suppose we expect that one of two different models describes our data. One model might specify that 50% of all games are described as challenging, while a different model might specify that only 5% of games are described as challenging. Initially, we have some beliefs about which model is accurate — say, we think the two models are equally likely. As we observe data, we can improve our beliefs. If one model consistently describes the data better than the other, then its likelihood increases. This is similar to our first example, in which our “this app is a game” hypothesis became more likely once we saw that the app was described as “challenging.”
Specifying two models may not be any easier than specifying one. However, specifying a large class of models that contains a good one may be quite easy. For example, consider the infinite list of models of the form: “X% of all games are described as challenging.” This class does contain a good model, even though it took almost no thought to write down. If an algorithm is able to automatically learn which of these models work well, then it may be able to use evidence effectively without humans injecting any knowledge at all, except our very general understanding that the logical properties of an app affect how blogs describe it. (more…)