TOM W Probability Puzzle

Siddhant
9 min read · Jun 14, 2021

This article gives a mathematical explanation of the probability puzzle presented in Chapter 14 (TOM W’S SPECIALTY) of the book “Thinking, Fast and Slow” by Dr. Daniel Kahneman.

The above description should in no way discourage readers who are unfamiliar with the book: the content below is for anyone and everyone who is up for a quick workout of their brain muscles. The only prerequisites are some basic knowledge of probability (Standard 9 level) and some curiosity about the judgement and decision-making abilities of the human mind. That said, the content below will require continuous analytical effort. So, clear your mind of all the clutter and welcome aboard.

The problem we will solve below is a classic application of Bayes’ theorem, which is widely used by researchers and students of probability and statistics. However, its usefulness for everyday decision making is so immense that confining it to researchers and students would mean keeping an important life tool away from the masses. To keep things simple, I won’t give a formal definition of Bayes’ theorem; I will just describe it as a probability rule that demands unbiased logic and rational use of all the information that is presented to or known by us. Remember: “All the information”.

Here is the problem. Tom W is a graduate student in the United States (US), picked at random from the pool of all US graduate students. We are asked to compute the probability that Tom W is a graduate student in the Computer Science (CS) department. It is known that 3% of the entire US graduate student population is enrolled in CS. On top of that, we are told that, based on his nature (education, social skills, etc.), a personality like Tom W’s is 4 times more likely to be found in a CS graduate student than in a graduate student of any other department.

Before we calculate the actual value, I would like every reader to take a quick guess at the probability they expect. Don’t do any math; just guess based on your intuition and keep the value in your head.

Probability of an event E happening: P(E) = Number of favorable outcomes / Total number of outcomes

I hope everyone recalls the above equation; if not, try to recall the sweet memories of the mathematics classes from your school days. Okay, if you are back from school, we can go ahead and solve the problem. As some encouragement, we will use no formula other than this one.

What would have been the probability if no information about Tom’s nature was given in the problem?

Let us assume that there are a total of N graduate students in the US. Then the number of students enrolled in the CS department is 3N/100, and the number in all other departments is 97N/100. Plugging these values into the formula above gives a probability of (3N/100)/N = 3%. The calculated probability is the same as the fraction of students enrolled in the CS department. Seen from another angle, a probability is the fraction of the total outcome space occupied by the favorable outcomes.
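To make the counting concrete, here is a small Python sketch of the formula applied to the CS department. The population size N = 10,000 is a hypothetical value chosen only for illustration; the 3% share comes from the problem. Python's `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def probability(favorable: int, total: int) -> Fraction:
    """P(E) = number of favorable outcomes / total number of outcomes."""
    return Fraction(favorable, total)

# Hypothetical population size; any N divisible by 100 gives the same answer.
N = 10_000
cs_students = 3 * N // 100       # 3% of all graduate students
non_cs_students = 97 * N // 100  # the remaining 97%
assert cs_students + non_cs_students == N

p_cs = probability(cs_students, N)
print(p_cs, float(p_cs))  # 3/100 0.03
```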

Now, assume you have participated in a lottery draw in which you have been given a random card from a deck. The winner is the one whose card matches the card that the host of the lottery show draws. In an ideal scenario every card would have the same probability of being drawn, 1/52 (about 1.9%), and that must equal your chances of winning the lottery as well. But our host is an entertaining guy and likes to build some suspense before he tells everyone which card he has drawn. He gives a hint: the card in his hand is a face card. What is your new probability of winning? To revise it, the first thing you do is cut the total number of possible outcomes down to 12 (a deck has only 12 face cards: the jacks, queens and kings). Your new probability then depends on whether you hold a face card: 1/12 (about 8.3%) if you do, and 0 if you don’t. The point I am trying to make is that the total number of outcomes changed once we found out that the drawn card is a face card. This idea forms the basis of Bayes’ theorem: updating the total and favorable outcomes with newly available information.
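The host's hint can be simulated by listing the whole deck and then filtering it. This sketch assumes a standard 52-card deck in which the face cards are the jacks, queens and kings; the card labels and the helper `win_probability` are hypothetical names chosen for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def win_probability(my_card, possible_draws):
    """Chance that the host's card equals my_card, assuming every card in
    possible_draws is equally likely to be the one he drew."""
    favorable = 1 if my_card in possible_draws else 0
    return Fraction(favorable, len(possible_draws))

# Before the hint: any card in the deck could have been drawn.
print(win_probability(("K", "hearts"), DECK))  # 1/52

# After the hint "it is a face card": shrink the sample space.
FACE_CARDS = [card for card in DECK if card[0] in {"J", "Q", "K"}]
print(len(FACE_CARDS))                               # 12
print(win_probability(("K", "hearts"), FACE_CARDS))  # 1/12
print(win_probability(("7", "clubs"), FACE_CARDS))   # 0
```

Only the denominator (the sample space) changes when the hint arrives; the favorable outcome is either your card or nothing.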

Now, let’s apply the same logic to our problem. Our initial sample space consists of the entire population of US graduate students. But with the added information about Tom W’s nature, we have to revise it and take only the students whose nature is similar to Tom W’s as our new sample space. (Think of this information as a hint given by the person who picked Tom W, before he reveals Tom W’s department.) So, let’s build this new sample space.

It is given that a nature like Tom W’s is 4 times more likely to be found in a CS student than in a student of any other department. Careful: this statement describes how common Tom W’s nature is within each department; it does not yet tell us which department Tom W is in. In mathematical terms:

P(Nature | CS student) = 4 x P(Nature | Non-CS student)

These two values need not add up to 1, because they describe two different groups of students; the problem fixes only their ratio, 4 : 1.

To keep the arithmetic simple, we can pick any pair of values with this ratio, for example:

P(Nature | CS student) = 4/5 = Fraction of CS students with a nature similar to Tom W

P(Nature | Non-CS student) = 1/5 = Fraction of Non-CS students with a nature similar to Tom W

What do these two values tell us? Under this choice, (4/5)th (80%) of all CS graduate students in the US have a nature similar to Tom W, while only (1/5)th (20%) of the graduate students in other departments do. Any other pair with the same 4 : 1 ratio (say 2/5 and 1/10) leads to the same final answer, because the common factor cancels out in the last step.

Number of CS students with a nature similar to Tom W = (Fraction of CS students with a nature similar to Tom W) x (Total number of CS students)

Number of Non-CS students with a nature similar to Tom W = (Fraction of Non-CS students with a nature similar to Tom W) x (Total number of Non-CS students)

Adding these two quantities gives the total number of US graduate students who have a nature similar to Tom W.

Total number of US graduate students with a nature similar to Tom W (new sample space) = Number of CS students with a nature similar to Tom W + Number of Non-CS students with a nature similar to Tom W

Since we took the total population of US graduate students to be N, the total number of graduate students with a nature similar to Tom W is 4/5 x (3N/100) + 1/5 x (97N/100). Our sample space is ready.
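The same sample space can be built with actual head counts. This sketch again assumes a hypothetical N = 10,000 students and uses the fractions 4/5 and 1/5 from the text; the variable names are illustrative.

```python
from fractions import Fraction

N = 10_000  # hypothetical total number of US graduate students

cs_students = Fraction(3, 100) * N       # 300
non_cs_students = Fraction(97, 100) * N  # 9,700

# Fractions of each group whose nature resembles Tom W's (ratio 4 : 1).
frac_cs_like_tom = Fraction(4, 5)
frac_non_cs_like_tom = Fraction(1, 5)

cs_like_tom = frac_cs_like_tom * cs_students              # 240
non_cs_like_tom = frac_non_cs_like_tom * non_cs_students  # 1,940

new_sample_space = cs_like_tom + non_cs_like_tom          # 2,180
print(cs_like_tom, non_cs_like_tom, new_sample_space)     # 240 1940 2180
```

Notice that even though only 1/5 of Non-CS students resemble Tom W, they still outnumber the CS students who do, because there are so many more of them.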

Now comes the task of counting the favorable outcomes. Let’s go back to our card example. If we had a face card in our hand, the number of favorable outcomes would be 1; if we had a non-face card, we could never win the lottery, so the number of favorable outcomes would simply be 0. The set of favorable outcomes is always a subset of the set of all possible outcomes. Pause and think about the previous line; if possible, visualize it with a Venn diagram, as it will make your life much simpler.

To get the number of CS students with a nature similar to Tom W, we can directly use the value computed above. Alternatively, we can remove all the outcomes corresponding to Non-CS graduate students from the new sample space to arrive at the number of favorable outcomes: 4/5 x (3N/100). That brings us to the final step: calculating the probability that Tom W is a CS student. In mathematical terms, we can summarize it with the following equation:

P(Tom W being a CS student | Nature) = (Number of CS students with a nature similar to Tom W) / (Total number of graduate students with a nature similar to Tom W, irrespective of department)
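Written out with the numbers from the sample-space step, the equation becomes the following. The unknown population size N cancels, so it never needs a value.

```latex
P(\text{CS} \mid \text{Nature})
  = \frac{\frac{4}{5}\cdot\frac{3N}{100}}
         {\frac{4}{5}\cdot\frac{3N}{100} + \frac{1}{5}\cdot\frac{97N}{100}}
  = \frac{12N/500}{12N/500 + 97N/500}
  = \frac{12}{109} \approx 0.110
```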

Plugging in the values calculated earlier gives 12/109, or about 11%. Based on Tom W’s nature and the demographics of US graduate students, there is only about an 11% chance that he is pursuing his graduate studies in CS.
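The whole calculation can be wrapped into one small function, a sketch of Bayes' rule in the counting form used above. The function name `posterior` and its parameters are hypothetical. It also shows that only the 4 : 1 ratio matters: a different pair of fractions with that ratio gives the same answer.

```python
from fractions import Fraction

def posterior(base_rate, p_nature_given_cs, p_nature_given_other):
    """Share of students with Tom W's nature who are in CS (Bayes' rule)."""
    # Fraction of all students who are in CS and share Tom W's nature.
    cs_like_tom = p_nature_given_cs * base_rate
    # Fraction of all students who are Non-CS and share Tom W's nature.
    other_like_tom = p_nature_given_other * (1 - base_rate)
    return cs_like_tom / (cs_like_tom + other_like_tom)

base_rate = Fraction(3, 100)

# The article's choice of 4/5 and 1/5 ...
p1 = posterior(base_rate, Fraction(4, 5), Fraction(1, 5))
# ... and a different pair with the same 4 : 1 ratio.
p2 = posterior(base_rate, Fraction(2, 5), Fraction(1, 10))

print(p1, p2, round(float(p1), 3))  # 12/109 12/109 0.11
```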

You may be wondering what this guy is trying to achieve by repeatedly throwing math formulas at us and making us think over them again and again, when he started off with a promise to share a life tool for better decision making. Have we been fooled?

Before you declare yourself fooled, go back in time and recall the number you first came up with based on your intuition. Is it close to 11%? If so, great, you are sorted; if not, hold on a little longer. Tom W’s personality fits a CS student four times better than it fits a student from any other department, yet there is an 89% chance that he is a Non-CS student. To give this a real-life touch, think of a child in your house or neighborhood whom you have heard is very smart and a consistent topper in school. One fine day, you learn that he did not get admission to a top-rated college. It bothers you for a while, and you wonder how that is even possible; he was one of the smartest kids you knew.

Was he not smart enough?

Of course he was. The problem is that you were not smart enough in understanding the situation. In your mind he was the smartest kid, and you were sure he would get the admission. What you failed to see was that the acceptance ratio of the top-rated college was somewhere around 0.01%, and you overlooked it. All along, your estimate of his chances was based solely on how smart he was, which left almost no room for failure. The reason for such a deviation is simple: ignoring the base rate of the scenario.

The base rate in our problem was 3%, the percentage of graduate students enrolled in the CS department. However, our minds were so busy judging Tom W’s persona that we assumed the demographics had hardly anything to do with the department in which Tom W is pursuing his graduate studies.
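To see how much the base rate matters, the sketch below compares two answers. Base-rate neglect is modeled, as a hypothetical stand-in, by treating CS and Non-CS students as equally common (50/50), i.e. judging Tom W on his persona alone; the other answer uses the actual 3%. It uses the odds form of Bayes' rule (posterior odds = likelihood ratio x prior odds), which is equivalent to the counting done earlier; the names are illustrative.

```python
from fractions import Fraction

LIKELIHOOD_RATIO = 4  # Tom W's nature is 4x more common among CS students

def p_cs_given_nature(prior_cs):
    """Posterior via odds: posterior odds = likelihood ratio x prior odds."""
    prior_odds = prior_cs / (1 - prior_cs)
    posterior_odds = LIKELIHOOD_RATIO * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Hypothetical base-rate neglect: pretend CS and Non-CS are equally common.
neglected = p_cs_given_nature(Fraction(1, 2))
# Using the actual base rate of 3%.
informed = p_cs_given_nature(Fraction(3, 100))

print(f"{float(neglected):.0%} vs {float(informed):.0%}")  # 80% vs 11%
```

Ignoring the base rate inflates the answer from about 11% to 80%, which is roughly the gap between intuition and arithmetic that this puzzle exposes.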

One distinct thing I have noticed is that, as Indians, we tend to largely ignore the base rate in cases related to exam results, success in business or similar situations where we have seen our acquaintances succeed. On the other hand, when it comes to more extreme cases like pursuing a career in sports or music, we tend to be hyper-focused on the base rate. Although the base rates are well known in both cases, we ignore them in one and overweight them in the other.

I won’t go any further, but I will end with a shocking thought: ignoring the base rate, or clinging to it too tightly, might be deluding us to such an extent that a future Lata Mangeshkar of our country is sitting at a desk, selling premium subscriptions for your musical fantasies. Think!!
