birthday problem

Meeting someone with the same birthday as you always seems like a happy coincidence. After all, with 365 (366 including February 29th) unique birthdays, the chances of any two people being born on the same day appear to be small. While this is indeed true for any two individuals picked at random, what happens when we add a third, a fourth, or a fifth person into the fray? At what point does it become inevitable that some pair of people will share a birthday?

Of course, the only way to guarantee a shared birthday among a group of people is to have at least 367 people in a room. Take a moment to think about that statement, and if you’re still stumped, read this. Now that we know 100% probability is reached with 367 people, how many people would it take to reach 50%, 75%, or 90% probability? If you think the answer is 184, 275, and 330, then you would be quite wrong. Here’s why:

Let’s assume that all birthdays are equally likely to occur in a given population and that leap years are ignored. To paint a more vivid picture in our minds, let’s further assume that we have a large room and that people are entering the room one at a time while announcing their birthdays for everyone to hear. The first person enters the room and announces that his/her birthday is January 1st (we can choose any date we want without loss of generality). The second person has a $364/365$ probability of having a different birthday from the first person, so the probability that the first two people share a birthday is $1 - 364/365$. Given that the first two birthdays differ, the third person has a $363/365$ probability of missing both of them, so the probability that all three birthdays are distinct is $(364/365) \times (363/365)$ and the probability that at least two of the first three people share a birthday is $1 - (364/365) \times (363/365)$. Likewise, the probability that the first four birthdays are all distinct is $(364/365) \times (363/365) \times (362/365)$, so the probability of at least one match among the first four people is $1 - (364/365) \times (363/365) \times (362/365)$. To generalize, the probability that at least two of the first $n$ people in the room share a birthday is:

$$P(n) = 1 - \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{365-n+1}{365}$$

Note that the yellow series in the above graph rises far faster than linearly at first, with the probability reaching 50% at just 23 people. 75% and 90% probability are reached at 32 and 41 people, respectively. By the time 70 people are in the room, there is a greater than 99.9% chance that two individuals will have the same birthday!
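These thresholds can be checked with a short Python sketch. It is not the Excel workbook behind this post: the function names `p_shared` and `smallest_group` are made up for illustration, and it assumes 365 equally likely birthdays with no leap day.

```python
from math import prod

DAYS = 365  # assumption: uniform birthdays, February 29th ignored


def p_shared(n: int, days: int = DAYS) -> float:
    """Probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # more people than days, so a match is certain
    # Probability that all n birthdays are distinct: (365/365)(364/365)...
    p_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_distinct


def smallest_group(target: float, days: int = DAYS) -> int:
    """Smallest n for which p_shared(n) reaches the target probability."""
    n = 1
    while p_shared(n, days) < target:
        n += 1
    return n


for target in (0.50, 0.75, 0.90, 0.999):
    n = smallest_group(target)
    print(f"{target:.1%} is first reached with {n} people (P = {p_shared(n):.4f})")
```

The loop simply walks $n$ upward until the target is met, which is cheap here because the answer is never larger than 366.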

As the number of people increases, the curve for $P(n)$ changes from accelerating to decelerating: the incremental probability added by each new person rises at first, peaks, and then shrinks as $P(n)$ approaches 100%. Interestingly, the 20th person provides the greatest incremental probability, as seen in the above table.
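The peak can be located with a rough Python sketch, again assuming 365 equally likely birthdays. The helper name `increments` is hypothetical. It relies on the fact that $P(n) - P(n-1)$ equals the probability that the first $n-1$ birthdays are all distinct, multiplied by the $(n-1)/365$ chance that the $n$th person lands on one of them.

```python
DAYS = 365  # assumption: uniform birthdays, February 29th ignored


def increments(max_n: int, days: int = DAYS) -> dict[int, float]:
    """Return {n: P(n) - P(n-1)} for n = 2..max_n."""
    gains = {}
    p_distinct = 1.0  # probability that the first n-1 birthdays are all distinct
    for n in range(2, max_n + 1):
        # The n-th person adds probability only if no match exists yet
        # and their birthday hits one of the n-1 days already taken.
        gains[n] = p_distinct * (n - 1) / days
        # Update to the probability that the first n birthdays are distinct.
        p_distinct *= (days - (n - 1)) / days
    return gains


gains = increments(100)
best = max(gains, key=gains.get)
print(f"Largest increment comes from person {best}: {gains[best]:.4%}")
```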

In contrast, the probability that at least one of $n$ people has a specific birthday (yours, for example) is given by the much simpler equation:

$$P_1(n) = 1 - \left( \frac{364}{365} \right)^n$$

This relationship, which is highlighted by the green series in the graph, grows at a much slower rate than the yellow series. In comparison, it takes 253 people for $P_1(n)$ to exceed 50%.
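Because $P_1(n)$ has a closed form, the 253 figure can be recovered by solving for $n$ directly with logarithms. The following is a small sketch under the same uniform, no-leap-day assumption; the names `p_specific` and `people_needed` are illustrative only.

```python
from math import ceil, log

DAYS = 365  # assumption: uniform birthdays, February 29th ignored


def p_specific(n: int, days: int = DAYS) -> float:
    """Probability that at least one of n people has one particular birthday."""
    return 1.0 - ((days - 1) / days) ** n


def people_needed(target: float, days: int = DAYS) -> int:
    """Smallest n with p_specific(n) >= target, for 0 < target < 1."""
    # Solve 1 - ((days-1)/days)^n >= target for n and round up.
    return ceil(log(1.0 - target) / log((days - 1) / days))


n = people_needed(0.5)
print(n, round(p_specific(n), 4))
```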

Testing Our Assumptions

One key assumption we made in the above exercise was that all birthdays (aside from February 29th) occur with equal probability. But how correct is that assumption? Luckily, Roy Murphy has run the analysis based on birthdays retrieved from over 480,000 life insurance applications. I won’t repeat verbatim the contents of his short and excellent article, but I did re-create some charts showing the expected and actual distribution of birthdays. The bottom line is that the actual data show more variation (including very apparent seasonal variation by month) than what is expected through chance.

Implications on Birthday Matching

Now that we know that birthdays in reality are unevenly distributed, it follows that matches should occur more frequently than the uniform model predicts. To test this hypothesis, I ran two Monte Carlo simulations with 1,000 trials each to estimate how many people must enter the room before the first matching birthday occurs: the first based on expected probabilities (each ordinary birthday with equal likelihood of $1/365.25$ and February 29th with likelihood of $1/1461$) and the second based on actual probabilities (sourced from the Murphy data set).

Note that the distributions of both simulations are skewed positively (i.e. to the right). The results appear to corroborate our hypothesis, as evidenced by the gray line growing at a faster rate than the yellow line in the above graph. Indeed, the average number of people required for a birthday match is 24.83 under the simulation using actual probabilities, slightly lower than the 25.06 using expected probabilities. However, the difference is small (about 0.2 people on average); therefore, our assumption of uniformly distributed birthdays works just fine.
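For readers who want to try this outside Excel, here is a minimal Python sketch of the "expected probabilities" simulation only. It is not the author's workbook: the function name `simulate`, the seed, and the weight list are assumptions for illustration. Each ordinary day gets weight 4 and February 29th gets weight 1 (out of 1461), matching the $1/365.25$ and $1/1461$ likelihoods above. The Murphy frequencies are not reproduced here, but a list of 366 observed counts could be passed in the same way.

```python
import random
from itertools import accumulate
from statistics import mean


def simulate(weights, trials=1000, seed=0):
    """Return, for each trial, the head count at the first shared birthday."""
    rng = random.Random(seed)            # seeded so runs are repeatable
    cum = list(accumulate(weights))      # cumulative weights, computed once
    days = range(len(weights))
    sizes = []
    for _ in range(trials):
        seen = set()
        while True:
            # Draw one birthday according to the given weights.
            day = rng.choices(days, cum_weights=cum)[0]
            if day in seen:
                break                    # this person matches someone already here
            seen.add(day)
        sizes.append(len(seen) + 1)      # count the person who caused the match
    return sizes


# Assumed weights: 365 ordinary days at 4/1461 each, February 29th at 1/1461.
expected_weights = [4] * 365 + [1]

sizes = simulate(expected_weights)
print(f"Average people needed for a match: {mean(sizes):.2f}")
```

With only 1,000 trials the average moves by a few tenths from seed to seed, so small gaps between two runs should be read with that in mind.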

As always, you can find my work here.


Jeffrey Fan

Random Musings of an Amateur Data Scientist

Fun with Excel #18 – The Birthday Problem