July 18, 2013 – Jeffrey Fan

This post is dedicated to my Dad, whose lifelong passion for learning has been an inspiration for my own never-ending pursuit of excellence and the truth. Happy birthday Pops!

—

Disclaimer: This post is the first of (hopefully) many in a new series called “Fun with Excel,” where I use Microsoft Excel to model and explore interesting real-world topics.

—

This week, I explore the topic of physical attraction from a statistical perspective.

Okay, I admit I was very tempted to insert a provocative picture of <Name of Hot Actress/Model>, but that would have been trying too hard. So what is physical attraction and how does it work? To be clear, I am talking about physical attraction (aka Hot or Not), none of this lovey-dovey emotional stuff. So I don’t want to read a comment later that says, “But Jeff, you didn’t incorporate personality into the model!” No, I didn’t, and that was on purpose.

Background: For starters, an assumption: attractiveness is (mostly) objective. Sure, we’ve all heard the phrase “beauty is in the eye of the beholder,” and this saying certainly holds merit. Ask a group of men to rank a group of women by attractiveness (or vice versa), and it is highly unlikely that you will get two identical rankings. However, the correlation between the rankings should be statistically significant. Height, skin complexion, and body proportion are just three of many physical traits that play a role in defining a person’s “objective” attractiveness. The ancient Greeks figured this out millennia ago, but in case you’re not convinced, here’s a short excerpt from Malcolm Gladwell’s Blink. So if attractiveness is indeed objective, it seems reasonable to also assume that it is normally distributed. Data collected from the popular dating site OkCupid seems to suggest that this could be the case:

Ignoring the message distribution lines for a moment, we notice that while males rate females on a normal distribution, females seem to rate males on a log-normal distribution. Ouch. So does this mean that most men are just ugly? Not quite. Remember, both males and females are ranking the opposite sex based on their perceptions of attractiveness. But if we assume that attractiveness is objective, what might cause the discrepancy between the perceived log-normal distribution and the actual normal distribution? One likely explanation is superiority bias, which is psychology-speak for narcissism. Superiority bias is the tendency of humans to overestimate their positive qualities and underestimate their negative ones. If that sounds familiar, it’s because it is. Superiority bias is documented in almost everything we do, from our perception of our own intelligence to our driving skills (oh God). However, the superiority bias is nothing more than an illusion. 80% of people might rate themselves above average on driving skills, but they can’t all be right. If driving skill is symmetrically distributed, as a normal distribution is, only 50% of the population can be above-average drivers. The same principle should hold true when it comes to beauty: half of the population is above the average attractiveness level, while half is below. Clear? OK, let’s move on to the model.

The Model: Throughout the model, I thought about attractiveness on a percentile basis rather than on a raw scale (1-10). Although these methodologies should theoretically yield the same results, it is often more natural for people to think on a linear scale. However, this tendency actually has the impact of embedding our biases into our ratings, causing us to be less objective for the reasons stated above. I found that thinking about things on a percentile basis forces us to consider the situation from a more objective perspective. Rather than ask “Is this person a 9.5 (out of 10)?” which leads us to question what a “9.5” constitutes in the first place, we can ask “Is this person 3 standard deviations above the mean?” The latter has an inherent meaning, namely, if you put this person in a room of 1,000 people, is he/she among the one or two most attractive people in the room? In addition to assuming a normal distribution for the attractiveness of both men and women, I gave each group two additional characteristics: superiority bias (%) and seek range (%).
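To make the percentile framing concrete, here is a minimal Python sketch (the model itself lives in Excel) that converts a number of standard deviations into a percentile under the normal-distribution assumption. The function name `z_to_percentile` is my own label for illustration and is not part of the spreadsheet.

```python
import math


def z_to_percentile(z: float) -> float:
    """Fraction of a standard normal population that falls below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = 3.0
percentile = z_to_percentile(z)

# Expected number of people in a room of 1,000 who would rank above
# someone sitting z standard deviations above the mean.
expected_above = 1000 * (1.0 - percentile)

print(f"{z} SD above the mean is the {percentile:.2%} percentile")
print(f"Expected people ranked higher in a room of 1,000: {expected_above:.2f}")
```

Three standard deviations lands at roughly the 99.87th percentile, so on average only about one other person in a room of 1,000 would rank higher.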

I incorporated superiority bias by applying it as a scaling factor for how people perceive their own attractiveness. For example, if you have a true attractiveness-percentile (a-perc) of 50% (i.e. you’re average) and a superiority bias of 0%, then you would perceive yourself as also having an a-perc of 50%. However, if you have a superiority bias of 20%, then you would perceive yourself as having an a-perc of 70%.
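The post does not show the spreadsheet formula behind the scaling factor, so the sketch below is an assumption rather than the model’s actual implementation. A multiplier of (1 + 2 × bias) reproduces the 50% to 70% example above, and it also matches the 28.6% and 42.9% figures in the worked example later in the post. The function names and the cap at 100% are hypothetical.

```python
def perceived_a_perc(true_a_perc: float, bias: float) -> float:
    """Self-perceived percentile under an ASSUMED multiplicative bias.

    The factor (1 + 2 * bias) is inferred from the post's examples, not
    taken from the spreadsheet. Capping at 100% is also an assumption.
    """
    return min(1.0, true_a_perc * (1.0 + 2.0 * bias))


def true_from_perceived(perceived: float, bias: float) -> float:
    """Reverse the assumed bias (only meaningful below the 100% cap)."""
    return perceived / (1.0 + 2.0 * bias)


# An average person with no bias, then with a 20% bias.
print(perceived_a_perc(0.50, 0.00))
print(perceived_a_perc(0.50, 0.20))

# Which true a-perc feels like 40% to someone with a 20% bias?
print(round(true_from_perceived(0.40, 0.20), 3))
```

Because the bias multiplies rather than adds, its effect in percentage points is smaller for people with a low true a-perc and larger for people with a high one.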

Seek range refers to how wide a net a person casts when looking for a potential partner. There are a few important things to note about how the seek range is actually incorporated into the model. First, the seek range is based on one’s perceived a-perc and not their true a-perc. Think about it. If we believe we are more attractive than we actually are, then it makes sense that we would attempt to seek out other people whom we believe to be around the same attractiveness. So returning to our example, if you have a true a-perc of 50% and a superiority bias of 20%, you would perceive your a-perc to be 70%. If you also had a seek range of 20%, you would look for potential partners with a true a-perc between 60% and 80% (I assume for simplicity that people will seek both upwards and downwards equally, except in boundary cases). Second, the targets are judged on their true a-perc rather than their perceived a-perc. The rationale is the observation that other people tend to perceive us more objectively than we do ourselves. In other words, superiority bias is something that affects your own perception and not the judgment of others.
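The seek window can be sketched in a few lines of Python. The function name `seek_window` is hypothetical, and the boundary handling is my assumption: near 0% or 100% the window is shifted rather than shrunk, so that its width always equals the seek range.

```python
from typing import Tuple


def seek_window(perceived: float, seek_range: float) -> Tuple[float, float]:
    """Return the (low, high) true a-percs that a person looks for.

    The window is centred on the person's *perceived* a-perc. At the
    edges it slides inward so the width stays equal to seek_range
    (ASSUMED boundary rule, not confirmed by the spreadsheet).
    """
    low = perceived - seek_range / 2.0
    low = max(0.0, min(low, 1.0 - seek_range))
    return low, low + seek_range


# Perceived a-perc of 70% with a 20% seek range: 60% to 80%.
print(seek_window(0.70, 0.20))

# Near the top, the window slides down instead of spilling past 100%.
print(seek_window(0.95, 0.20))
```

The first call matches the 60% to 80% example above. The second shows the boundary case, where the person seeks further downward than upward.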

By default, the model assumes that the “seeker” is a male who is looking for a “target” female (it is quite easy to change this if desired). Furthermore, the model has the option to customize the superiority bias and seek range of the male and female populations independently.

The Goal: Given a set of assumptions for the 4 input variables (2 biases and 2 seek ranges), the model includes a macro that iterates the seeker’s true a-perc from ~0% to ~100%, returning the compatibility range, which is the range of targets that the seeker is interested in and that are also interested in the seeker. Remember that while attraction can be one- or two-sided, we are only interested in how changing the input variables will impact the area of mutual attraction. Building on our example from earlier, recall that the seeker is a male with a true a-perc of 50% and a superiority bias of 20% (and therefore a perceived a-perc of 70%). With a seek range of 20%, he is looking for females with a true a-perc between 60% and 80%. Conversely, assuming that females also have a bias of 20% and a seek range of 20%, we can back-solve to figure out that the set of females interested in the seeker has a self-perceived a-perc between 40% and 60% (don’t continue reading until this makes sense to you). This in turn corresponds to the set of females with a true a-perc between 28.6% and 42.9% (by reversing the superiority bias). However, recall that the seeker is only interested in females with a true a-perc between 60% and 80%. So in this case the compatibility range is clearly 0%, and the seeker goes home unhappy to eat his bowl of ramen noodles and cry himself to sleep.
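The back-solve in this example can be traced step by step in Python. This is a sketch of the arithmetic only, not the Excel macro. It assumes the bias multiplies the true a-perc by (1 + 2 × bias), which is the factor that reproduces the 28.6% and 42.9% figures, and it ignores boundary handling because every value here sits well inside 0% to 100%. All variable names are my own.

```python
BIAS = 0.20        # superiority bias, same for both sexes in this example
SEEK_RANGE = 0.20  # total width of each person's seek window
FACTOR = 1.0 + 2.0 * BIAS  # ASSUMED multiplicative form of the bias

seeker_true = 0.50
seeker_perceived = seeker_true * FACTOR  # how the seeker sees himself

# Step 1: true a-percs of the females the seeker is looking for.
want_low = seeker_perceived - SEEK_RANGE / 2
want_high = seeker_perceived + SEEK_RANGE / 2

# Step 2: a female is interested if the seeker's TRUE a-perc falls in
# her window, i.e. her perceived a-perc is within half a seek range of it.
fem_perceived_low = seeker_true - SEEK_RANGE / 2
fem_perceived_high = seeker_true + SEEK_RANGE / 2

# Step 3: reverse the bias to get those females' true a-percs.
fem_true_low = fem_perceived_low / FACTOR
fem_true_high = fem_perceived_high / FACTOR

# Step 4: the compatibility range is the overlap of the two intervals.
overlap = max(0.0, min(want_high, fem_true_high) - max(want_low, fem_true_low))

print(f"Seeker wants:        {want_low:.1%} to {want_high:.1%}")
print(f"Interested females:  {fem_true_low:.1%} to {fem_true_high:.1%}")
print(f"Compatibility range: {overlap:.1%}")
```

The two intervals do not touch, so the overlap is zero, which is the ramen-noodle outcome described above.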

The Results: I first explored the impact of the magnitude of the superiority bias by keeping the bias assumptions symmetrical. Here are the results:

The base case, where both superiority biases are 0%, paints an interesting picture of what happens at the two extremes. Due to the way that seek range is incorporated in the model, once a-perc reaches either the low end or the high end, the seek range becomes asymmetrical, since the width of the range stays the same at all points. I won’t delve too deeply into the mathematical analysis of why the lines look exactly the way they do, but intuitively these results should make sense. People with very low a-percs have a smaller compatibility range since fewer people are interested in them, while the middle of the pack flattens out as expected. People at (and slightly above) the 80% level receive a wide range of interest from the opposite sex, but their compatibility range is still capped at their own seek range of 20%. Lastly, people with very high a-percs also experience a smaller compatibility range, due to the fact that there are simply fewer people pursuing them (more on this later).

Things get interesting as we increase the superiority biases. The middle of the curve becomes more V-like as the bias increases, until the whole curve becomes distinctly bimodal at the 15% and 20% bias levels. In other words, significant superiority bias has a disproportionately negative impact on those of average attractiveness. Due to both their own bias and a symmetrical bias in their targets, these Average Joes will aim for women who won’t be interested in them. Similarly, the range of women who are interested in the Average Joe is below his seek range. In the scenario where both males and females hold a superiority bias of 20%, half of the men (with a-percs between 25% and 75%) end up with a compatibility range of 0%. Now, before you raise your hand and point out that 20% is a very high value to assign a superiority bias, ask yourself this: given a roomful of 100 of your peers, would you rank yourself in the top 30 in terms of attractiveness? If this doesn’t seem entirely ridiculous, then your superiority bias may be larger than you thought. Given the somewhat bleak picture painted by Chart 1, should we all just give up on love and dating if our chances of being attracted to someone who also happens to be attracted to us are so low?

Luckily, no. For one, the assumption that superiority bias is symmetric might not be correct. Remember the two OkCupid charts above, which seem to suggest that males perceive female attractiveness normally while females perceive male attractiveness log-normally? Well, one way to actually incorporate this discrepancy into our model is by making the superiority biases asymmetric. Thus, if we accept the findings of the OkCupid study to be valid for the general population, then we should give women a larger superiority bias than men.

In Chart 2, I’ve kept the female superiority bias constant at 20% for all the plots, while changing the male bias from 20% to 0%. Note that this has the impact of skewing the V-shape part of the plot to the right, while the boundary cases remain unchanged. From these plots, we see quite clearly that even if we make a conscious effort to reduce our superiority bias or even remove it entirely, it doesn’t get us too far if the other side doesn’t reciprocate. So now what?

There are still two things we haven’t considered. As Chart 3 demonstrates, increasing the seek range can in fact compensate for a high superiority bias in the opposite sex (compare the red plot to the orange one). However, note that the vast majority of the benefits resulting from increasing the seek range end up going to the high end (those with high a-percs), with less impact on the lower end of the scale and almost no impact on the middle section. Finally, we must remember that people’s preferences (and even those of the entire population) can change over time. As we grow older, we gain a better understanding of ourselves as well as a better sense of what kind of partner we’re looking for. This may cause us to lower our own superiority bias while either increasing or decreasing our seek range. For example, the black plot in Chart 3 is my best guess at what the dating scene might look like around the age of 28-35, where most people (and perhaps women more so than men) are looking to get married. This plot looks more along the lines of the black “zero-bias” plot in Chart 1, and features a less pronounced V-shape, which means there is hope yet for our Average Joe 🙂

So What?

Statistics are interesting and modeling is fun (right, Troy?). Our model seems to do a relatively decent job of demonstrating how attraction works, but it’s all somewhat meaningless if we can’t draw some larger conclusions about the real world. To that end, I offer the following points:

Be more flexible. Remember that the upper limit of your success will always be your seek range. It doesn’t matter if you are in the 90th or 10th percentile of attractiveness. If you only search within a 5% range for potential partners, your compatibility range is at most 5% as well! Although this point may seem obvious, it reinforces the idea that people of average attractiveness (particularly within 1 SD of the mean) should broaden their seek range in order to increase their chances of success. (Note that increasing your compatibility range is not the same as increasing your utility…but this topic is best left for another day).
It doesn’t hurt to aim high. Take advantage of the fact that people with very high a-percs actually have a smaller compatibility range. This is because there simply aren’t enough people who are able to pursue the high a-percs, since you yourself would need to have a very high a-perc in order for the very top echelons to fall within your seek range. You can differentiate yourself from the crowd by either lowering your superiority bias or increasing your seek range (or both), and doing so will increase your probability of success.
Patience can pay off. After all, attraction involves two parties, and as our model has shown, you can only do so much on your side of the equation to impact the overall reality. However, the model assumes that people are homogeneous and static in their preferences, and in reality they are neither. So even if things aren’t working out at the moment, don’t give up, because they eventually will.

—

Phew! What a journey. When the idea for the first topic in the Fun with Excel series popped into my head, I didn’t expect to end up this deep in the weeds. The Excel model was a product of a couple days’ thought process, and several hours of actually building out the model and stress testing it. My first version included a particularly nasty macro to continuously automate Excel’s goal seek function, but luckily I was able to figure out a way to reverse the implementation of the superiority bias using mathematics, which sped up the process of actually generating results. If you’re interested, you can take a look at the model here. You are free to play around with it as you like, but if you plan to use it or modify it for academic, commercial, or any other purpose that involves publication, I ask you to please provide the proper attribution.

I thoroughly enjoyed working on this project, and welcome all your questions and comments below. If you have any suggestions for future topics I could pursue in the Fun with Excel series, please let me know!

-J

Fun with Excel #1 – The Laws of Attraction