What Is Attractiveness? A Crowd-Sourced Evaluation of Facial Beauty

Y. Edward Wen, BA, Joshua Amaya, BS, Zhiguo Shang, PhD, Andrew Jamieson, PhD and Al Aly, MD, UT Southwestern, Dallas
Goals/Purpose: When reading an aesthetic surgery publication or attending a plastic surgery conference, it is clear that some consider outcomes exhibited as superb, while others think they are average or less than ideal. As plastic surgeons, we create changes for patients. Often, change can be mistaken for improvement. We are deemed the experts of attractiveness, and thus it is not acceptable to have this degree of uncertainty on what is ideal. In other words, we should not rely on beauty being only in the eye of the beholder.

The authors hope to help quantify cosmetic results in order to determine whether postoperative changes are indeed better or worse than the preoperative state. Undoubtedly, measuring aesthetic outcomes objectively, rather than evaluating quality subjectively, is a monumental task, but it is critical for cosmetic surgery to follow the path toward evidence-based medicine that is increasingly ingrained in the medical profession as a whole. Ultimately, it will be beneficial to have a framework for comparing postoperative results to the “ideal” for each anatomic area. Our study begins with the face, intending to create an aesthetic “yardstick” of the face for plastic surgeons and their patients.

The literature shows that humans are koinophiles: evolutionary psychologists have shown that we seek out mates with minimally unusual appearances. We favor averages, particularly cognitive averages, meaning that we average over time and subconsciously generate a composite of the most recent attractive people we have seen. The authors want to verify and apply these principles in an effort to advance aesthetic plastic surgery.

Our study attempts to identify an “ideal” through crowd-sourced ratings. Furthermore, we measured, calibrated, and averaged specific facial parameters to elucidate what features are integral to attractiveness. We hypothesized that a composite of the cohort images and average periocular features would be the most attractive to raters.

Methods/Technique: Our male and female cohorts each consisted of 41 standardized frontal-view photos from DeBruine¹, with 1 image derived from a composite of the other 40. We analyzed photographs with the Mirror database software. We measured 46 facial features per image (Figure 1), calibrating measurements with facial width-to-height ratio and inter-pupillary distance.

The photographs were uploaded to Google Forms with multiple-choice responses ranging from 1 to 7, with 1 and 7 being the least and most attractive, respectively. Monochrome photos were used to negate the biases of hair, skin, and eye color. As crowdsourcing has been shown to be a reliable and valid way to evaluate aesthetic outcomes, we used Amazon Mechanical Turk, a widely used crowdsourcing platform, to collect ratings of the images.

Statistical Analysis

Analysis of variance (ANOVA) tests were performed for ratings and facial feature measurements. Spearman correlation coefficients were calculated between facial measurements and attractiveness ratings for both males and females, with further analysis subgrouped by evaluator gender. Group means were used to compute the absolute percentage difference between each measurement and its group mean. We constructed heatmaps demonstrating the relationship between facial measurements and ratings received, using a scale from 0 to 1 on which a larger score indicates a stronger association with attractiveness.
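
As an illustration of the correlation step, the following is a minimal sketch of a Spearman rank correlation and of an absolute percentage deviation from a group mean, written with the Python standard library only. The function names and the sample values (`ear_width`, `mean_rating`) are hypothetical and are not taken from the study data or software; the exact deviation formula used by the authors is not stated, so the one shown is an assumption.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def abs_pct_deviation_from_mean(values):
    """Absolute percentage difference of each value from the group mean (assumed formula)."""
    m = mean(values)
    return [abs(v - m) / m * 100 for v in values]

# Hypothetical example: one facial measurement and mean ratings for five images.
ear_width = [30.0, 32.0, 34.0, 36.0, 38.0]
mean_rating = [3.1, 3.5, 4.2, 4.4, 5.0]

rho = spearman(ear_width, mean_rating)
deviation = abs_pct_deviation_from_mean(ear_width)
```

Because Spearman's coefficient depends only on rank order, any strictly monotonic relationship yields a coefficient of magnitude 1, regardless of how linear the underlying values are.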

Results/Complications: The authors obtained 2064 layperson evaluations. The overall male and female mean ratings were 4.09 and 4.06, respectively. The composite male and female scores (means of 5.31 and 5.60, respectively) were significantly higher (both p<0.0001) than the scores of the other 40 photos (means of 4.06 and 4.00, respectively). Furthermore, the composites received the highest rating from male raters, from female raters, and overall (Figure 2). Figures 4-5 show the correlations between mean facial measurements and attractiveness, subgrouped by all raters, male raters, and female raters. For males, average ear width (r=1, p=0.0004) and right upper brow height (r=-1, p=0.0004) had, respectively, the highest and lowest correlations with attractiveness based on responses from all raters. For females, average ear height, left medial eye distance, and left iris-to-aperture distance (all r=1, p=0.0004) had the highest associations with attractiveness, and the distance of the bottom third of the face had the lowest correlation (r=-1, p=0.0004). The heatmaps in Figures 6-7 show that average ear measurements were again important for high male and female ratings. Figures 8-9 are heatmaps of the correlation among male and female mean measurements and attractiveness.
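
The comparison of composite ratings against ratings of the other photos can be sketched as a one-way ANOVA F statistic. This is an illustrative sketch only: the ratings below are hypothetical, the function name `one_way_anova_f` is an assumption, and the p-value (which requires the F distribution) is deliberately omitted.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical 1-7 ratings, not the study data.
composite_ratings = [5, 6, 5, 6, 5, 6]
other_ratings = [4, 3, 5, 4, 4, 4]

f_stat, df_b, df_w = one_way_anova_f([composite_ratings, other_ratings])
```

With only two groups, the F statistic equals the square of the pooled two-sample t statistic, so a larger F corresponds to a larger separation between the composite and non-composite mean ratings relative to the rating spread within each group.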

Conclusion: Our findings show that average facial anatomy and certain facial features are highly associated with attractiveness. This is an initial step into the field of objective cosmetic quantifications with the purpose of effectively integrating evidence-based medicine into aesthetic surgery. Our next steps are to extend the same methods of measuring beauty to other anatomical regions, demographics, and views. We hope this study inspires others to join us in embracing, expanding, and refining this important process of quantifying outcomes in an effort to revolutionize surgical decision-making.

  1. DeBruine, L.M. (2019). Debruine/experimentum: Beta release 1 (Version v.0.1). Zenodo. http://doi.org/10.5281/zenodo.2632356