Validity and Utility of Amazon Mechanical Turks for Aesthetic Plastic Surgery Research

Helen Xun, MD1, Valeria Bustos, MD1, Darya Fadavi, BS2, Ashley Boustany, MD3 and Justin Sacks, MD, MBA, FACS4, (1)BIDMC, Boston, (2)Johns Hopkins School of Medicine, Baltimore, (3)Harvard Medical School/BIDMC, Boston, MA, (4)Washington University in St. Louis, St. Louis
Goals/Purpose: Amazon Mechanical Turk (mTurk) is a crowdsourcing platform where requesters pay small amounts for “workers” to complete tasks [1]. It has recently gained popularity for crowdsourcing surveys, with over 20,000 studies indexed on Google Scholar and 1138 on PubMed. Its advantages include inexpensive, rapid acquisition of “high quality data”, an easier IRB review process, the ability to incorporate multimedia, and demographically targeted distribution [2]. Given this popularity, many studies have validated the platform for medical research, reporting reliable internal consistency and test-retest reliability, with findings that reflect those of public surveys [3–6]. However, nuances in the worker population must be considered to optimize testing, such as workers being more likely to differ in the five-factor personality trait domains [7]. In recent years, researchers in plastic surgery have used mTurk to study difficult questions on public and patient perceptions and experiences [8,9]. While mTurk has been extensively validated for the study of public and patient perceptions in psychiatry and medicine, little is known about its validity for aesthetic plastic surgery research, how accurately the mTurk population reflects the USA public, or standardized methodologies. For example, as 92% of aesthetic plastic surgery patients are female, there is a need to identify whether this demographic can be recreated on mTurk. The purpose of this study is to examine the validity and utility of mTurk for aesthetic plastic surgery research, to assess the precision of “filters” for specific sample populations, and to provide guidelines for future investigations.

Methods/Technique: We distributed a crowdsourcing survey to females in the USA via Amazon mTurk in an IRB-exempt study. We applied the following filters: USA, female. Respondents self-reported demographics, exposure to plastic surgery (procedures undergone, acquaintances with plastic surgery, public perceptions), social media use, and medical history including body dysmorphia. Responses were then compared to historical data reported in the 2020 ASPS National Clearinghouse of Plastic Surgery Procedural Statistics using a two-tailed t-test, with significance set at a P value of 0.05.
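
The comparison of a survey proportion against a published reference rate can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the abstract specifies a two-tailed t-test, whereas the sketch substitutes a two-sided one-sample z-test for a proportion (normal approximation), and the function name, the counts, and the reference rate are all hypothetical.

```python
import math


def one_sample_proportion_test(successes, n, p_ref):
    """Two-sided z-test of a sample proportion against a fixed reference rate.

    Hypothetical helper: the source reports a two-tailed t-test; this uses
    the normal approximation to the binomial instead.
    Returns (sample proportion, z statistic, two-sided p-value).
    """
    if n <= 0 or not 0 < p_ref < 1:
        raise ValueError("n must be positive and p_ref strictly between 0 and 1")
    p_hat = successes / n
    # Standard error under the null hypothesis that the true rate equals p_ref.
    se = math.sqrt(p_ref * (1 - p_ref) / n)
    z = (p_hat - p_ref) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value


# Hypothetical numbers for illustration only (not taken from the study):
# 60 of 250 respondents report a procedure; the reference rate is 20%.
p_hat, z, p_value = one_sample_proportion_test(60, 250, 0.20)
print(f"sample rate={p_hat:.3f}, z={z:.2f}, two-sided P={p_value:.3f}")
```

A result would be called significant under the threshold stated above when the returned P value is below 0.05. For small samples or rates near 0 or 1, an exact binomial test would be a more cautious choice than the normal approximation.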

Results/Complications: Of the 248 survey participants, 4 self-reported as male (1.6% male, 98.0% female). A total of 240 surveys were included (mean age 22 ± 12.2 years). While mTurk respondents were more likely to have undergone aesthetic surgery/procedures (19.6% vs 15.0%, P < 0.05), the frequencies of mastopexy, Brazilian butt lift, eyelid surgery, facelift, hair transplantation, and rhinoplasty did not differ significantly. mTurk respondents were significantly more likely to have undergone breast augmentation (15.6% vs 8.3%), lip augmentation (8.9% vs 1.5%), and abdominoplasty (11.1% vs 4.2%), and less likely to have undergone cheek implant, chin augmentation, eyelid surgery, and facelift (P < 0.05 for all). Of note, mTurk respondents were more likely to have been diagnosed with body dysmorphia (6.7% vs 2.5%, P < 0.05). Respondents overall supported plastic surgery for aesthetic purposes (75.0% support), and 45.5% agreed that insurance should cover aesthetic surgery. The majority of respondents (96%) agreed that aesthetic surgery positively impacts self-esteem.

Conclusion: Amazon mTurk remains a useful tool for plastic surgery research, but should be used with caution. While there are similarities between the mTurk and plastic surgery patient populations, the nuances between them must be considered in experimental design. mTurk filters are an appropriate tool to establish a female-heavy survey respondent population similar to aesthetic surgery patients, and thus have utility for hypothesis generation in patient opinion studies. However, respondents are more likely to have been diagnosed with body dysmorphia and to have undergone plastic surgery, so the platform should be used with caution for patient research. Nevertheless, the high number of respondents who have undergone breast augmentation, lip augmentation, and abdominoplasty makes this platform useful for initial hypothesis testing. In conclusion, Amazon mTurk is a valuable tool for rapidly generating and testing hypotheses that would otherwise be unachievable by conventional surveying. Further research is necessary to assess the generalizability of the data and its utility in guiding the design of clinical research projects.

  1. Amazon Mechanical Turk. Published 2020. Accessed November 22, 2020. https://www.mturk.com/
  2. Buhrmester et al. Amazon’s mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspect Psychol Sci.
  3. Arditte et al. The importance of assessing clinical phenomena in Mechanical Turk research. Psychol Assess.
  4. Chandler et al. Conducting Clinical Research Using Crowdsourced Convenience Samples. Annu Rev Clin Psychol.
  5. Mortensen et al. Comparing Amazon’s Mechanical Turk Platform to Conventional Data Collection Methods in the Health and Medical Research Literature. J Gen Intern Med.
  6. Hauser et al. Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behav Res Methods.
  7. Goodman et al. Data Collection in a Flat World: The Strengths and Weaknesses of Mechanical Turk Samples. J Behav Decis Mak.
  8. Fan et al. The Public’s Preferences on Plastic Surgery Social Media Engagement and Professionalism: Demystifying the Impact of Demographics.
  9. Wu et al. What Do Our Patients Truly Want? Conjoint Analysis of an Aesthetic Plastic Surgery Practice Using Internet Crowdsourcing. Aesthetic Surg J.