In 2024, a group of dermatology researchers led by Dr. Jenna Lester (UCSF) and Dr. Roxana Daneshjou (Stanford) published a striking finding in the Journal of Investigative Dermatology: major vision-language models—the same type of AI systems that generate and interpret images—show clear bias toward light skin.
What the researchers did
To quantify bias, the team prompted each model with phrases like “person with acne” or “person with eczema,” repeated five times per condition; in total, 45 images per model were evaluated. Three medical students independently scored each image’s skin tone on a standard dermatology scale, and a dermatologist later reviewed the scores. The results were statistically decisive:

• Light-skinned outputs dominated across nearly all conditions (melasma, psoriasis, vitiligo, acne, eczema, melanoma, and more).
• Significant disparities (p < 0.0001) showed that the skin tone distribution was not random.
• The pattern held across all four platforms, implying a systemic training-data bias rather than a single-model flaw.

Only herpes and syphilis prompts showed no major tone differences, likely because the models generated non-cutaneous image styles for those conditions.
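The kind of disparity test described above can be sketched with a simple goodness-of-fit check. This is a hedged illustration, not the paper’s actual analysis: the five-image counts and the six tone bins below are hypothetical, and the study’s exact statistical test is not reproduced here.

```python
# Chi-square goodness-of-fit: are generated images spread evenly across
# skin-tone bins, or concentrated in the lightest bins?
# Counts are hypothetical: 5 images for one condition, sorted into six
# tone categories ordered lightest to darkest.

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [4, 1, 0, 0, 0, 0]                   # 4 of 5 images in the lightest bin
n = sum(observed)
expected = [n / len(observed)] * len(observed)  # uniform: ~0.83 images per bin

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 15.4 — well above the df=5, alpha=0.05 critical value (~11.07)
```

A large statistic relative to the critical value is what lets a study of this kind say the tone skew is “not random” rather than sampling noise.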
What this means for AI in skin analysis
The paper doesn’t attribute intent to the models—it shows that training data defines visibility. Because most open-web and medical imagery skews toward lighter skin, modern vision-language systems “see” and recreate lighter complexions more easily. That leads to:

• Uneven performance: AI models may describe or render conditions inaccurately on darker tones.
• Poor educational coverage: synthetic teaching datasets might misrepresent how conditions look on diverse skin tones.
• Feedback loops of invisibility: each biased output reinforces the bias in downstream models trained on those outputs.

The authors conclude that these systems are not yet suitable for clinical or educational use until their training data and evaluation frameworks are rebalanced.
How this informs Mudface’s approach
Mudface isn’t a diagnostic or educational medical tool—it’s an AI platform for aesthetic and skincare understanding. But the principles are the same: objectivity, accuracy, and applicability in how AI interprets human skin.

1) Diversity as a core design goal: this research highlights that representation can’t be an afterthought. At Mudface, we actively collect and balance training images across the full spectrum of skin tones, lighting conditions, and camera types.
2) Evaluation built on balance: just as the study measured tone distribution statistically, we use quantitative fairness checks to track model behavior across tone ranges.
3) Transparency over mystery: the paper noted that most commercial VLMs disclose nothing about their training sources. Mudface takes the opposite stance: clarity around dataset structure and labeling practices is built into our development process.
4) Architecture shaped for skin: the study underscores that aesthetic AI is highly specialized and demands domain expertise. We design our architecture with these principles in mind so the model is trained to “see” skin, not just pixels.
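One way to make a fairness check like the one in point 2 concrete is to track a metric per tone bin and flag the largest gap. A minimal sketch, assuming hypothetical per-image records of (tone bin, prediction correct?); the bin labels, sample data, and helper functions are illustrative, not Mudface’s actual pipeline.

```python
# Per-tone-bin accuracy audit: compute accuracy for each skin-tone bin,
# then report the largest gap between any two bins. All records here are
# hypothetical (tone_bin, correct) pairs.
from collections import defaultdict

def accuracy_by_bin(records):
    """records: iterable of (tone_bin, correct: bool) -> {bin: accuracy}"""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for bin_name, correct in records:
        totals[bin_name] += 1
        hits[bin_name] += int(correct)
    return {b: hits[b] / totals[b] for b in totals}

def max_gap(accuracies):
    """Largest accuracy difference between any two bins."""
    vals = list(accuracies.values())
    return max(vals) - min(vals)

records = [
    ("I-II", True), ("I-II", True), ("I-II", True), ("I-II", True),
    ("III-IV", True), ("III-IV", True), ("III-IV", False), ("III-IV", True),
    ("V-VI", True), ("V-VI", False), ("V-VI", False), ("V-VI", True),
]
acc = accuracy_by_bin(records)
gap = max_gap(acc)
print(acc, round(gap, 2))  # gap of 0.5 between lightest and darkest bins
```

Running an audit like this on every model release turns “balanced evaluation” from a slogan into a number that can gate deployment.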
The broader vision
The lesson from this 2024 study is simple but urgent: AI reflects its training world. If an AI model is not shaped specifically for skin and aesthetics, and not carefully crafted to produce accurate results for all skin types, tones, and concerns, it will be inaccurate and unhelpful. The next generation of beauty and skin AI must be built differently—scientifically rigorous, demographically inclusive, and transparent by design. That’s the path Mudface follows: inspired by academic rigor, guided by scientific metrics, and driven by the belief that AI for skin and aesthetics must be tailored specifically to the task at hand to ensure competency.
AI reflects its training world. Mudface is building scientifically rigorous and highly accurate aesthetic AI for all skin tones, types, and concerns.