The Untold Story of AI Bias Research
Race + Digital Culture


The researchers, the resistance, and the reckoning that brought AI bias from obscurity to front-page news—and why the story most people know is missing its most important chapters.

Dr. Dédé Tetsubayashi · 12 min read

Key Takeaways

  • AI bias research was pioneered largely by Black women operating outside Silicon Valley’s well-funded centers—and they were ignored for years.
  • Joy Buolamwini’s Gender Shades study documented facial recognition error rates of up to 34.7% for darker-skinned women vs. 0.8% for lighter-skinned men.
  • Timnit Gebru was fired from Google after raising concerns about large language models—the same technology underlying ChatGPT.
  • The COMPAS algorithm used in criminal sentencing was nearly twice as likely to falsely flag Black defendants as future criminals.
  • Black researchers represent only ~4% of AI researchers at major tech companies, yet bear the disproportionate burden of identifying and fixing bias.

The Discovery That Almost Didn’t Happen

In 2015, a software engineer named Jacky Alciné noticed something deeply disturbing. Google Photos had automatically tagged photos of him and his girlfriend—both Black—as “gorillas.” The story went viral. Google apologized. The tech world called it a bug. It wasn’t a bug. It was a feature of how AI is built.

Here’s what the mainstream narrative misses: people knew this was coming and had been raising alarms for years. Their names weren’t Sergey Brin or Sundar Pichai—they were largely Black, largely women, largely operating outside the well-funded centers of Silicon Valley. And for years, they were ignored.

The Researchers Silicon Valley Didn’t Want to Hear

Joy Buolamwini and the Ghost in the Machine

In 2016, MIT Media Lab researcher Joy Buolamwini was working on a project using facial analysis software when she discovered she had a problem. The software couldn’t detect her face. Until she put on a white mask.

Buolamwini’s subsequent research—published as the landmark “Gender Shades” study in 2018—demonstrated that commercial facial recognition systems from IBM, Microsoft, and Face++ had error rates of up to 34.7% for darker-skinned women, compared to as low as 0.8% for lighter-skinned men.
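The methodological core of Gender Shades was to report error rates per intersectional subgroup rather than as one aggregate accuracy number, which can hide enormous disparities. A minimal sketch of that kind of disaggregated audit follows; the data, group labels, and numbers here are purely illustrative and are not Buolamwini's benchmark.

```python
from collections import defaultdict

def disaggregated_error_rates(records):
    """Per-subgroup error rates instead of a single aggregate accuracy.

    `records` is a list of (skin_tone, gender, correct) tuples: a
    hypothetical audit log, not the Gender Shades benchmark itself.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for skin_tone, gender, correct in records:
        group = (skin_tone, gender)
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Illustrative numbers only, echoing the direction of the published gap:
audit = (
    [("darker", "female", False)] * 35 + [("darker", "female", True)] * 65 +
    [("lighter", "male", False)] * 1 + [("lighter", "male", True)] * 99
)
rates = disaggregated_error_rates(audit)
# rates[("darker", "female")] is 0.35; rates[("lighter", "male")] is 0.01
```

An aggregate accuracy over this same log would read 82%, a number that conceals a 35-fold gap between subgroups, which is exactly why disaggregation mattered.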

When Buolamwini’s work specifically critiqued Amazon’s Rekognition software, Amazon’s vice president of machine learning publicly attempted to discredit her research methodology—while notably declining to share Amazon’s own internal data. Amazon’s facial recognition software, researchers later found, misidentified 28 members of Congress as people with criminal arrest records. Darker-skinned members of Congress were misidentified at disproportionately higher rates.

Timnit Gebru: The Cost of Speaking Truth

Before she was fired from Google in December 2020—in an incident that shook the AI research community to its core—Gebru was co-lead of Google’s Ethical AI team. She was also a co-founder of Black in AI, an organization that has worked to increase Black representation and inclusion in artificial intelligence research.

Her firing came after she submitted a paper—co-authored with several colleagues—raising concerns about the potential risks of large language models, the same foundational technology underlying tools like ChatGPT. Google management requested she remove her name from the paper. When she refused and raised concerns about the review process, she was abruptly “let go.”

“What happened to me is not new,” she told The New York Times. “It’s the same pattern that happens to Black women everywhere.” Gebru went on to found the Distributed AI Research Institute (DAIR), an independent research organization not beholden to corporate funding.

The COMPAS Controversy: When Bias Has Bars

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is an algorithmic risk assessment tool used in criminal sentencing in many U.S. states. In 2016, ProPublica published an analysis that should have stopped this technology in its tracks: COMPAS was almost twice as likely to falsely flag Black defendants as future criminals, and almost twice as likely to falsely clear white defendants as low risk.
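ProPublica's headline metric was the false positive rate: among defendants who did not go on to reoffend, what share was flagged as high risk, broken out by group. The calculation itself is simple, which makes the disparity harder to dismiss as technical nuance. The sketch below uses synthetic data that only echoes the direction of the reported gap, not ProPublica's actual dataset.

```python
def false_positive_rate(predictions):
    """Share of non-reoffenders who were flagged high risk.

    `predictions` is a list of (flagged_high_risk, reoffended) pairs.
    All data here is synthetic, not ProPublica's dataset.
    """
    non_reoffenders = [flagged for flagged, reoffended in predictions
                       if not reoffended]
    return sum(non_reoffenders) / len(non_reoffenders)

# Synthetic groups echoing the direction of the reported disparity:
black_defendants = [(True, False)] * 45 + [(False, False)] * 55
white_defendants = [(True, False)] * 23 + [(False, False)] * 77

fpr_black = false_positive_rate(black_defendants)  # 0.45
fpr_white = false_positive_rate(white_defendants)  # 0.23
```

Note that a tool can be "calibrated" overall and still show this kind of gap; which error rates a system is allowed to trade off against each other is a policy choice, not a purely technical one.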

This is a pattern worth naming: when AI bias research threatens a revenue stream, the response is often to bury the finding under technical complexity, dispute methodology, and wait for the news cycle to move on. The people whose lives are shaped by these systems don’t have the luxury of waiting.

The Structural Problem: Who Builds AI, Who Gets Hurt

A 2023 survey by Stanford’s Institute for Human-Centered AI (HAI) found that Black researchers represent approximately 4% of AI researchers at major tech companies, despite making up 13% of the U.S. population. Women of color are even more dramatically underrepresented.

Consider training data: the datasets used to train AI systems are built from existing digital content—and existing digital content reflects existing inequalities. When AI systems are trained on historical hiring data, they learn historical hiring discrimination. When trained on health data, they learn the systemic underdiagnosis and undertreatment of pain in Black patients. When trained on crime data, they learn the systemic over-policing of Black communities.
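This mechanism, a model faithfully reproducing the discrimination encoded in its training data, requires nothing exotic to demonstrate. A toy majority-vote "model" fit on synthetic hiring records (all category names and numbers below are hypothetical) learns the historical pattern exactly:

```python
from collections import Counter

def fit_majority_model(history):
    """A toy 'model' that memorizes the majority hiring decision per
    name category: the mechanism of learned bias, reduced to its core.

    `history` is a list of (name_category, hired) pairs, fully synthetic.
    """
    votes = {}
    for category, hired in history:
        votes.setdefault(category, Counter())[hired] += 1
    return {cat: counts.most_common(1)[0][0]
            for cat, counts in votes.items()}

# Historical records encoding past discrimination (synthetic):
history = ([("white_assoc_name", True)] * 70 +
           [("white_assoc_name", False)] * 30 +
           [("black_assoc_name", True)] * 30 +
           [("black_assoc_name", False)] * 70)
model = fit_majority_model(history)
# The model 'learns' to favor one group and reject the other,
# with no malicious intent required, only biased history.
```

Real hiring models are vastly more complex, but the failure mode is the same: optimize for fidelity to a biased past and you get a biased future.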

The algorithm is a mirror, and the mirror shows a society that has never achieved the equity it claims to pursue.

The Invisible Tax on Black Voices

The work of identifying, documenting, and challenging AI bias falls disproportionately on Black researchers, Black advocates, and Black communities. Joy Buolamwini had to write the paper. Timnit Gebru had to build the organization. The communities experiencing bias from COMPAS had to fight legal battles most of them couldn’t afford.

This is an emotional labor tax, a cognitive labor tax, and in Gebru’s case—a career tax. The people with the most to lose from AI bias are also the people being asked to do the most work to fix it, with the least institutional support and the highest personal risk.

What The Research Has Proven

Documented AI Bias Findings

  • Facial recognition: Performs significantly worse on darker-skinned faces with error rates exceeding 30% for darker-skinned women vs. under 1% for lighter-skinned men.
  • Healthcare algorithms: A 2019 study in Science found a widely used health algorithm assigned the same risk scores to Black patients who were considerably sicker than their white counterparts, because it used healthcare costs as a proxy for health needs.
  • Criminal justice: Algorithms show documented racial disparities in prediction and risk scoring that reinforce over-policing and over-incarceration.
  • Hiring algorithms: Show bias against applicants with “Black-sounding” names—a digital replication of discrimination documented in resume audits for decades.
  • Language models: Reflect and amplify racial biases, associating negative language with Black-identified names and positive language with white-identified names.

The Researchers Who Changed the Conversation

It is worth pausing to name names—because naming is itself a form of resistance against erasure.

Pioneers of AI Bias Research

Joy Buolamwini

Founder of the Algorithmic Justice League, whose “Gender Shades” research reshaped how the industry talks about facial recognition bias.

Timnit Gebru

Co-founder of Black in AI, founder of DAIR, whose research on bias in language models and whose firing from Google became a flashpoint in AI ethics debates.

Safiya Umoja Noble

Author of Algorithms of Oppression, who documented how search engine results perpetuate racist and sexist stereotypes.

Ruha Benjamin

Author of Race After Technology, who coined the term “the New Jim Code” to describe how neutral-seeming technical systems can perpetuate racial hierarchy.

Rediet Abebe

Co-founder of Black in AI, whose research on algorithmic fairness has influenced how the field understands structural inequality.

Deborah Raji

Researcher who has worked extensively on AI auditing and accountability, including audits documenting bias in commercial facial recognition systems.

What Liberation Looks Like in the Age of AI

Imperatives for Change

  • Diversity is not decoration: Diverse teams catch biases that homogeneous teams miss. This is a quality assurance imperative backed by research.
  • Community engagement is not optional: The people most affected by algorithmic systems have knowledge that researchers and engineers do not.
  • Accountability requires teeth: Industry self-regulation has consistently failed. Meaningful change requires independent auditing, legal liability, and regulatory enforcement.
  • Naming harm is not neutral: When researchers name AI bias, they challenge systems that benefit powerful institutions. This research is, by its nature, political.

The researchers who built the field of AI bias research—who documented the harm, named the patterns, paid the professional and personal costs—did so in the tradition of every activist and scholar who refused to let power go uncontested. Their work is not done. Neither is ours.

About Dr. Dédé Tetsubayashi

Dr. Dédé is a global advisor on AI governance, disability innovation, and inclusive technology strategy. She helps organizations navigate the intersection of AI regulation, accessibility, and responsible innovation.

