Celebrating Black Pioneers and Diversity in Data Science
As we commemorate Black History Month, it's time to honor the groundbreaking contributions of black mathematicians, statisticians, and data scientists throughout history, and to confront the glaring lack of diversity that continues to plague the field today.
Data science is one of the most rapidly growing and impactful fields, with transformative applications across business, healthcare, technology, science, and government. However, like the tech industry as a whole, it suffers from a severe lack of racial and gender diversity. African-Americans make up just 3% of data science professionals in the U.S., far below their 13% share of the population. Women are also drastically underrepresented, holding only 15-25% of data science roles globally.
This lack of diversity is not only an issue of equity and inclusion, but a critical problem that prevents data science from fulfilling its potential as a transformative force for good. Homogeneous teams are more likely to produce biased algorithms and AI systems that perpetuate or amplify societal inequities. Diverse perspectives are essential for developing data-driven solutions that benefit people of all backgrounds.
As we work to build a more diverse and inclusive future for data science, it's worth reflecting on the black pioneers who helped lay the intellectual foundation for the field, often in the face of intense barriers and discrimination. Their stories serve as an inspiration and a reminder of the importance of expanding opportunities in data science to all of humanity.
The African Roots of Data Science
The history of data science has deep roots on the African continent and in the Middle East, dating back more than 1,000 years. Although their contributions are often overlooked, medieval African and Middle Eastern scholars helped establish some of the core statistical and algorithmic concepts that form the basis of modern data science.
One prominent example was the Persian scholar Al-Khwarizmi (c. 780-850 AD), who wrote influential works on algebra and algorithms while working in Baghdad. His name is the origin of the word "algorithm."
In the centuries that followed, scholars working in Africa made groundbreaking advances in quantitative and statistical analysis:
- Ibn al-Haytham (c. 965-1040 AD), who was born in Basra and spent much of his career in Cairo, wrote influential works on optics, physics, and scientific method that emphasized empirical data and reproducibility. Some consider him the first data scientist.
- Majid Bahar el-Din (1178-1260 AD) was an Ethiopian astronomer and mathematician who developed an early version of the law of large numbers, a foundational concept in statistics and probability.
- The Tunis-born historian and sociologist Ibn Khaldun (1332-1406) pioneered the application of quantitative and demographic analysis to the study of history and the rise and fall of civilizations. He developed an early version of social network analysis.
These medieval African and Middle Eastern thinkers helped establish a tradition of using math to make sense of data and quantitative phenomena. Centuries later, black American scholars would take up the same intellectual torch in the face of intense adversity and oppression.
Early African-American Pioneers in Statistics
In the late 19th and early 20th centuries, a number of African-American mathematicians and social scientists made seminal contributions to statistics and quantitative research:
- Kelly Miller (1863-1939) was the first African-American to pursue graduate study in mathematics, which he did at Johns Hopkins University. As a professor at Howard University, he published pioneering statistical studies on the socioeconomic conditions of black Americans.
- W.E.B. Du Bois (1868-1963), the famed sociologist, historian, and civil rights leader, was also a data visualization pioneer. For his groundbreaking 1900 study The Georgia Negro, prepared for the Paris Exposition, Du Bois and his students at Atlanta University hand-drew almost 60 innovative and striking graphs, charts, and maps to illustrate the data, a remarkable feat in an age before computers. He continued using creative visualizations throughout his career to shed light on the black experience.
- Dudley Weldon Woodard (1881-1965) earned a math PhD from the University of Pennsylvania and chaired the mathematics department at Howard University for decades. He published research in topology and inspired generations of black mathematicians.
- Euphemia Haynes (1890-1980) was the first African-American woman to earn a PhD in mathematics. During her 47-year career teaching math and statistics at Howard University and DC public schools, she fought tirelessly against racial segregation.
These black American pioneers managed to advance the fields of statistics and quantitative research despite intense barriers and discrimination. Their work laid the foundation for the emergence of the field we now know as data science in the mid-20th century.
The Underrepresentation of African-Americans in Data Science Today
Unfortunately, the legacy of these black trailblazers is not reflected in the demographics of the data science field today. A 2021 survey found that only 3% of data science professionals identify as black or African-American.
This disparity stems from a variety of factors, including unequal access to STEM education, lack of exposure to data science career paths, bias in hiring and promotions, and unwelcoming workplace cultures, among others. The problem is even worse at the leadership level, with African-Americans holding less than 1% of top executive roles in the tech industry.
The consequences of this lack of diversity are profound. Homogeneous teams are more likely to produce data science applications that exhibit racial bias and disproportionately harm communities of color:
- Facial recognition systems misidentify darker-skinned individuals at high rates
- Hiring algorithms have been found to prefer white-sounding names
- Racist bias has been discovered in healthcare algorithms that determine access to care
- Data-driven risk assessment tools exhibit bias against black defendants in the criminal justice system
Having more black data scientists in the room can help spot and mitigate these kinds of discriminatory algorithms before they are put into practice. Diversity makes data science teams smarter, more innovative, and more in tune with the needs of the diverse communities they serve.
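One simple check a team might run before deployment is to compare a model's error rates across demographic groups. The sketch below is a minimal illustration under stated assumptions, not a complete fairness audit: the function name `false_positive_rate_by_group`, the group labels "A" and "B", and the toy records are all hypothetical, and the false positive rate is only one of several metrics a real review would examine.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return {group: false positive rate} for (group, actual, predicted) records.

    The false positive rate is the share of truly negative cases (actual == 0)
    that the model wrongly flagged as positive (predicted == 1).
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, actual, predicted in records:
        if actual == 0:
            negatives[group] += 1
            if predicted == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


# Hypothetical toy data: (group, actual outcome, model prediction).
records = [
    ("A", 0, 0), ("A", 0, 0), ("A", 0, 1), ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 1),
]

rates = false_positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())

for group, rate in sorted(rates.items()):
    print(f"Group {group}: false positive rate = {rate:.2f}")
print(f"Gap between groups: {gap:.2f}")
```

In this toy data, group B is wrongly flagged twice as often as group A (0.50 versus 0.25). A gap like that would not prove discrimination on its own, but it is the sort of signal that should prompt a closer look at the training data and the model before release.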
Black Data Science Leaders Making an Impact
Despite the barriers they face, many brilliant black data scientists are doing incredible and important work today. Here are a few inspiring examples:
- Dr. Timnit Gebru is a trailblazing advocate for ethical AI and diversity in tech. A co-founder of Black in AI, she focuses her research on uncovering racial bias in machine learning and its impacts. She made headlines in 2020 when she was controversially fired from Google, where she co-led the Ethical AI team, after raising concerns about bias and discrimination.
- Dr. Moustapha Cisse leads Google's AI research center in Accra, Ghana, the first of its kind in Africa. Born in Senegal, he focuses on making AI more inclusive and beneficial to Africa and the developing world. He also co-founded Black in AI.
- Dr. Rediet Abebe is an Ethiopian-American computer scientist working at the intersection of AI, algorithms, and inequality. Her research uses data science to improve access to opportunity and social mobility for disadvantaged communities worldwide. She co-founded Mechanism Design for Social Good to promote research at the interface of AI and social justice.
- Brandeis Marshall is the founder of DataedX, a company that provides data science training programs to increase the representation of black women and girls in the field. She advocates for leveraging data for social justice and eliminating bias from data and algorithms.
These inspiring data scientists offer a glimpse of what a more diverse generation of leaders can bring to the field. But far too few black data professionals have access to opportunities to advance to these levels. We need more initiatives and investment to make data science accessible and equitable for all.
Taking Action to Advance Diversity in Data Science
Successfully increasing diversity in data science will require coordinated effort across the tech ecosystem. Some promising initiatives and levers for change include:
- Early STEM education: Programs like Black Girls Code and Data Science for Everyone aim to inspire black children's interest in data science as early as elementary school.
- Targeted scholarships and research funding: Organizations like the National Society of Black Engineers offer scholarships, research grants, and support to help black students access graduate-level training in data science and AI.
- Advocacy for equitable workplace practices: Initiatives like Data for Black Lives promote corporate commitments and policies to reduce bias in data practices and promote black representation in data science workplaces.
- Stronger DEI policies and pipelines: Tech employers must enact comprehensive programs to recruit, hire, and support more black talent in data science roles. Executive compensation should be tied to meeting diversity goals.
- Elevating diverse voices: We need to amplify black data scientists' stories, expertise, and accomplishments. Conferences like BlackInDataWeek showcase their incredible work and inspire the next generation.
- Sustained funding: All of these initiatives require dramatically scaled-up investment from tech companies, foundations, and government to expand access and opportunities for black talent in data science.
Progress is possible with focus and determination. The latest analysis shows black representation in the tech workforce has grown from 5.7% in 2010 to 7.4% in 2022. But much more work remains to achieve equitable inclusion of black data science professionals, especially in leadership, and to root out discriminatory data practices.
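To put the percentages quoted in this article on a common footing, the short sketch below works through the arithmetic. It uses only figures already cited here (a 3% share of data science roles against a 13% population share, and tech workforce representation of 5.7% in 2010 and 7.4% in 2022). The helper names `parity_index` and `change` are hypothetical, and a "parity index" is just one assumed way to summarize representation, with 1.0 meaning proportional representation.

```python
def parity_index(share_in_field, share_in_population):
    """Ratio of a group's share of a field to its share of the population."""
    return share_in_field / share_in_population


def change(start, end):
    """Return (percentage-point change, relative change) between two shares."""
    points = end - start
    relative = (end - start) / start
    return points, relative


# Figures quoted in this article, in percent.
ds_parity = parity_index(3.0, 13.0)
points, relative = change(5.7, 7.4)

print(f"Data science parity index: {ds_parity:.2f}")
print(f"Tech workforce change, 2010 to 2022: {points:.1f} points ({relative:.1%} relative)")
```

On these numbers, black professionals hold about 23% of the data science roles that proportional representation would imply, and the tech workforce figure rose by 1.7 percentage points, a relative increase of roughly 30%. The two comparisons cover different populations (data science roles versus the wider tech workforce), so they should be read side by side rather than combined.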
A More Diverse and Equitable Future for Data Science
Data science is one of the most powerful forces shaping the trajectory of humanity in the 21st century. It holds immense potential to expand access to healthcare, education, economic opportunity, justice, and social mobility worldwide. But that potential can only be realized if data science reflects and benefits all of humanity—not just the most privileged slices.
A more diverse and inclusive data science ecosystem is essential for the field's technical and ethical future. As we've seen, homogeneous data teams are prone to replicating and amplifying societal biases in algorithmic form. Lack of public trust from marginalized communities weakens the social impact of data-driven solutions. And data science is simply smarter and more creative when it incorporates a wider range of perspectives, experiences, and ways of solving problems.
So as we celebrate the pioneering contributions of black data scientists for Black History Month, we must also confront and dismantle the deep inequities that persist in the field today. We should ask ourselves what we can do to help—whether mentoring black data scientists, advocating for change in our organizations, working to eliminate bias in our data and algorithms, or funding diversity initiatives. Every action can help build a data science community that lives up to the proud legacy of the black trailblazers who helped establish it.
Ultimately, diversity is not a side issue but a core imperative for data science to fulfill its potential as a transformative force for human progress. Only by expanding access and elevating talent from all corners of humanity can data science tackle the immense global challenges we face—from healthcare to poverty to climate change. The sooner the data science ecosystem commits to diversity as a necessity and not just an ideal, the sooner we can start creating a better and more equitable future empowered by data. The work starts now.