The LGBTQIA+ Community: Its Underrepresentation in Data Science & Statistics and How to Overcome the Problem

We live in a world driven by data. Technology makes it possible to collect more data than ever. The field of data science exploded within the last decade as industries turned to professionals to collect, sort and interpret the raw data now available to them. 

That growth is expected to continue through the next decade: the U.S. Bureau of Labor Statistics projects that the field of data science will grow by 22% through 2030, much higher than the 8% average growth in other fields.

The insights gained from analyzing data are used across a variety of industries and have the power to shape algorithms, policy, and public opinion. We have data science to thank every time we enjoy a perfectly curated social media feed, run a Google image search for “cute puppies” to brighten our day, or feel particularly motivated by an infographic to fight for a just cause.

However, the relationship between data science and marginalized communities, particularly the LGBTQ+ community, has not always been smooth. So, where has the field gone wrong before?

And more importantly, how can it become better?

If you are interested in becoming a data science professional, The Data Incubator offers a data science bootcamp that teaches students the crucial skills they need to succeed. 

What is Data Science Used For?

First things first: what exactly is data science and how is it used?

To put it simply, data science is a field of study that uses mathematical, scientific, and programming expertise to analyze and interpret large amounts of data. Data is being collected every minute of every day, but it would be nothing but a bunch of numbers without data science professionals to interpret it. Those who work in the data science field are tasked with sorting through those numbers and statistics and gathering useful information from the raw data. 

There are several key professions within the field of data science: 

  • Data engineers create and maintain the infrastructure used to store and move data.
  • Data scientists interpret data to draw conclusions.
  • Data analysts analyze and organize conclusions about a data set and then communicate them to those outside the field.

Now you might be asking yourself: that sounds great and all, but what is this analyzed data actually used for?

The answer is, well, everything. If we had to list every application of these analyses, this would probably qualify as a novella instead of an article.

Data science is behind every algorithm, whether it’s the one behind your favorite social media site, the one recommending you the perfect new Netflix show to binge, or even the one helping law enforcement officers combat online child trafficking. 

Companies also make use of data science to help them better understand their consumers, predict market trends and better streamline their internal management. 

The Dark Side of Data Science

First, let’s talk about the past pitfalls in the industry. 

Data is everywhere, and it can be an extraordinarily effective tool for gathering and communicating information—but data is only as unbiased as the people who interpret it. It’s tempting to look at numbers and believe that because they are data-driven they are infallible, but data science relies on human beings to organize and interpret that data. And as thousands of years of history have taught us, “infallible” and “human” don’t exactly belong in the same sentence.

There’s been a lot of conversation in recent years about how artificial intelligence can learn to discriminate. An algorithm is only as reliable as the data it’s trained on. That data can be incomplete and flawed, and the algorithm itself can even contain the biases of the professionals who created it. 

Amazon came under fire when it was revealed that an artificial intelligence system its team developed to help process job applications had learned to discriminate against women: machine-learned misogyny.

The data that the AI used to sort incoming applications was based on the resumes of the people hired at the company within the last decade. It used the data from those past resumes to look for similarities in incoming ones. Sounds like a smart move, right? Not exactly — those past resumes belonged overwhelmingly to men. So the new AI learned to overlook resumes from women in favor of resumes from men: a non-human system repeating a very human mistake.

The system was ultimately never used (thankfully), but Amazon is far from the only entity to land in hot water due to skewed applications of data science. Algorithms used by search engines and even law enforcement agencies have been found to contain racial bias, and those used to determine credit limits can give equally qualified women much lower credit limits than their male counterparts.

Algorithms and other artificial intelligence systems that are trained on historical data are most likely going to reflect the inequality present in that data, and if engineers, scientists, and analysts don’t realize that the datasets they have are biased, suddenly you’ve got discriminatory AI on your hands.
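
One way to notice this kind of skew before training on it is to compare outcomes across groups in the historical data. The following is a minimal Python sketch, not Amazon’s system or any real auditing tool. The records, the "group" and "hired" field names, and the selection_rates helper are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical past hiring decisions, invented purely for illustration.
historical = (
    [{"group": "men", "hired": True}] * 6
    + [{"group": "men", "hired": False}] * 2
    + [{"group": "women", "hired": True}] * 1
    + [{"group": "women", "hired": False}] * 3
)


def selection_rates(records):
    """Return the fraction of applicants hired within each group."""
    applied = defaultdict(int)
    hired = defaultdict(int)
    for record in records:
        applied[record["group"]] += 1
        if record["hired"]:
            hired[record["group"]] += 1
    return {group: hired[group] / applied[group] for group in applied}


rates = selection_rates(historical)

# Ratio of the lowest selection rate to the highest. Values far below 1.0
# suggest that a model trained on this data could inherit the imbalance.
ratio = min(rates.values()) / max(rates.values())

for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of applicants hired")
print(f"lowest-to-highest selection ratio: {ratio:.2f}")
```

A check like this does not prove that a model will discriminate, but a large gap is a signal to look closely at the data before an algorithm learns from it.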

But how does this apply to the LGBTQIA+ community?

Gay marriage has been legal for less than a decade. Avril Lavigne’s hit 2002 song “Sk8er Boi” is older than the federal right for LGBTQ+ people to have intimate relationships with each other without being criminally charged. Anti-discrimination victories for the LGBTQ+ community are so new that it’s almost a given that AI and algorithms are still learning from data that is either discriminatory towards or completely excludes the LGBTQ+ community.

Take the TSA for example. Getting through airport security is a dreaded chore for most people, but a routine body scan before a flight can turn into a humiliating ordeal for transgender travelers.

When performing body scans, TSA agents are trained to select a button that corresponds with the perceived gender of the person being scanned: pink for woman and blue for man. This tells the system whether to use a male or female body as a baseline to detect any anomalies on the scan of the traveler.

However, if a person’s gender presentation doesn’t match their genitalia, anomalies appear on the scan and further screening is required. Trans people have spoken out about being stopped, patted down and taken to private rooms for additional and invasive screening. Travel is already stressful, but because of data systems built on the outdated concept of binary gender, it can become traumatizing to transgender or other gender non-conforming travelers.

Facial Recognition Technology (FRT) and Automatic Gender Recognition (AGR) technology are two controversial systems that use data to assign and predict gender (and sometimes even sexuality). But the idea that a person’s gender can be determined from their physical appearance alone is not only outdated but also flat-out inaccurate. The use of these technologies can also have troubling ramifications, as when the “girls only” social media app Giggle used selfies to discriminate against transgender women trying to access the app.

Even data-based systems that try to help members of the LGBTQ+ community can backfire. Natural Language Processing (NLP) is used to train artificial intelligence to screen for and prevent hate speech on social media. Despite that admirable goal, artificial intelligence systems using NLP can have the opposite effect and block members of the LGBTQ+ community from naming their own identities, flagging “gay” and “lesbian” as offensive words.
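
To see how that can happen, consider a deliberately oversimplified Python sketch. Real moderation systems are trained models rather than word lists, and this is not any platform’s actual filter. The BLOCKLIST, the is_flagged function, and the sample posts are hypothetical, made up for illustration. The point is only that a system which associates identity terms with abuse will flag people describing themselves.

```python
import re

# Hypothetical blocklist for illustration: a naive filter that treats
# identity terms as offensive regardless of context.
BLOCKLIST = {"gay", "lesbian"}


def is_flagged(post):
    """Flag a post if any of its words appears in the blocklist."""
    words = re.findall(r"[a-z']+", post.lower())
    return any(word in BLOCKLIST for word in words)


# Invented sample posts; none of them is hateful.
posts = [
    "I am a proud lesbian and I love my wife.",
    "Happy Pride to my gay friends!",
    "What a lovely day for a walk.",
]

for post in posts:
    print(is_flagged(post), "-", post)
```

The first two posts are flagged even though neither contains hate speech, because the filter looks at words in isolation and ignores who is speaking and why.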

LGBTQIA+ Underrepresentation in Data Science 

So why do so many systems exclude the LGBTQ+ community, even the systems that try to help it?

The answer is underrepresentation — both in the field and in the data itself.

There aren’t a lot of datasets specifically studying queer people, and the collection of data also leaves out people who don’t fit into a binary. Prompts that ask users to specify their gender rely on the idea of the gender binary — the infamous “male,” “female” and “other” categories. 

Grouping all trans and gender non-conforming people into an “other” category means that the data about those people is not as specific as it needs to be in order to be helpful. “Other” isn’t a gender.
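
The loss of detail is easy to demonstrate. The short Python sketch below uses invented survey responses and a hypothetical collapse function that mimics a form offering only “male,” “female” and “other.” It is an illustration under those assumptions, not a recommendation for which categories a real survey should offer.

```python
from collections import Counter

# Hypothetical self-described survey responses, invented for illustration.
responses = [
    "woman", "man", "nonbinary", "genderfluid",
    "agender", "nonbinary", "man", "woman",
]


def collapse(response):
    """Mimic a form that only offers male, female, and other."""
    return {"man": "male", "woman": "female"}.get(response, "other")


detailed = Counter(responses)
collapsed = Counter(collapse(r) for r in responses)

print("as answered:", dict(detailed))
print("as stored:  ", dict(collapsed))
print("distinct identities kept:", len(collapsed), "of", len(detailed))
```

Once the responses are stored in collapsed form, there is no way to recover which identities made up the “other” group, so any later analysis of those respondents starts from less information than they actually provided.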

The person writing those questions or the data professionals who are collecting and sorting the data might not mean any harm at all, but without adequate LGBTQ+ representation in the field itself, these things fall through the cracks. 

While there isn’t an industry-wide dataset about LGBTQ+ representation in the field of data science (ironically enough), the tech industry has not escaped issues surrounding workplace transphobia and homophobia.

Many problems surrounding bad data-driven systems could be recognized and solved sooner if the engineers, scientists and analysts in the data science field better represented the diversity of the people whose data they worked with day in and day out.

The Bright Side of Data Science

Okay, so that was the uncomfortable news. But that isn’t to say that the field of data science is nothing but bias with no positive effects on society. The opposite is true!

While data about LGBTQ+ folks is limited, organizations can still compile and interpret data to raise awareness about the discrimination that members of the community face. The University of Alberta’s Institute for Sexual Minority Studies and Services has a real-time counter that tracks the use of homophobic slurs on Twitter, giving people a direct visualization of the prevalence of homophobia on social media, even today. 

The LGBTQ+ organization Gayta Science has multiple projects using data to visualize the diversity of the community and the problems that they face — including one dedicated to trans people who face violence and one that tracks the pay gap between members of the LGBTQ+ community and cisgender, heterosexual employees.

Data visualizations like those can have an immense impact on raising awareness and understanding about issues certain communities face, and well-founded AI systems are used to promote equality. Benefits Data Trust is an organization that uses algorithms to help families in need apply for resources and assistance, and an estimated 90% of non-profit organizations are turning to data science to collect and analyze information. 

Data is crucial to understanding different demographics and educating the general public about how marginalized communities are impacted by the world around them. Good data that is well-analyzed and well-communicated can make an extraordinary difference in garnering public support, advocating for policy change, or even just helping businesses better serve the full variety of their diverse clientele.

Here at The Data Incubator, we value inclusive and diverse workplaces where we seek a broad range of perspectives and contributions to our industry. We’ve created two scholarships to support a diverse range of talented students who will fully represent our industry, and through them we will award up to $200,000 annually. We believe that everyone has a right to education regardless of social barriers and are committed to combatting the lack of diversity in STEM. Check out more about our Diversity, Equity & Inclusion Scholarship and our Women of Excellence in STEM scholarship and apply today!

To call back to the very beginning of this post, data science is a field growing at a skyrocketing rate. We already live in a world shaped by data—now imagine the possibilities of a field that is full of diverse engineers, analysts, and scientists who can understand and advocate for the LGBTQ+ community.

What Are You Waiting For?

There has never been a better time to become a data scientist, especially if you are a member or ally of the LGBTQ+ community. Data science skills are an invaluable asset. They equip data scientists with the tools they need to provide accurate, insightful, and actionable data, and those tools are even more important when helping marginalized communities. The Data Incubator offers an immersive data science bootcamp where students learn from industry-leading experts and gain the skills they need to excel in the world of data.

We also partner with leading organizations to place our highly trained graduates. Our hiring partners recognize the quality of our expert training and make us their go-to resource for providing quality, capable candidates throughout the industry.

Take a look at the programs we offer to help you achieve your dreams.

Here’s what a couple of our past students, Shiva and Marcos, have to say about our program.

We’re always here to guide you through your data journey! Contact our admissions team if you have any questions about the application process.
