Predicting Flight Delays with Random Forests: Alumni Spotlight on Stacy Karthas

Stacy was a Fellow in our Winter 2017 cohort who landed a job with one of our hiring partners, AdTheorent.

Tell us about your background. How did it set you up to be a great Data Scientist?

I received my Bachelor of Science degrees in mathematics and physics from the University of New Hampshire. I then went on to graduate school at Stony Brook University, where I graduated with my master’s degree in physics in December 2016. During my master’s degree, I did research in nuclear heavy-ion physics with a focus on the analysis of gluons and their products as they traversed our detector. The data analysis, simulation, and clustering algorithms I worked on prepared me to become a data scientist because they were a physical application of many of the tools used by data scientists.

What do you think you got out of The Data Incubator?

The Data Incubator gave me the chance to solidify my data science knowledge. It helped me pull together tools and concepts I had been using during all of my previous research experiences. I learned a lot of new machine learning concepts and how they could be applied to real world data.

What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?

Python is key. Learning as much as you can before the program is very important. I would also suggest taking an online course or reading a bit about machine learning before the program starts. Also, it is easier if you try to relate the concepts back to something you’ve already done. It was easier for me to visualize how clustering algorithms worked because I had been working on my own for a few months.

What is your favorite thing you learned at The Data Incubator?

My favorite thing I learned at The Data Incubator was how to create models with scikit-learn. Because of my limited background in Python, the fact that you can use such a convenient package to do some very solid machine learning was very neat!

Could you tell us about your Data Incubator Capstone project?

My capstone project was an app that predicted whether or not a domestic flight in the US would be delayed. This was based on date, time of day, airline, airport, etc.

How did you come up with the idea for the project?

Millions of passengers take domestic flights every day, whether for business or for pleasure. The worst thing about flying is that you have to build in extra time in case of a delay: at least 15% of flights are delayed by more than 10 minutes, and many are delayed by hours. I thought that I could create an app that would allow people (and myself) to find an airline or flight that is not likely to be delayed, minimizing the chance of this hassle.

What technologies did you use and what skills did you learn at TDI that you applied to the project?

I used scikit-learn’s random forest classifier to build my prediction model, along with other packages to assist in evaluating and cross-validating my results. I also used Flask and Heroku to deploy my app. Some of my visualizations used Matplotlib, seaborn, Plotly, and D3.
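For readers curious what that kind of workflow looks like, here is a minimal sketch. It is not Stacy's actual code: the data is synthetic, the airline and airport codes are placeholders, and the rule that generates the "delayed" label is made up purely so the model has a pattern to learn. It one-hot encodes the categorical inputs, trains scikit-learn's `RandomForestClassifier`, and estimates accuracy with 5-fold cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data (assumption): NOT real flight records.
airlines = rng.choice(["AA", "DL", "AS", "HA"], size=n)
airports = rng.choice(["JFK", "ORD", "SEA", "HNL"], size=n)
month = rng.integers(1, 13, size=n)   # 1..12
hour = rng.integers(0, 24, size=n)    # scheduled departure hour, 0..23

# Made-up labeling rule (assumption): evening departures are delayed more often.
p_delay = np.where(hour >= 17, 0.6, 0.15)
delayed = rng.random(n) < p_delay

# One-hot encode the categorical columns, then append the numeric ones.
encoder = OneHotEncoder(handle_unknown="ignore")
categorical = encoder.fit_transform(np.column_stack([airlines, airports])).toarray()
X = np.hstack([categorical, month[:, None], hour[:, None]])

# Random forest evaluated with 5-fold cross-validation.
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, delayed, cv=5, scoring="accuracy")
print("Mean cross-validated accuracy:", scores.mean())
```

With real data, the same structure would apply, but the features, the definition of "delayed", and the evaluation metric would all need to be chosen to match the actual records.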

What was your most surprising or interesting finding?

I thought it was interesting just how poorly some of the airlines performed. Generally, the larger airlines tended to have worse on-time statistics, while smaller airlines like Alaska and Hawaiian had short delays in general.

Describe the business application for this project (how could a company use your work or your data?).

Time is money. I can think of two ways a business would want to use this. The first is that companies don’t want to send their employees on business trips only to have them waiting around in the airport, so it would be best to book with airlines that have fewer delays. The second is that this app would promote competition and accountability among airlines. Airlines would be able to promote themselves with their on-time statistics, and customers would be able to hold them to higher standards.

Do you have an interesting visualization to share?


And lastly, tell us about your new job!

I am currently working as an associate data scientist at AdTheorent. AdTheorent is a digital media company that bids on mobile and web ad space for its clients. My job is to build models that help increase the likelihood that an advertisement will perform well (be clicked, be seen, or lead someone to buy the product).
