Predicting Flight Delays with Random Forests: Alumni Spotlight on Stacy Karthas

Stacy was a Fellow in our Winter 2017 cohort who landed a job with one of our hiring partners, AdTheorent.

Tell us about your background. How did it set you up to be a great Data Scientist?

I received my Bachelor of Science degrees in mathematics and physics from the University of New Hampshire. I then went on to graduate school at Stony Brook University, where I graduated with my master’s degree in physics in December 2016. During my master’s degree, I did research in nuclear heavy ion physics with a focus on the analysis of gluons and their products as they traversed our detector. The data analysis, simulation, and clustering algorithms I worked on prepared me to become a data scientist because they were a physical application of many of the tools used by data scientists.

What do you think you got out of The Data Incubator?

The Data Incubator gave me the chance to solidify my data science knowledge. It helped me pull together tools and concepts I had been using during all of my previous research experiences. I learned a lot of new machine learning concepts and how they could be applied to real-world data.

What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?

Python is key. Learning as much as you can before the program is very important. I would also suggest taking an online course or reading a bit about machine learning before the program starts. Also, it is easier if you try to relate the concepts back to something you’ve already done. It was easier for me to visualize how clustering algorithms worked because I had been working on my own for a few months.

What is your favorite thing you learned at The Data Incubator?

My favorite thing I learned at The Data Incubator was how to create models with scikit-learn. Because of my limited background in Python, the fact that you can use such a convenient package to do some very solid machine learning was very neat!

Could you tell us about your Data Incubator Capstone project?

My capstone project was an app that predicted whether or not a domestic flight in the US would be delayed. This was based on date, time of day, airline, airport, etc.

How did you come up with the idea for the project?

Millions of passengers take domestic flights every day, whether for business or for pleasure. The worst thing about flying is that you have to build in extra time in case of a delay: at least 15% of flights are delayed by more than 10 minutes, and many flights are delayed by hours. I thought that I could create an app that would allow people (and myself) to find an airline or flight that is not likely to be delayed, so as to minimize the chance of this hassle.

What technologies did you use and what skills did you learn at TDI that you applied to the project?

I used scikit-learn’s random forest classifier to build my prediction model, along with other packages to assist in evaluating and cross-validating my results. I also used Flask and Heroku to deploy my app. Some of my visualizations used matplotlib, seaborn, Plotly, and D3.
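For readers curious what this kind of model looks like in practice, here is a minimal sketch of a random forest delay classifier with cross-validation in scikit-learn. It is not Stacy's code: the data is synthetic, and the feature names, airline and airport codes, the delay rule used to generate labels, and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for flight records (NOT real data).
month = rng.integers(1, 13, n)    # 1..12
hour = rng.integers(0, 24, n)     # scheduled departure hour, 0..23
airline = rng.choice(["AA", "DL", "AS", "HA"], n).reshape(-1, 1)
origin = rng.choice(["JFK", "ORD", "SEA", "HNL"], n).reshape(-1, 1)

# Assumed labeling rule: evening departures are delayed more often.
p_delay = 0.1 + 0.4 * (hour >= 17)
y = (rng.random(n) < p_delay).astype(int)   # 1 = delayed, 0 = on time

# One-hot encode the categorical columns and stack them next to the numeric ones.
encoder = OneHotEncoder(handle_unknown="ignore")
categorical = encoder.fit_transform(np.hstack([airline, origin])).toarray()
X = np.hstack([np.column_stack([month, hour]), categorical])

# 5-fold cross-validation, scored by ROC AUC.
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("Mean cross-validated ROC AUC:", scores.mean())

# Fit on all rows, then estimate the delay probability for one hypothetical flight:
# June, 8 a.m., airline "AS", departing from "SEA".
model.fit(X, y)
new_categorical = encoder.transform([["AS", "SEA"]]).toarray()
new_flight = np.hstack([[[6, 8]], new_categorical])
proba = model.predict_proba(new_flight)[0, 1]
print("Estimated probability of delay:", proba)
```

A deployed app along these lines would typically load a fitted model like this one and call `predict_proba` on the flight details a user submits.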

What was your most surprising or interesting finding?

I thought it was interesting just how poorly some of the airlines performed. Generally, the larger airlines tended to have worse on-time statistics, while smaller airlines like Alaska and Hawaiian tended to have short delays.

Describe the business application for this project (how could a company use your work or your data?).

Time is money. I can think of two ways a business would want to use this. First, companies don’t want to send their employees on business trips only to have them waiting around in the airport, so it would be best to book with airlines that have fewer delays. Second, this app would promote competition and accountability among airlines: they would be able to promote themselves with their on-time statistics, and customers could hold them to higher standards.

Do you have an interesting visualization to share?

[Figure: Cause of Delay]

And lastly, tell us about your new job!

I am currently working as an associate data scientist at AdTheorent. AdTheorent is a digital media company that bids on mobile and web ad space for its clients. My job is to build models that help increase the likelihood that an advertisement will perform well (that it will be clicked or seen, or that someone will buy the product).
 
