Data Science Project Ideas

We love data science and cool data science projects.  If you’re applying for our free data science fellowship and looking to propose a data science project, here are four project ideas.


GitHub is a great source of data on how engineers write code.  A recent post found discrimination against pull requests submitted by women on GitHub, although perhaps that study could have been better.  But there are lots of other ideas to pursue.  We can easily train an n-gram classifier to tell whether a line of code is a comment or not, and then use it to search for commented-out code.  Are repos by academics more likely to have commented-out code?  Are they more likely to violate lint rules?  Additionally, it would be interesting to analyze bug-fix commits to predict which lines of code are more likely to contain bugs.
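One way to sketch the commented-out-code idea: train a character n-gram naive Bayes model on code lines versus prose comment text, then flag any comment whose body looks more like code.  Everything below is an illustrative assumption rather than a tested pipeline: the training lines are hand-written toys, the class and function names are made up, and only Python-style `#` comments are handled.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, with ^ and $ marking the line edges."""
    padded = "^" + text.strip() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}         # label -> Counter of n-grams seen for that label
        self.totals = Counter()  # label -> total number of n-grams
        self.docs = Counter()    # label -> number of training lines
        self.vocab = set()

    def fit(self, lines, labels):
        for line, label in zip(lines, labels):
            grams = char_ngrams(line, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, line):
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # Log prior plus smoothed log likelihood of every n-gram in the line.
            score = math.log(self.docs[label] / n_docs)
            for gram in char_ngrams(line, self.n):
                seen = self.counts[label][gram]
                score += math.log((seen + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written training lines (hypothetical). A real project would
# harvest code lines and comment bodies from many repositories.
CODE_LINES = ["x = foo(a, b)", "return x + 1", "for i in range(n):",
              "if x == 0:", "total += items[i]"]
PROSE_LINES = ["this function returns the total", "loop over all of the items",
               "make sure the input is valid", "see the docs for more details",
               "handle the empty case here"]

model = NgramNaiveBayes(n=3).fit(
    CODE_LINES + PROSE_LINES,
    ["code"] * len(CODE_LINES) + ["prose"] * len(PROSE_LINES),
)


def find_commented_out_code(source_lines, classifier):
    """Return the '#' comment lines whose body the classifier labels as code."""
    flagged = []
    for line in source_lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            body = stripped.lstrip("#").strip()
            if body and classifier.predict(body) == "code":
                flagged.append(line)
    return flagged
```

With a model like this, the fraction of flagged comments per repository becomes a number you can compare across groups of repos, although ten training lines are far too few for real conclusions.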

Open Food

Ever wondered what makes Mexican food unique or what’s distinctive about Polish cuisine?  There are plenty of recipe websites with ingredient lists.  You could easily run PCA, K-Means or your favorite clustering algorithm, or train a classifier, on ethnically identified dishes.  Can you combine this information to find an “Eastern European Ingredients” eigenvector?
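As a rough illustration of the eigenvector idea, the sketch below encodes each recipe as a 0/1 ingredient vector, centers the columns, and uses power iteration to estimate the first principal component of the ingredient covariance matrix.  The recipes and ingredient lists are invented toy data, the function names are hypothetical, and a real analysis would more likely lean on a numerical library than on hand-rolled linear algebra.

```python
import math

INGREDIENTS = ["tortilla", "chili", "lime", "cilantro",
               "potato", "cabbage", "dill", "sour cream"]

# Toy, made-up recipes: (cuisine label, set of ingredients).
RECIPES = [
    ("mexican", {"tortilla", "chili", "lime", "cilantro"}),
    ("mexican", {"tortilla", "chili", "cilantro"}),
    ("mexican", {"chili", "lime", "cilantro"}),
    ("polish", {"potato", "cabbage", "dill", "sour cream"}),
    ("polish", {"potato", "dill", "sour cream"}),
    ("polish", {"potato", "cabbage", "dill"}),
]


def center_columns(rows):
    """Subtract each column's mean so every ingredient has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix between the columns of centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def top_eigenvector(matrix, iterations=200):
    """Estimate the dominant eigenvector by power iteration.

    Starting from the first basis vector fixes the otherwise arbitrary sign.
    """
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


matrix = [[1.0 if ing in ingredients else 0.0 for ing in INGREDIENTS]
          for _, ingredients in RECIPES]
centered = center_columns(matrix)
component = top_eigenvector(covariance(centered))

# Loading of each ingredient on the first principal component.
loadings = dict(zip(INGREDIENTS, component))
# Projection of each recipe onto that component.
scores = [sum(x * w for x, w in zip(row, component)) for row in centered]
```

On this toy data the component puts the two cuisines' ingredients on opposite sides, which is the kind of axis an “ingredients eigenvector” would describe.  Real recipe data share many ingredients across cuisines, so the separation would be much less clean.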

Open Drinks

If you’re interested in cocktails, some cocktail sites have ingredient lists that are even hyperlinked and cross-referenced for you.  You could easily use SVD or other recommendation engine techniques to find cocktails that are similar to the ones you already drink.  Cocktails are supposed to have a balance of the five basic tastes.  Drink Mixer actually gives you the nutritional information to break this down.  Connoisseurs of beer know that some review sites have very in-depth beer reviews, often with thousands of reviews per beer.  Can you use NLP to find similar beers?
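A very small starting point for the similar-cocktails idea is item-to-item cosine similarity over ingredient sets.  Note that this swaps in plain cosine similarity rather than SVD, and the ingredient lists below are simplified toy data typed in by hand, not scraped from any site; the function names are hypothetical.

```python
import math

# Simplified, hand-entered ingredient sets (toy data).
COCKTAILS = {
    "margarita": {"tequila", "triple sec", "lime juice"},
    "daiquiri": {"rum", "lime juice", "sugar"},
    "mojito": {"rum", "lime juice", "sugar", "mint", "soda water"},
    "old fashioned": {"whiskey", "sugar", "bitters"},
    "manhattan": {"whiskey", "sweet vermouth", "bitters"},
}


def cosine_similarity(a, b):
    """Cosine similarity of two ingredient sets viewed as 0/1 vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_cocktails(name, cocktails=COCKTAILS, top_n=3):
    """Return the top_n other cocktails, most similar first, with their scores."""
    target = cocktails[name]
    scored = [(other, cosine_similarity(target, ingredients))
              for other, ingredients in cocktails.items() if other != name]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


print(similar_cocktails("daiquiri", top_n=2))
```

Once there are many cocktails and many ingredients, a low-rank factorization such as SVD can stand in for the raw ingredient overlap used here.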

NYC Taxi Data

There’s plenty of analysis of NYC Taxi Data but it’s often about optimizing fares or finding which street to hail a taxi on.  But there’s a tonne of sociological data to be unlocked.  Where do the Bridge and Tunnel Crowd go on Friday or Saturday night?  Where do theatre or symphony goers head home to after their performances?  Where do the bankers go to eat or sleep after work?  Where do the consultants, who fly out every Sunday evening and arrive back in town Thursday evening, live?  What are the most popular hotels amongst Amtrak travelers?  What about air travelers?  Where do tourists go after hitting up the Met or the Statue of Liberty?  Can you help companies understand where their customers live?
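Most of these questions reduce to the same query: pick the trips that start in one place during one time window, then count where they end.  The sketch below assumes each trip is a dict with `pickup_time`, `pickup_zone` and `dropoff_zone` keys.  Those field names, the zone labels and the trips themselves are made up for illustration, so the real trip records would need to be mapped onto this shape first.

```python
from collections import Counter
from datetime import datetime, time


def dropoffs_after(trips, pickup_zone, start, end, weekdays=None):
    """Count dropoff zones for trips picked up in one zone within a time window.

    weekdays uses datetime.weekday() numbering (Monday=0 ... Sunday=6);
    None means every day of the week.
    """
    counts = Counter()
    for trip in trips:
        picked_up = trip["pickup_time"]
        if trip["pickup_zone"] != pickup_zone:
            continue
        if weekdays is not None and picked_up.weekday() not in weekdays:
            continue
        if start <= picked_up.time() <= end:
            counts[trip["dropoff_zone"]] += 1
    return counts


# Hypothetical trips; 8 and 9 January 2016 are a Friday and a Saturday.
TRIPS = [
    {"pickup_time": datetime(2016, 1, 8, 22, 30), "pickup_zone": "Theater District",
     "dropoff_zone": "Upper West Side"},
    {"pickup_time": datetime(2016, 1, 8, 22, 45), "pickup_zone": "Theater District",
     "dropoff_zone": "Upper West Side"},
    {"pickup_time": datetime(2016, 1, 9, 23, 10), "pickup_zone": "Theater District",
     "dropoff_zone": "Park Slope"},
    {"pickup_time": datetime(2016, 1, 8, 14, 0), "pickup_zone": "Theater District",
     "dropoff_zone": "Midtown"},
    {"pickup_time": datetime(2016, 1, 9, 22, 50), "pickup_zone": "Financial District",
     "dropoff_zone": "Upper East Side"},
    {"pickup_time": datetime(2016, 1, 11, 22, 30), "pickup_zone": "Theater District",
     "dropoff_zone": "Astoria"},
]

# Where do Friday and Saturday late-evening pickups near the theatres end up?
after_show = dropoffs_after(TRIPS, "Theater District",
                            time(22, 0), time(23, 59), weekdays={4, 5})
print(after_show.most_common())
```

The same function covers the banker, consultant and tourist questions by changing the pickup zone, the time window and the weekdays.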
