We love data science and cool data science projects. If you’re applying for our free data science fellowship and looking to propose a data science project, here are four project ideas.
GitHub
GitHub is a great source of data on how engineers write code. A recent post found discrimination against pull requests submitted by women on GitHub, although perhaps that study could have been better. But there are lots of other ideas to pursue. We can easily train an n-gram classifier to tell whether a line of code is a comment, and then use it to search for commented-out code. Are repos by academics more likely to have commented-out code? Are they more likely to violate lint rules? Additionally, it would be interesting to analyze bug-fix commits to predict which lines of code are more likely to contain bugs.
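As a rough illustration of the n-gram idea, here is a minimal sketch of a character-trigram naive Bayes model that separates prose from code and then flags comment lines whose body looks like code. The training lines, the `NgramNaiveBayes` class, and the `looks_commented_out` helper are all made up for illustration; a real project would train on many thousands of labeled lines from actual repos and would likely use a library implementation.

```python
import math
from collections import Counter

def char_ngrams(line, n=3):
    """Return the character n-grams of a stripped line, padded at both ends."""
    padded = "^" + line.strip() + "$"
    return [padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))]

class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}       # label -> Counter of n-grams
        self.totals = {}       # label -> total number of n-grams seen
        self.docs = Counter()  # label -> number of training lines
        self.vocab = set()

    def fit(self, lines, labels):
        for line, label in zip(lines, labels):
            grams = char_ngrams(line, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, line):
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            score = math.log(self.docs[label] / n_docs)  # class prior
            for gram in char_ngrams(line, self.n):
                score += math.log((counter[gram] + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def looks_commented_out(line, model, marker="#"):
    """Flag a comment line whose body the model classifies as code."""
    stripped = line.strip()
    if not stripped.startswith(marker):
        return False
    return model.predict(stripped[len(marker):]) == "code"

# Hypothetical toy training set: comment bodies (prose) versus code lines.
prose = [
    "compute the running total",
    "TODO: handle empty input",
    "this function parses the config file",
    "see the docs for details",
]
code = [
    "total = total + x",
    "for item in items:",
    "return parse(config)",
    "if value is None:",
]
model = NgramNaiveBayes(n=3).fit(prose + code, ["prose"] * 4 + ["code"] * 4)

flagged = looks_commented_out("# return total + x", model)
not_flagged = looks_commented_out("# handle the config file", model)
```

With a model like this, counting flagged lines per repo gives a crude "commented-out code" rate that could then be compared across groups of repos.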
Open Food
Ever wonder what makes Mexican food unique or what’s distinctive about Polish cuisine? There are plenty of recipe websites (allrecipes.com, foodnetwork.com, chowhound.com) with ingredient lists. You could easily run PCA, K-Means, or your favorite clustering algorithm, or train a classifier, on ethnically identified dishes. Can you combine this information to find an “Eastern European Ingredients” eigenvector?
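To make the "ingredients eigenvector" idea concrete, here is a small sketch that builds a binary recipe-by-ingredient matrix and finds its first principal component by power iteration. The six recipes are invented toy data, not scraped from any of the sites above, and in practice you would probably reach for a library PCA rather than this hand-rolled loop.

```python
import math

# Hypothetical toy data: (cuisine label, ingredient set) pairs.
recipes = [
    ("polish", {"cabbage", "potato", "dill", "sour cream"}),
    ("polish", {"beet", "potato", "dill", "sour cream"}),
    ("polish", {"cabbage", "beet", "sour cream"}),
    ("mexican", {"corn", "chili", "lime", "cilantro"}),
    ("mexican", {"corn", "chili", "beans", "lime"}),
    ("mexican", {"beans", "chili", "cilantro"}),
]

ingredients = sorted(set().union(*(ings for _, ings in recipes)))

# Binary recipe-by-ingredient matrix, with each column centered on its mean.
X = [[1.0 if ing in ings else 0.0 for ing in ingredients] for _, ings in recipes]
means = [sum(col) / len(X) for col in zip(*X)]
Xc = [[x - m for x, m in zip(row, means)] for row in X]

def first_principal_component(rows, iterations=200):
    """Power iteration on X^T X, applied as X^T (X v) without forming the matrix."""
    d = len(rows[0])
    v = [1.0 / (i + 1) for i in range(d)]  # deterministic starting vector
    for _ in range(iterations):
        proj = [sum(a * b for a, b in zip(row, v)) for row in rows]            # X v
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

pc1 = first_principal_component(Xc)
loadings = dict(zip(ingredients, pc1))                           # weight per ingredient
scores = [sum(a * b for a, b in zip(row, pc1)) for row in Xc]    # position of each recipe
```

On this toy data the first component separates the two cuisines: the Polish recipes land on one side of zero and the Mexican ones on the other, and the ingredient loadings show which ingredients drive the split. The overall sign of the component is arbitrary.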
Open Drinks
If you’re interested in cocktails, drinksmixer.com’s ingredient lists are even hyperlinked and cross-referenced for you. You could easily use SVD or other recommendation engine techniques to find cocktails that are similar to the ones you already drink. Cocktails are supposed to have a balance of the five basic tastes. Drink Mixer actually gives you the nutritional information to break this down. Connoisseurs of beer know that BeerAdvocate.com has very in-depth beer reviews, often with thousands of reviews per beer. Can you use NLP to find similar beers?
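Before reaching for SVD, a simpler baseline is worth having. The sketch below is not SVD: it ranks cocktails by plain cosine similarity between their ingredient sets. The small `cocktails` dictionary is hand-typed toy data standing in for a scraped ingredient list, and `most_similar` is a name made up for this example.

```python
import math

# Hypothetical toy data standing in for scraped ingredient lists.
cocktails = {
    "margarita": {"tequila", "triple sec", "lime juice"},
    "daiquiri": {"rum", "lime juice", "simple syrup"},
    "mojito": {"rum", "lime juice", "simple syrup", "mint", "soda water"},
    "manhattan": {"rye whiskey", "sweet vermouth", "bitters"},
    "negroni": {"gin", "campari", "sweet vermouth"},
}

def cosine(a, b):
    """Cosine similarity between two ingredient sets viewed as binary vectors."""
    return len(a & b) / math.sqrt(len(a) * len(b))

def most_similar(name, k=2):
    """Rank the other cocktails by cosine similarity to `name`."""
    target = cocktails[name]
    ranked = sorted(
        ((cosine(target, ings), other) for other, ings in cocktails.items() if other != name),
        reverse=True,
    )
    return [(other, round(score, 3)) for score, other in ranked[:k]]

print(most_similar("daiquiri"))  # [('mojito', 0.775), ('margarita', 0.333)]
```

A factorization such as SVD would go one step further by letting cocktails be similar through related but non-identical ingredients, which exact set overlap cannot capture.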
NYC Taxi Data
There’s plenty of analysis of NYC Taxi Data but it’s often about optimizing fares or finding which street to hail a taxi on. But there’s a tonne of sociological data to be unlocked. Where do the Bridge and Tunnel Crowd go on Friday or Saturday night? Where do theatre or symphony goers head home to after their performances? Where do the bankers go to eat or sleep after work? Where do the consultants, who fly out every Sunday evening and arrive back in town Thursday evening, live? What are the most popular hotels amongst Amtrak travelers? What about flights? Where do tourists go after hitting up the Met museum or Statue of Liberty? Can companies understand where their customers live?
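Most of these questions reduce to the same recipe: pick trips that start near a place during a time window, then see where they end. Here is a minimal sketch of that recipe for the symphony question. The trip records, their field names, and the venue coordinates (roughly Lincoln Center) are all assumptions made for illustration; the real data set has its own schema and would need cleaning first.

```python
import math
from collections import Counter
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dropoffs_after_event(trips, venue, radius_km=0.3, start_hour=22):
    """Count drop-off cells for trips picked up near `venue` from `start_hour` onward."""
    counts = Counter()
    for t in trips:
        near = haversine_km(t["pickup_lat"], t["pickup_lon"], *venue) <= radius_km
        late = t["pickup_time"].hour >= start_hour
        if near and late:
            # Bin drop-offs by rounding to 2 decimals (cells of roughly 1 km).
            counts[(round(t["dropoff_lat"], 2), round(t["dropoff_lon"], 2))] += 1
    return counts

# Approximate coordinates, assumed for illustration only.
venue = (40.7725, -73.9835)

# Hypothetical trips; the field names are not the real data set's schema.
trips = [
    {"pickup_time": datetime(2015, 1, 9, 22, 40), "pickup_lat": 40.7727, "pickup_lon": -73.9833,
     "dropoff_lat": 40.7831, "dropoff_lon": -73.9712},
    {"pickup_time": datetime(2015, 1, 9, 23, 5), "pickup_lat": 40.7724, "pickup_lon": -73.9838,
     "dropoff_lat": 40.7829, "dropoff_lon": -73.9708},
    {"pickup_time": datetime(2015, 1, 9, 22, 50), "pickup_lat": 40.7726, "pickup_lon": -73.9836,
     "dropoff_lat": 40.6782, "dropoff_lon": -73.9442},
    {"pickup_time": datetime(2015, 1, 9, 14, 0), "pickup_lat": 40.7725, "pickup_lon": -73.9835,
     "dropoff_lat": 40.7580, "dropoff_lon": -73.9855},   # near the venue, wrong time
    {"pickup_time": datetime(2015, 1, 9, 22, 30), "pickup_lat": 40.7061, "pickup_lon": -74.0087,
     "dropoff_lat": 40.7306, "dropoff_lon": -73.9866},   # right time, wrong place
]

counts = dropoffs_after_event(trips, venue)
top_cell = counts.most_common(1)
```

Swapping the venue and the time window turns the same function into a first pass at the Amtrak, airport, or museum questions.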