Anthony was a Fellow in our Winter 2017 cohort who landed a job with one of our hiring partners, Afiniti.
Tell us about your background. How did it set you up to be a great Data Scientist?
I came into The Data Incubator with a Master’s degree in Computational Operations Research from The College of William and Mary. My Master’s program gave me a strong background in theory and in the practical application of machine learning, simulation, and optimization. I had a few internships as well, primarily in finance.
What do you think you got out of The Data Incubator?
The Data Incubator gave me a lot of experience handling data in a way that I didn’t get in an academic environment. The data sets were big, messy, and realistic. In addition, I thought that the capstone was an excellent way to get into a more industrial environment. The Data Incubator required a lot of database management, web scraping, and the like, which I didn’t get in the academic setting I came from.
I also felt that The Data Incubator gave me a number of excellent opportunities. It may seem frustrating at times, but the partners really do want to hire Fellows, and The Data Incubator’s salary and compensation ranges are very accurate (in my experience). I’m not sure I would have gotten the same response rate and offers if I hadn’t been applying through the fellowship.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Start the capstone early, work with the other Fellows, and don’t be afraid to ask for help. That’ll get you through 90% of the work. The other 10% is banging your head against the grader and figuring out which settings the grader is looking for.
When you’re applying to partners’ job posts, keep on them. I frequently had to pester the partners once or twice before they got back to me, but they almost always did. If they don’t, talk to the Partnerships team – they’re very helpful, and they have more leverage to get the partners moving again than you do.
What is your favorite thing you learned at The Data Incubator?
Distributed computing (Spark and MapReduce). I wish we’d done Spark earlier, and I wish we’d done more of it, but that’s not a mark against the program.
Could you tell us about your Data Incubator Capstone project?
I built a web app to scan documents for in-text citations, search a database for them, then generate a bibliography.
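Editor's note: the scan, look up, and compile steps described above can be sketched roughly as follows. This is not Anthony's code. It is a minimal illustration that assumes author-year citations such as "(Smith, 2015)" and a plain dictionary standing in for the database; the names `CITATION_RE`, `find_citations`, and `build_bibliography` are hypothetical.

```python
import re

# Assumed citation style: "(Smith, 2015)", "(Smith and Jones, 2015)", "(Lee et al., 2012)".
# Other styles (numeric, footnote) would need different patterns.
CITATION_RE = re.compile(
    r"\(([A-Z][A-Za-z'-]+(?: (?:and|&) [A-Z][A-Za-z'-]+| et al\.)?),? (\d{4})\)"
)


def find_citations(text):
    """Return unique (authors, year) pairs in order of first appearance."""
    seen = []
    for authors, year in CITATION_RE.findall(text):
        key = (authors, int(year))
        if key not in seen:
            seen.append(key)
    return seen


def build_bibliography(citations, database):
    """Look up each citation in a dict keyed by (authors, year).

    Citations with no match are flagged rather than silently dropped.
    Returns the entries sorted alphabetically.
    """
    entries = []
    for key in citations:
        entries.append(database.get(key, f"[no match for {key[0]}, {key[1]}]"))
    return sorted(entries)
```

A real version would presumably query a bibliographic database or API instead of a dictionary and would need fuzzier matching than a single regular expression.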
How did you come up with the idea for the project?
I went through grad school and really wanted to automate the citation process.
What technologies did you use and what skills did you learn at TDI that you applied to the project?
I built everything in Python. I used spaCy and textacy for text analysis, Flask for the website, and D3 for visuals, along with API scraping, basic ML, and a number of other tools.
Describe the business application for this project. How could a company use your work or your data?
This is a consumer-facing project. If it were perfected and brought to market, I’m sure the demand among academics would be nontrivial.