Michael was a Fellow in our Winter 2016 cohort who landed a job with one of our hiring partners, Schireson Associates.
Tell us about your background. How did it set you up to be a great Data Scientist?
My PhD work was in computational materials science, where I worked with reactive molecular dynamics simulations. The field is totally simulation-based, and typically requires high-performance computing resources. Running these simulations helped build my chops for working with parallel systems and command-line tools. The software required familiarity with some powerful languages and APIs like C and CUDA. Learning those definitely helped my understanding of Python once I switched to using it.
Toward the third year of my PhD I got really interested in machine learning. I started using scikit-learn to predict different aspects of simulations I worked on. These projects became a large part of my thesis and contributed to choosing The Data Incubator as a next step in my career.
What do you think you got out of The Data Incubator?
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
I would start putting together projects in your free time. There is quite a bit of free data out there, and it’s easier than you think! You’ll organically learn solutions to common problems, solutions that might otherwise seem esoteric. I had put quite a bit of work into my project before I even applied to The Data Incubator, and it probably helped my application. Maybe more importantly, doing this allowed me to have a beefy project by the end of the program. I spent most of my interviews going over my project. I think it’s similar to what you’ll be doing in a working environment.
Finally, I can’t really emphasize enough how using the right libraries can be a huge time saver and productivity booster. Using a language like Python with a simple package management system made data science way more fun for me. Seriously, don’t try to use C for this stuff; it’ll take forever. Actually, go ahead and try it. What doesn’t kill you makes you stronger.
What is your favorite thing you learned at The Data Incubator?
My favorite software-based thing to learn was Spark. I had never really used a distributed file system, so I had a lot to learn, and it was pretty powerful. I also really dug the Jupyter notebooks. I think I’ll be using them a lot in the future.
Could you tell us about your Data Incubator Capstone project?
My project had two parts. First, I compared historical rolling averages of stock price movements with phrases published in New York Times articles to build a financial sentiment lexicon. Then I tried to use the lexicon to predict future price movements based on what was published in the New York Times. While I wasn’t able to perfectly predict the market (or I wouldn’t even need a job), I really enjoyed making the project. It gave me some natural language processing experience in conjunction with the mathematical modelling necessary for feature engineering on the moving stock prices. It also gave me a lot to talk about during interviews.
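For readers curious what a lexicon like this might look like in code, here is a minimal sketch. It is not Michael's actual implementation. The function names (rolling_mean, build_lexicon, score_text), the whitespace tokenization, the two-day window, and the toy headlines and returns are all assumptions made for illustration. Each word is scored by the average smoothed return on the days it appeared, and a new piece of text is scored by averaging the scores of its known words.

```python
from collections import defaultdict


def rolling_mean(values, window):
    """Trailing moving average; the first window - 1 entries are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def build_lexicon(headlines, returns, window=2):
    """Map each word to the mean smoothed return on the days it appeared.

    Assumes headlines[i] and returns[i] refer to the same day (hypothetical
    alignment; a real study would have to choose lags carefully).
    """
    smoothed = rolling_mean(returns, window)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, r in zip(headlines, smoothed):
        if r is None:  # not enough history yet for a full window
            continue
        for word in set(text.lower().split()):
            totals[word] += r
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}


def score_text(text, lexicon):
    """Average lexicon score of the known words in text (0.0 if none)."""
    known = [w for w in text.lower().split() if w in lexicon]
    if not known:
        return 0.0
    return sum(lexicon[w] for w in known) / len(known)


# Toy data, invented purely for illustration.
headlines = ["markets calm", "profits surge", "profits flat", "losses mount"]
returns = [0.01, 0.03, -0.02, -0.04]

lexicon = build_lexicon(headlines, returns, window=2)
print(score_text("losses surge", lexicon))
```

A real version would need proper tokenization, phrase extraction, and a held-out test period before any score could be read as predictive.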