David was a Fellow in our winter cohort who landed a job with one of our hiring partners, Sotera.
Tell us about your background. How did it set you up to be a great Data Scientist?
I did my PhD in computational chemistry, which, as the name suggests, focuses on computer simulations of chemicals. The difficulty in computational chemistry doesn’t come from a lack of knowledge of the underlying physics. Rather, the difficulty comes from computational complexity – 100-year computer simulations aren’t an effective way to obtain a PhD. Computational chemistry does a great job of training researchers to think about how their computer simulations really work, how to approximate the really hard parts, and when those approximations do (and don’t) work. Those are all really important skills for a data scientist confronted with a large data set distributed across multiple computers (aka “Big Data”).
What do you think you got out of The Data Incubator?
What advice would you give to someone who is applying for The Data Incubator, particularly someone with a Chemistry background?
Chemistry is, as far as I can tell, a less typical background in data science. That doesn’t mean chemists don’t have the right training! Computational chemists are taught both quantum mechanics and statistical mechanics. Statistical mechanics uses, as the name suggests, statistics. And don’t forget that in quantum mechanics the electron density is formally the electron probability density. Almost everything in quantum mechanics involves taking an expectation value over a probability density.
Chemists have the necessary math background. We also have the necessary computational background. If you have ever programmed a simulation that runs on more than one processor core, you already know some parallel computing. If anything, you might know too much; the message-passing idioms used in computational chemistry are a lot more complicated than MapReduce. Just brush up on analyzing algorithms and big-O notation – chemists have a tendency to abuse the notation a bit.
So my advice to a chemist applying to The Data Incubator would be: don’t worry, you have the necessary background. It certainly wouldn’t hurt to brush up on your probability theory or algorithms 101, but you already know the basics. [Editor’s Note: For more information about how to prepare for The Data Incubator, check out this post.]
Could you tell us about the mini-projects you worked on?
The mini-projects are a wonderful way to be exposed to the wide array of problems a data scientist might face. Our mini-projects covered diverse topics such as social networks, natural language processing, distributed computing, and more. Completing the mini-projects gives you a well-rounded data science background. I know that during my interviews the mini-projects gave me a great way to demonstrate that I had experience in multiple areas of data science.
It’s also important to note that the mini-projects are not trivial, and I learned a great deal by discussing them with the other Fellows. The Fellows came from diverse backgrounds, and having people with so many different ways of looking at a problem work together always produced interesting results.