Matt was a Fellow in our Winter 2016 cohort who landed a job with one of our hiring partners, 1010data.
Tell us about your background. How did it set you up to be a great Data Scientist?
I defended my PhD dissertation at Washington University in St. Louis a few weeks before coming to The Data Incubator. I was part of the MAPLE lab in Energy, Environmental, and Chemical Engineering (I know, it’s a mouthful). Our lab focused on physics-based electrochemical modeling, mostly geared toward Li-ion batteries.
For my main dissertation project, I studied how batteries age under different real-world cycling patterns. Most cycle life estimates for a battery are based on simple constant-charge and constant-discharge patterns, but many applications (such as those experienced by batteries in electric vehicles or coupled to the electric grid) do not follow simple cycling patterns. This variation affects the life of the battery.
Through both model simulation and long-term experiments, I had to analyze battery characteristics over thousands of cycles and pick out important features. This type of analysis, along with programming the computational models used to create those data sets, gave me the background to tackle data science problems.
Additionally, I think that working on my PhD projects allowed me to gain experience in solving unstructured problems, where the solution (and sometimes even the problem or need) is not well defined. These types of problems are very common, especially once you get outside of academia.
What do you think you got out of The Data Incubator?
More than I can fit in a couple of paragraphs! The most important thing for me was learning all the functionality of different programming languages and packages. Coming from a background where I had programmed in Maple, VB, and a little bit of Matlab and SQL, learning Python (and all of its different packages), Spark, etc. opened up so many possibilities for doing new types of analysis. Knowing these tools greatly sped up my ability to conquer new problems.
Completing miniprojects on each subject was instrumental in feeling confident about applying the techniques we learned in real-world situations. It was definitely a pressure-packed environment trying to complete everything on time, but it forced you to know each subject inside and out. Looking back on the program, it’s amazing to see the amount of code you have produced.
Beyond the subject matter, working together with so many other driven people was a great experience. And the network of employers that were brought through the program for happy hours and panel discussions always helped showcase all the different ways data science is being used in industry. I made my first connection with 1010data (where I will be starting a job at the end of the month) at one of the happy hours. So I think those were pretty valuable!
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Try to brush up on new languages that you have not used much before. In academia, you may have been working in the same environment (maybe even the same code!) for several years. Now is the time to try some new things and test out your ability to be flexible when solving a problem. There are a lot of great resources out there from Coursera to Code Academy that are helpful for learning new topics.
If you make it to the coding challenge, try to complete the problems with a new language. If you are used to Python, try R. If you have not used SQL and you think it would be useful for the problem, go for it. Everyone applying to the program has skills to complete at least some of the coding tasks. If you want to differentiate yourself and show the breadth of your knowledge, use a different language or technique for different problems.
Being able to pick up new techniques quickly will serve you well at both the Incubator and in your future job.
What is your favorite thing you learned at The Data Incubator?
While it is not the most technical of the topics covered, I really enjoyed learning how to set up web applications and create good visualizations. A lot of this came from working on the Capstone project, where you need a functional (and pretty) website to showcase your work. I think one aspect that a lot of technical people struggle with is showcasing their work. If it’s hard for someone to understand your project, they will not be able to appreciate the technical details and effort that went into it. Being able to create simple and effective presentations is sometimes as important as the work itself. But I’ve seen lots of people work for months on a project and then just slap together a sloppy presentation in a day.
It’s great to have learned how to put together several different topics studied at the Incubator into a working website. And it is even better when I can show my website to my (non-technical) friends and family members and they can have a decent understanding of my project.
Could you tell us about your Data Incubator Capstone project?
My project focuses on analyzing energy price and wind speed data in the Midwest to help determine the best locations for new wind farms. Conventional wisdom says that wind farms should be located in areas with the highest wind potential so that they can produce the maximum amount of energy. However, these locations are often in sparsely populated regions. This means that when wind farms are producing large amounts of energy and demand is low (and there is not sufficient transmission to push that energy elsewhere), wholesale energy prices will be driven down, lowering profits. At some points, the value of wind energy can be reduced by 40% because of these effects.
You can check out the analysis at my webpage: miso-epat.herokuapp.com/wind_details. The site allows you to generate wind reports for different areas and study energy prices around the Midwest.
By studying real-time energy prices and wind speeds, the project was able to take a more thorough approach to determining which locations were best for new wind farms, which required new transmission lines to maximize their potential, and which areas should be avoided.
If you are interested in some of the technical aspects of the project: the website runs on Python using Flask and utilizes a SQL database that holds the wind speeds and energy prices for all of the studied nodes, with data from every hour over the last four years. The price data came from the Midcontinent Independent System Operator’s (MISO) RTO historical price database, and the wind speeds were taken from NOAA’s climate database using rural weather stations close to the studied nodes. For more info, check out the methods section of the website (http://miso-epat.herokuapp.com/methodology).
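To give a rough idea of how a Flask site backed by a SQL store of hourly node data can be wired up, here is a minimal sketch. The table layout, route, node name, and column names are all illustrative assumptions for this example, not the project's actual schema or API.

```python
# Minimal sketch: a Flask app serving per-node wind/price summaries
# from a SQL database. Schema and names are hypothetical.
import sqlite3

from flask import Flask, jsonify

# An in-memory database for illustration; the real site would point
# at a persistent database holding four years of hourly data.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE readings (node TEXT, hour TEXT, wind_speed REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("NODE_A", "2016-01-01T00:00", 7.2, 21.5),  # made-up sample rows
        ("NODE_A", "2016-01-01T01:00", 8.1, 19.0),
    ],
)

app = Flask(__name__)

@app.route("/wind/<node>")
def wind_report(node):
    # Aggregate the stored hours for one node into a simple report.
    row = conn.execute(
        "SELECT AVG(wind_speed), AVG(price) FROM readings WHERE node = ?",
        (node,),
    ).fetchone()
    return jsonify(node=node, avg_wind=row[0], avg_price=row[1])
```

A report page like the one on the site would query a route of this kind and render the aggregates alongside maps and charts.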