Yutong was a Fellow in our Spring 2016 cohort in New York City who landed a job as a Machine Learning Engineer at Apple.
Tell us about your background. How did it set you up to be a great Data Scientist?
My background is in computational physics. Throughout my education and during my doctoral research, I found that I have a great interest in data analysis and the beauty of using models to predict things. So I decided that I wanted to be a Machine Learning Engineer and Data Scientist.
What do you think you got out of The Data Incubator?
I think the most important thing is the Capstone project that I did during the Fellowship. I learned a lot of things from doing the Capstone project, both from self-learning and from the Fellowship lectures and projects. So I think the Capstone project is the most important thing.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
First, try to do some personal projects before applying for The Data Incubator Fellowship – these projects should be related to either machine learning or big data analysis – so that while doing the Capstone project you can have a better understanding of what kind of knowledge you need to have to become a Data Scientist.
What is your favorite thing you learned at The Data Incubator?
I was able to learn a lot of things quickly and apply them in my Capstone project. Instructors were always available, as were other students in my cohort, and it was very beneficial to receive feedback on the project to know which parts I was doing well and which parts I still needed to improve.
Could you tell us about your Data Incubator Capstone project?
My Capstone project involved using NYC taxi trip data to help you plan and predict your business based on taxi trips. Part of the project was visualization, so you can visualize the trips and get some good insights that way. It also has a prediction part, where you can use my engine to predict the duration of a trip and to find the best pickup locations.
How did you come up with the idea for the project?
I’ve always been interested in traffic problems because traffic is a very dynamic system. It has a lot of data sources, and the information depends on both time and location. So, I’ve always found it an interesting problem to analyze.
What technologies did you use and what skills did you learn at TDI that you applied to the project?
The first and, probably, most important thing is Spark, which I used for visualization and model training. I also used some databases, like PostgreSQL. I set up the web page with Flask, which I also learned in the Fellowship.
What was your most surprising or interesting finding?
Just using a few features extracted from NYC taxi trip data can actually help us to predict our estimated time of arrival. The accuracy was very good, which I was kind of surprised by.
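As a rough illustration of what "a few features" could look like, here is a minimal sketch in Python. The interview does not say which features or libraries were actually used, so everything here is an assumption: the feature choices (straight-line distance, hour of day, day of week), the field names in the `trip` dictionary, and the helper names `haversine_km` and `extract_features` are all hypothetical, and the example coordinates are illustrative values rather than real trip data.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extract_features(trip):
    """Turn one trip record into a small feature dictionary.

    The field names below are hypothetical, not taken from the project.
    """
    pickup_time = datetime.strptime(trip["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "distance_km": haversine_km(
            trip["pickup_lat"], trip["pickup_lon"],
            trip["dropoff_lat"], trip["dropoff_lon"],
        ),
        "hour": pickup_time.hour,           # 0-23
        "weekday": pickup_time.weekday(),   # Monday is 0
    }


# Illustrative trip (made-up record): roughly Midtown Manhattan to JFK airport.
example_trip = {
    "pickup_datetime": "2016-03-14 08:30:00",
    "pickup_lat": 40.7580, "pickup_lon": -73.9855,
    "dropoff_lat": 40.6413, "dropoff_lon": -73.7781,
}
features = extract_features(example_trip)
print(features)
```

Features like these could then be fed to any regression model to estimate trip duration; the choice of model is left open here because the interview does not name one.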
Describe the business application for this project (how could a company use your work or your data?).
The most straightforward use case is to predict estimated time of arrival in NYC, which can be very useful for companies such as Uber, Lyft, and Via that depend on providing an estimated time of arrival for each trip. These predictions are very important for high-demand analytics at this kind of company.
Do you have an interesting visualization to share?
This map (above) shows the pickup and drop-off data for both NYC taxis and Uber rides. I also made a timelapse map visualization to show how pickup and drop-off locations in the city change over time (below).
And lastly, tell us about your new job!
I’m working as a Machine Learning Engineer at Apple, and my job is to use machine learning to help Apple improve its products. So far, I’m absolutely loving the experience.