Jun was a Fellow in our Winter 2017 cohort who has moved to Germany for a job with our hiring partner, Boehringer Ingelheim.
Tell us about your background. How did it set you up to be a great Data Scientist?
I have a background in applied mechanics and engineering. My Ph.D. research simulated the response of randomly structured material, from which I learned a lot about statistical analysis, numerical computing and model development. Moreover, my academic experience fostered in me a “curiosity in data”, which I think is the most important quality for a data scientist.
What do you think you got out of The Data Incubator?
During the program, I got the chance to learn what “data science” really is as an insider. In addition to data analytics skills, I learned how data science is applied in different industries, what qualities employers look for in a data scientist, what the “front end” and “back end” of a data science project are, and which skills are associated with each stage. Only after getting that closer view could I know what my strengths and interests are and how I should prepare for my future career path.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Start learning Python and get your hands dirty with small projects involving data manipulation, data visualization, and so on. This can save you a lot of energy after joining the program. Also, take an introductory course on machine learning; you want to learn some of the language of data science ahead of time if possible.
What is your favorite thing you learned at The Data Incubator?
I think “MapReduce” is pretty cool. It is amazing to scrape millions of Wikipedia pages using Google Cloud.
Could you tell us about your Data Incubator Capstone project?
My project was to look at advertising campaign data from Facebook and build a predictive model for campaign budget optimization.
What technologies did you use and what skills did you learn at TDI that you applied to the project?
The project is built in Python. Because it is a practical problem, you have to spend a lot of time generating and selecting the most effective features for your machine learning model. There was a lot of trial and error, and I used scikit-learn in Python for the extensive data exploration. I also used a lot of time series analysis.
How could others use your work?
Using the model, the company can adjust its budget optimally for an advertising campaign and estimate how much feedback to expect in the next period.
And lastly, tell us about your new job!
I am going to work on the data science team at Boehringer Ingelheim, which solves business problems in areas such as marketing and sales for the pharmaceutical industry.