Liang Shi was a Fellow in our Fall 2016 cohort in Washington, D.C. who landed a job with our hiring partner, Afiniti.
Tell us about your background. How did it set you up to be a great data scientist?
I obtained my PhD in turbulence theory at the Max Planck Institute for Dynamics and Self-Organization in Germany. Afterwards I did a postdoc on turbulence modeling of atmospheric boundary layer flows at the National Institute of Standards and Technology. Both problems require extensive numerical simulations and data analysis using parallel algorithms and computing. The largest simulation I have performed ran on 5,000 CPU cores for 3 months and generated around 10 terabytes of data. These experiences gave me my first contact with ‘big data’ and equipped me with a toolset for data analysis. Most importantly, as a scientist, I am extensively trained in asking the right scientific questions, designing experiments or simulations, using good visualization tools to explore the data, and then giving clear presentations to deliver the findings. These are the essential savoir-faire of a data scientist.
What do you think you got out of The Data Incubator?
Since I had always been in academia before, TDI was like a window into industry, a bridge that walked me smoothly from the academic world to the industry world. Through a series of activities like panel discussions and the alumni party, TDI offered me a great platform to learn what kinds of problems companies are trying to solve, what skills they are looking for, what daily life looks like, and so on. Moreover, TDI provides valuable guidance throughout the whole job search process and, last but not least, the chance to work with a bunch of very smart people.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
A general suggestion is to prepare as well as you can: “A good preparation is half the victory.” Read the TDI blog, where you can find a lot of useful material and experience from alumni. If you are still using Matlab or Octave to analyze data and make plots, switch to Python and get good at it. Think about a capstone project as early as you can and do not wait to start working on it, because you will have very little time during the training.
What is your favorite thing you learned at The Data Incubator?
Sklearn and Spark are really cool tools for data science projects. I like the concept of a ‘pipeline,’ which flows smoothly from data collection and cleaning to model building and, finally, model dumping (saving the trained model).
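For readers who have not met the ‘pipeline’ idea, here is a minimal sketch of what it can look like in scikit-learn. The synthetic data, the step names `scale` and `clf`, and the choice of a scaler followed by logistic regression are all assumptions made for illustration; they are not taken from the TDI curriculum or from Liang's own work.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, illustrative data: two well-separated clusters in 2D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(50, 2)),
    rng.normal(2.0, 1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# Chain the cleaning step and the model so they are fit and applied together.
pipe = Pipeline([
    ("scale", StandardScaler()),    # hypothetical preprocessing step
    ("clf", LogisticRegression()),  # hypothetical model choice
])
pipe.fit(X, y)

# "Model dumping": serialize the whole fitted pipeline, then restore it.
blob = pickle.dumps(pipe)
restored = pickle.loads(blob)

# The restored pipeline should reproduce the original predictions.
same_predictions = bool((restored.predict(X) == pipe.predict(X)).all())
accuracy = pipe.score(X, y)
```

Because the scaler and the classifier travel together in one object, the saved model applies the same preprocessing when it is loaded again later.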
Could you tell us about your Data Incubator Capstone project?
My capstone project aims to use data science tools to help job seekers find job information more efficiently. Basically, I scraped the job-seeking website Indeed.com and displayed the jobs in a web dashboard, where you can filter the results according to your preferences. In combination with data from Crunchbase, the filtered results are then used to build a recommendation engine based on a similarity score, suggesting companies that you may also like. The details and the source code of the project can be found at www.job-sniffer.herokuapp.com.
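To give a feel for how a similarity-based recommendation can work, here is a small sketch using cosine similarity. It is not the capstone's actual code: the company names, the three-number feature vectors, and the helper functions `cosine_similarity` and `recommend` are all hypothetical, and the project may well have used a different similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(liked, companies, top_n=2):
    """Return the top_n companies most similar to the one the user liked."""
    target = companies[liked]
    scores = [
        (name, cosine_similarity(target, vec))
        for name, vec in companies.items()
        if name != liked
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_n]]


# Hypothetical companies described by made-up numeric features.
companies = {
    "AlphaAnalytics": [1.0, 0.9, 0.1],
    "BetaData": [0.9, 1.0, 0.2],
    "GammaRetail": [0.1, 0.2, 1.0],
    "DeltaML": [1.0, 0.7, 0.0],
}

suggestions = recommend("AlphaAnalytics", companies)
```

In this toy data, the companies whose feature vectors point in nearly the same direction as the liked one rank first, while the dissimilar one is left out.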
And lastly, tell us about your new job!
My current job at Afiniti, an AI ‘startup’ based in DC, is to build Bayesian models for caller-agent pairing in call centers. The central question is an optimization problem: how to form the pairs so that the gain is maximized. The traditional pairing strategy is first-in, first-out (FIFO). However, this strategy is typically not optimal. The challenge is then to construct more efficient models using machine learning techniques.
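The difference between FIFO and gain-maximizing pairing can be shown with a toy example. This is only a sketch under strong assumptions: the `gain` matrix is made up, the expected gain of each caller-agent pair is treated as already known, and the best pairing is found by brute force over a tiny case. It says nothing about how Afiniti's Bayesian models actually work.

```python
from itertools import permutations

# Hypothetical expected gain when caller i (row) is paired with agent j (column).
gain = [
    [0.2, 0.9, 0.4],
    [0.8, 0.3, 0.5],
    [0.6, 0.7, 0.1],
]


def fifo_pairing(gain):
    """Pair the i-th caller in the queue with the i-th available agent."""
    return [(i, i) for i in range(len(gain))]


def best_pairing(gain):
    """Try every one-to-one pairing and keep the one with the highest total gain.

    Brute force is only practical for very small examples like this one.
    """
    n = len(gain)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(gain[caller][perm[caller]] for caller in range(n)),
    )
    return list(enumerate(best))


def total_gain(gain, pairs):
    """Sum the expected gain over a list of (caller, agent) pairs."""
    return sum(gain[caller][agent] for caller, agent in pairs)


fifo_total = total_gain(gain, fifo_pairing(gain))
optimal_pairs = best_pairing(gain)
optimal_total = total_gain(gain, optimal_pairs)
```

With these made-up numbers, FIFO happens to match every caller with a poorly suited agent, while the exhaustive search finds a pairing with a much higher total gain.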