Aleks was a Fellow in our Fall 2016 cohort in New York City who landed a job with our hiring partner, Via.
Tell us about your background. How did it set you up to be a great data scientist?
I was trained as a quantitative psychologist with a specialization in machine learning. In graduate school, I spent a lot of time thinking about experimental design and statistical analysis. My statistical toolbox emphasized frequentist and Bayesian approaches to hierarchical modeling, but I also got exposure to a variety of methods: supervised and unsupervised machine learning, robust modeling, generalized linear and non-linear models, etc. I think the most useful things for data science were the most basic ones: understanding linear and logistic models deeply, having a skeptical approach, and, maybe most importantly, being able to read and write math. The latter enabled me to quickly pick up new methods and to read and understand relevant articles. Data science is a field of constant learning.
What do you think you got out of The Data Incubator?
The program puts you in touch with excellent employers and in many cases allows you to skip the resume-scanning stage of being considered for a job. It also introduces you to very talented people at a similar stage in their careers – a network of brilliant PhDs aspiring to be data scientists. Everybody has unique pieces of the elusive data science credentials and is eager to teach everybody else what they know. The weekly mini-projects were a great way to facilitate interaction and learning. Because the student pool is so diverse in terms of disciplines of origin, everybody has different strengths that you can learn from.
What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?
Round out your programming background and learn to manipulate larger datasets. Applicants from the social sciences often have excellent statistical intuition, but are weak in terms of implementation. Use R for statistics; it's a much better foundation for learning Python and other tools than the alternatives. I would also recommend trying to incorporate machine learning and advanced statistics into your research. It seems challenging to find applications at first, but there's actually a lot to be done, academically, by using machine learning.
What is your favorite thing you learned at The Data Incubator?
The Data Incubator exposed me to the ideas behind key frameworks in the streaming architecture domain and gave me some hands-on experience as well. This is a really important area of data science. Spark is very cool, too: it lets you scale up machine learning and data manipulation routines across multiple machines in a relatively painless way.
Could you tell us about your Data Incubator Capstone project?
I created a tool that was able to predict which police interactions will result in arrests. The idea was to facilitate racially equitable predictive policing. This would allow police officers to avoid unnecessarily harassing innocent people and concentrate on cases that are likely to involve wrongdoing. Importantly, it did not disproportionately target minority suspects. If police officers use this tool, they can reduce the number of innocent people being stopped by 75%, while reducing arrests by only 25%.
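The trade-off described above can be sketched with a ranking-and-thresholding approach: train a classifier on stop-level features, then keep only the stops the model ranks as most likely to end in an arrest. This is a minimal illustration on synthetic data, not the actual project code; the features, model choice, and 75% cutoff here are all stand-in assumptions.

```python
# Hypothetical sketch: rank stops by predicted arrest probability and
# keep only the top fraction, then measure the resulting trade-off.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for stop-level features; label 1 = stop ended in arrest.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_arrest = model.predict_proba(X_te)[:, 1]

# Drop the 75% of stops the model scores lowest (cutoff is illustrative).
threshold = np.quantile(p_arrest, 0.75)
kept = p_arrest >= threshold
arrests_retained = y_te[kept].sum() / max(y_te.sum(), 1)
print(f"stops kept: {kept.mean():.0%}, arrests retained: {arrests_retained:.0%}")
```

If the model ranks stops well, the fraction of arrests retained should be much larger than the fraction of stops kept, which is the asymmetry the project exploited.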
How did you come up with the idea for the project?
The New York City Police Department, for all its shortcomings, provides an excellent data source that details police interactions. Not hundreds or thousands of rows, but many gigabytes of data. I thought there must be a good use for all this data.
What technologies did you use and what skills did you learn at TDI that you applied to the project?
TDI introduced me to some excellent practices in machine learning. One key insight is that it helps to cross-validate across models, not just across parameters.
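One way to cross-validate across models, not just across parameters, is to treat the estimator itself as a hyperparameter in a grid search. This is a minimal scikit-learn sketch on synthetic data (the models and parameter grids are illustrative assumptions, not the ones used in the project):

```python
# Cross-validate over model families and their hyperparameters jointly,
# by making the estimator itself a searchable step in a Pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, random_state=0)

pipe = Pipeline([("clf", LogisticRegression())])  # placeholder estimator
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [50, 100]},
]
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(type(search.best_params_["clf"]).__name__, search.best_score_)
```

Because every candidate (model, parameters) pair is scored on the same cross-validation folds, the comparison between model families is as fair as the comparison between parameter settings within one family.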
What was your most surprising or interesting finding?
I was honestly very surprised by how well the model did. In retrospect, this is less surprising, since police officers often give ridiculous reasons for stops, basing them on nothing but the suspect's clothing.
Describe the business application for this project (how could a company use your work or your data)
The tool is intended for a police department to employ in order to reduce bias and increase effectiveness in the interactions that officers have with suspects.
Do you have an interesting visualization to share?
I like this plot because it highlights that predictive policing can be a force for reducing, rather than perpetuating, racial disparities. Although the proportion of innocent people stopped was the same across groups, the tool would reduce the number of stops the most among suspects classified as Black.
And lastly, tell us about your new job!
I’m a data scientist at Via and I’ve been here for over a year now. I’ve got a great team and work on some very difficult problems. I encourage all fellows to apply!