Tying Together Elegant Models: Alumni Spotlight on Brendan Keller

Brendan was a Fellow in our Fall 2015 cohort who landed a job with one of our hiring partners, Jolata.

Tell us about your background. How did it set you up to be a great Data Scientist?

I did my PhD research in theoretical condensed matter physics at the University of California, Santa Barbara. My research focused on the phase diagram of chains of non-abelian anyons. Because such chains are gapless in most regions of the phase diagram, we had to model them using very large matrices in C++. To make this computation more tractable, we used hash tables and sparse matrices. Besides my background in numerics, I also took the time to learn Python, Pandas, SQL and MapReduce in Cloudera a few months before starting the fellowship.

What do you think you got out of The Data Incubator?

The Data Incubator gave me a solid foundation in data parsing, large scale data analysis and machine learning. I went into the fellowship already knowing about various concepts like SVM, bag-of-words and cross-validation. But I learned how to tie these together into elegant models that are both modular and easy to modify or upgrade. I also learned how to use MapReduce on a cluster, where the behavior of your program can be quite different than on a single node.

What advice would you give to someone who is applying for The Data Incubator, particularly someone with your background?

Get familiar with SQL, Python and machine learning well before applying to the program. Also, get familiar with analyzing text data and learn how to deal with Unicode.

What is your favorite thing you learned at The Data Incubator?

Learning MapReduce and Spark on clusters was particularly useful. There are some subtle differences between running your code on a single node vs a cluster which are important to know. The miniprojects are especially useful when talking to employers because they are typically looking for someone with background knowledge covered in at least one of the miniprojects (in my case recommender systems), which may not have been covered in the capstone project.

Could you tell us about your Data Incubator Capstone project?

There’s been a lot of hype around sensor data and how it could be used in wearable devices or smart cities. The aim of my project was much more modest. I wanted to see if sensors installed on a product could be used to “review” it, just like customers do when they post an online review.

I looked at daily sensor data from 40,000 computer hard drives owned by Backblaze and compared their failure rate and life expectancy by model to the perceived failure rate and life expectancy obtained from scraping online reviews on Amazon and Newegg. Because there is an approximately linear correlation between the star rating of a review and the perceived failure rate of a hard drive, I was able to map the sensor data to an expected star rating for each hard drive model. This expected star rating represents the overall rating that the hard drive would receive if it were rated by sensors rather than human reviewers.
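As a rough illustration of the mapping described above, the following Python sketch fits a straight line to review data and then applies it to a sensor-measured failure rate. It is not Brendan's actual code: the data points, the function names fit_line and expected_stars, and the clamping to a 1-to-5 scale are all assumptions made for the example.

```python
# Hypothetical (perceived failure rate, average star rating) pairs, one per
# drive model. These numbers are invented for illustration only.
review_points = [(0.02, 4.6), (0.05, 4.2), (0.10, 3.5), (0.20, 2.1)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_stars(sensor_failure_rate, slope, intercept):
    """Map a sensor-measured failure rate to a star rating, clamped to 1..5."""
    return max(1.0, min(5.0, slope * sensor_failure_rate + intercept))


# Fit on the review data, then reuse the line for sensor-derived rates.
slope, intercept = fit_line(review_points)
```

With a fit like this, a model whose sensors report a higher failure rate receives a lower expected star rating, which is the sense in which the sensors "review" the drive.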

And lastly, tell us about your new job!

At Jolata, one of my main projects so far has been identifying one-way audio (OWA) in VoLTE calls. These are experienced by users as interruptions in a cell phone conversation that either degrade the user experience or cause them to hang up. By finding regions where no packets are sent in one or both directions between the two users, we can identify gaps in the VoLTE signal. Preceding and following these gaps are SCTP packets that allow us to classify the gap as either a true OWA or a normal operation such as a handover. One of my tasks as part of this project was presenting examples of OWAs in voice calls to our partners and ensuring that my statistical analysis would scale to the large volumes of data coming from four base stations. Currently I’m working on implementing a clustering approach to refine our classification of the gaps, which may allow us to identify new types of OWAs outside of our current classification.
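The gap-finding step can be pictured with a minimal Python sketch. It assumes each direction of a call is available as a list of packet arrival times in seconds. The function name find_gaps, the 0.5 second threshold and the sample timestamps are hypothetical and do not describe Jolata's implementation, and the SCTP-based classification of each gap is not shown.

```python
def find_gaps(timestamps, min_gap):
    """Return (start, end) pairs where consecutive packets are more than
    min_gap seconds apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]


# Hypothetical packet arrival times (seconds) for each direction of one call.
uplink = [0.00, 0.02, 0.04, 0.06, 1.50, 1.52]
downlink = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10]

# A gap in only one direction is a candidate for one-way audio.
uplink_gaps = find_gaps(uplink, min_gap=0.5)
downlink_gaps = find_gaps(downlink, min_gap=0.5)
```

Here the uplink has a silent stretch between 0.06 s and 1.50 s while the downlink keeps flowing, so only the uplink yields a candidate gap.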
 
