Data Science Growth with Becky Tucker from Netflix
This content comes from a transcript of a recent conversation with Dr. Becky Tucker on the TDI DS30 Podcast.
I know when I say “Netflix and data science” the first thought people have is “recommendations algorithms,” but I am going to take this conversation beyond recommendations.
Netflix is an extremely data-driven company and we apply data science to every aspect of the user experience, including content, which is one of the more interesting and more nuanced ways that we’re using data science.
A Little Bit About Netflix Today 
We have 93 million members worldwide. So, that says a little bit about the amount of data that we could be dealing with for any particular algorithm or analysis. We also support a lot of devices. Netflix is available on laptops, smartphones, gaming consoles and smart TVs.
You can now view Netflix basically anywhere in the world except for China, North Korea and Syria. This also includes a change in some of our original programming strategies: we are now producing Netflix originals for non-American markets such as France and Mexico, or Terrace House in Japan.
There are more than 3 billion hours watched per month, which to us translates to more than a hundred billion events logged every day. All of this is just to bring home the fact that we are living in a really big data world.
Data Science Growth Before Streaming
Just as Netflix has evolved over the years, data science at Netflix has also evolved over the years. Our earliest data science efforts were focused on recommendations.
Back then, there wasn’t any streaming or play data to directly use as a signal to recommend and suggest titles to a user. So, we used information like the star ratings to make recommendations to people.
Additionally, back then, there was essentially no data science for content demand. During the DVD days, essentially what we would do is put everything that we knew about in the world, up on the website. If enough people added it to their queue we would go out and buy the DVD. We didn’t actually have to have the DVD in order to let people add it to their queue. When it reached a certain threshold, we would just go out and buy it.
Data Science Growth After Streaming
Everything changed when Netflix started streaming. We started making recommendations based completely on streaming data. This culminated in the Netflix Prize, where we challenged people to take our existing personalization algorithm and beat it by more than 10%; the winning team would get a million dollars. It was a cool era to be working on this.
I think one of the things that happens when you’re doing data science in the real world is that there is a trade-off between accuracy and functionality.
The final winners of the Netflix Prize had an ensemble model that had more than a hundred individual models. It couldn’t be productionalized. One of the things that you have to consider is that data science in the real world is not just about accuracy.
It’s also about latency and ease of maintenance and ease of production. You might find that you have an algorithm that is in fact worse, maybe 1% worse, but if it’s 10 times faster, that’s a trade-off that you might be willing to make.
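One way to make this trade-off concrete is to treat latency as a budget and accept a small loss in accuracy to stay within it. The sketch below is only an illustration: the `Candidate` class, the `pick_model` function, the error and latency numbers, and the 2% tolerance are all assumptions, not anything described in the conversation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate model with made-up measurements."""
    name: str
    error: float       # prediction error, lower is better
    latency_ms: float  # time to score one request


def pick_model(candidates: Sequence[Candidate],
               max_latency_ms: float,
               max_relative_error_increase: float = 0.02) -> Optional[Candidate]:
    """Return the fastest model that fits the latency budget and whose error
    is within the allowed margin of the most accurate candidate."""
    if not candidates:
        return None
    best_error = min(c.error for c in candidates)
    error_limit = best_error * (1 + max_relative_error_increase)
    viable = [c for c in candidates
              if c.latency_ms <= max_latency_ms and c.error <= error_limit]
    return min(viable, key=lambda c: c.latency_ms) if viable else None


# Illustrative numbers only: the simple model is 1% worse but 10 times faster.
ensemble = Candidate("large_ensemble", error=0.100, latency_ms=500.0)
simple = Candidate("single_model", error=0.101, latency_ms=50.0)
chosen = pick_model([ensemble, simple], max_latency_ms=100.0)
```

With a 100 ms budget the large ensemble is ruled out on latency alone, so the slightly less accurate model is chosen. If no candidate fits the budget, the function returns `None` rather than silently picking a slow model.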
That’s not to say that we haven’t made a lot of progress since then; the algorithms and machine learning behind recommendation and personalization today are in fact incredibly sophisticated and nuanced.
Beyond Content Recommendations
Personalization today is not just a single algorithm. There are algorithms that determine not only what content gets displayed to you in which rows, but also the order in which those rows appear on your page. The data teams here touch essentially every part of the Netflix experience.
We A/B test what images are placed for each piece of content. Data science also touches on streaming and quality of experience. So if we know that you’re streaming on a poor internet connection, we’ll change the encoding algorithm to compensate.
Messaging and discovery, such as how we email you when a new title becomes available, are also touched by data science at Netflix, and it’s all geared towards trying to make great recommendations of great content. One of the fundamental truths here is that in order to recommend great content, you have to have great content in the first place.
We can only recommend the titles that we actually have available on the service unlike in our DVD days. So the question here becomes what titles can we either buy or create to optimize our users’ joy.
That really is the metric we think about at Netflix: what brings people joy.
That of course, highly correlates with things like growth and retention and business-oriented metrics. But the question becomes, what do we produce or buy? Two pieces to that puzzle are our original content and our licensed content.
Assigning Value to Content
Netflix has a lot of ways to consider what makes content valuable. The simplest version of this is how many hours was it viewed?
This becomes tricky really quickly because defining a metric the wrong way can lead you to really bizarre incentives. With something like how many hours something is viewed, there are ways that you could start changing the kinds of content you make available to optimize for that metric. That is not good for the service in the long run.
For example, you may find that people view more hours of longer content. The metric would then suggest that instead of making two-hour movies, we should make six-hour or eight-hour movies.
We can assign value by asking other questions like:
- How many people finished watching it?
- Did people sign up for Netflix to watch it or did it generate new subscribers?
- Did it win awards? How do you value the contribution of winning an Emmy or an Academy Award?
- Is it popular with critics?
- Is it binge-worthy?
- Is it a cult favorite, like Arrested Development or the reboot of Wet Hot American Summer?
So given that we have so much more data we can also construct more elaborate metrics. We can actually look at all of these things when we try and think about what makes content valuable.
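To see why hours viewed alone can mislead, it can help to compute it next to a second metric such as completion rate. This is a minimal sketch with invented viewing sessions; the function names, the 90% "finished" threshold, and the numbers are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def hours_viewed(minutes_watched: List[float]) -> float:
    """Total hours watched across all viewing sessions of one title."""
    return sum(minutes_watched) / 60.0


def completion_rate(minutes_watched: List[float],
                    runtime_minutes: float,
                    finished_fraction: float = 0.9) -> float:
    """Share of sessions that reached at least `finished_fraction` of the runtime."""
    if not minutes_watched:
        return 0.0
    finished = sum(1 for m in minutes_watched
                   if m >= finished_fraction * runtime_minutes)
    return finished / len(minutes_watched)


# Hypothetical data: title -> (runtime in minutes, minutes watched per session).
sessions: Dict[str, Tuple[float, List[float]]] = {
    "two_hour_movie": (120, [120, 120, 110, 60]),
    "eight_hour_movie": (480, [90, 60, 480, 30]),
}

report = {
    title: {
        "hours_viewed": hours_viewed(watched),
        "completion_rate": completion_rate(watched, runtime),
    }
    for title, (runtime, watched) in sessions.items()
}
```

In this made-up data the eight-hour title wins on hours viewed but most people abandon it, which is exactly the kind of bizarre incentive a single metric can hide.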
Measuring Content Efficiency
One thing that you might hear sometimes in terms of Netflix’s notion of value is this idea of content efficiency.
It is the value of a piece of content to Netflix divided by its cost. If it’s over one, then we’re getting more value than what it costs us. If it’s less than one, we’re getting less value than what it costs us.
Luckily for us, our original content is in fact, some of our most efficient content. That ends up being a good thing because it means people come to the service for content from Netflix. If it’s only available there, you’re probably more likely to stick with the service.
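The content efficiency ratio described in this section is simple enough to write down directly. In this sketch, the function names are hypothetical, and value and cost are assumed to be expressed in the same units; how value itself is estimated is not covered here.

```python
def content_efficiency(value: float, cost: float) -> float:
    """Value of a title divided by its cost, both in the same units."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return value / cost


def is_efficient(value: float, cost: float) -> bool:
    """True when the title returns more value than it cost (ratio over one)."""
    return content_efficiency(value, cost) > 1.0


# Illustrative numbers only.
original_title = content_efficiency(value=150.0, cost=100.0)
licensed_title = content_efficiency(value=80.0, cost=100.0)
```

Here the first title has a ratio above one and the second a ratio below one, matching the over-one and under-one cases in the definition.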
Predicting Demand and Predicting Value
Given this context, I’d like to talk a little bit more about how we actually predict demand and predict a value for a piece of content from Netflix. I’m going to start with the case when something is licensed.
Licensed means non-original content that has been made by a studio separate from us.
We’re just acquiring the streaming rights. What we do is we acquire a database of essentially all available titles that we’d like to make a prediction about. This becomes a massive data acquisition and engineering problem.
You’ve probably heard the truism at this point that 80% of data science is actually data engineering and data cleaning. That’s certainly true for what we do. Our data engineering teams are really the backbone of data science at Netflix.
It is a massive data engineering and sometimes even data science challenge just to do the entity resolution.
Take, for example, the movie named “Frozen.” If you don’t successfully differentiate between titles, you’ll be conflating a small-budget horror film set in the mountains with a massively popular Disney movie about two princesses.
You can imagine that the predictions we might make about a title named Frozen would be incredibly garbled and unclear. You can’t do this by simple string matching.
There is actually an entire body of literature on how you take different data sources that may or may not be talking about the same entities and merge them together in the best possible way. This is a moment in which I would like to plug the fact that the data engineers and data engineering teams at Netflix are spectacular. They do a really great job of making the data scientist job easier.
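The Frozen example can be sketched in a few lines. This is a deliberately simple illustration, not the approach used at Netflix: it matches records on a normalized title plus release year instead of the title string alone. The `TitleRecord` type, the source IDs, the one-year tolerance, and the metadata values are assumptions made for the example; real entity resolution uses many more signals.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class TitleRecord(NamedTuple):
    source_id: str
    title: str
    year: int


def normalize(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def same_entity(a: TitleRecord, b: TitleRecord, year_tolerance: int = 1) -> bool:
    """Treat two records as the same title only if both name and year agree."""
    return (normalize(a.title) == normalize(b.title)
            and abs(a.year - b.year) <= year_tolerance)


def resolve(catalog: List[TitleRecord],
            incoming: List[TitleRecord]) -> Dict[str, Optional[str]]:
    """Map each incoming record to a catalog ID, or None if the match
    is missing or ambiguous."""
    matches: Dict[str, Optional[str]] = {}
    for record in incoming:
        hits = [c.source_id for c in catalog if same_entity(c, record)]
        matches[record.source_id] = hits[0] if len(hits) == 1 else None
    return matches


# Illustrative records: two different films that share a title.
catalog = [
    TitleRecord("cat-1", "Frozen", 2010),  # the horror film
    TitleRecord("cat-2", "Frozen", 2013),  # the Disney film
]
incoming = [
    TitleRecord("feed-a", "FROZEN!", 2013),
    TitleRecord("feed-b", "Frozen", 2010),
    TitleRecord("feed-c", "Frozen", 1999),
]
resolved = resolve(catalog, incoming)
```

Matching on the title alone would map all three incoming records to both catalog entries. Adding the year separates the two films, and the record with no plausible year is left unmatched instead of being guessed.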
So once we have conquered the data acquisition and cleaning challenge, we start putting together our demand features. Essentially we try to find anything that we think might have some signal. That would include past performance on Netflix.
If we’re looking to license a horror film, we would look at how other horror films have done. We could include things like broadcast ratings, the box office, the talent, reviews and awards.
If we can get our hands on it, we will try to include it in our predictive models. I’m sorry to say that I can’t actually get too much into the details of what specifically we do in those models.
We are using everything from basic regression models to gradient boosted decision trees, a lot of matrix factorization methods, clustering, LDA and NLP techniques. Someone somewhere at Netflix is using these methods to help make the Netflix user experience better.
Our value prediction problem for originals is much more difficult than the value prediction problem for licensing because there is less data.
If something hasn’t been created yet, we have no box office or review data to use. It’s a moving target. Ideas and scripts can change wildly between when they’re pitched and when they’re actually made. The execution can vary widely with talent and budget.
One thing that we do is try to find comparable titles and use what we know about those titles in order to predict how we think an original title might do.
Maybe when we were originally looking at the House of Cards script, we say, “well, this is kind of like the West Wing meets Breaking Bad.” Those are titles we’ve actually had available on the site. We know that the people who watch those titles also watch Mad Men and the Blacklist and Homeland.
Then we might make predictions on the basis of how well these comparable titles perform, with adjustments from a domain expert who might know more about this process. We are trying to inject more science into this process via two techniques: matrix factorization of play data and natural language processing.
This is as much of an art as a science.
For example, if you ask an algorithm what title is comparable to Twister and Jaws, without some additional nuance or understanding of what’s going on, you might end up with something like Sharknado, which is, I’m sure, a high-quality movie. You might not consider it to be quite the same level of production as movies like Twister and Jaws.
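The idea of finding comparable titles from who watches what can be sketched with a much simpler technique than the matrix factorization mentioned above: cosine similarity between the sets of viewers of each title. Everything in this example is invented for illustration, including the viewers, the play data, the “Cartoon Hour” title, and the `comparable_titles` function.

```python
from math import sqrt
from typing import Dict, List, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two sets of viewers (binary play vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def comparable_titles(plays: Dict[str, Set[str]],
                      seed_titles: List[str],
                      top_n: int = 3) -> List[str]:
    """Rank non-seed titles by their average audience overlap with the seeds."""
    viewers_by_title: Dict[str, Set[str]] = {}
    for viewer, titles in plays.items():
        for title in titles:
            viewers_by_title.setdefault(title, set()).add(viewer)

    seeds = [s for s in seed_titles if s in viewers_by_title]
    scores: Dict[str, float] = {}
    for title, viewers in viewers_by_title.items():
        if title in seed_titles or not seeds:
            continue
        sims = [cosine(viewers, viewers_by_title[s]) for s in seeds]
        scores[title] = sum(sims) / len(sims)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical play data: viewer -> titles watched.
plays = {
    "viewer_1": {"West Wing", "Breaking Bad", "Homeland"},
    "viewer_2": {"West Wing", "Homeland", "Mad Men"},
    "viewer_3": {"Breaking Bad", "Mad Men", "Homeland"},
    "viewer_4": {"Breaking Bad", "Cartoon Hour"},
    "viewer_5": {"Cartoon Hour"},
}
comps = comparable_titles(plays, ["West Wing", "Breaking Bad"], top_n=2)
```

A ranking like this only says which audiences overlap. It knows nothing about production quality or tone, which is why a Sharknado-style result can surface and why a domain expert still adjusts the list.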
The Importance of Combining Data with Domain Expertise
Once we get a prediction for a piece of content it doesn’t stop there. I think this is another one of the truisms about data science in the real world which is that domain expertise matters. It is not enough to just take your data and run it through a machine-learning algorithm.
There is a lot that isn’t captured in your data. There’s a lot that can’t be captured in a model. In addition to that, we’re still working within a really old Hollywood system. So sometimes you may get a prediction and based on the cost you might say, well, that’s inefficient, we shouldn’t buy it.
But it might be a part of a larger deal where if we purchase one film we also get another film. We might also make certain decisions to build a relationship with a studio or with an actor or director. Buying deals and streaming rights are incredibly complicated and nuanced.
The predictions are the beginning of the conversation, not the end of it.
Netflix’s Multi-layered Data Science Approach
One of the things that we have had to address at Netflix as part of content demand, is that we don’t have a single content demand model. We actually have many content demand models similar to the fact that we have many recommendation models and many personalization models.
In order to deal with that, we actually ended up building a custom machine learning framework in order to address both the multiplicity of models and the fact that we’re dealing with big data, things that won’t just fit on a laptop.
We have a YAML-based machine learning framework (YAML originally stood for “Yet Another Markup Language”). Essentially, what it allows us to do is to write config files that specify and separate three things: the feature engineering for training, the algorithm and model that we’re using for training, and the feature engineering for scoring.
One of the really cool things about this framework is that it also supports inheritance. This means that if you have a base model, you can have sub-models that inherit from it. If you want to add a feature that turns out to help a lot in the prediction problem, you can just add it to the base model and that feature will flow through to all of the subsequent models without you having to go through and change each one.
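Since the framework is not public, the following is only a guess at what config inheritance of this kind might look like. It uses plain Python dictionaries in place of parsed YAML files, and every key and model name (`parent`, `features`, `algorithm`, `base_demand`, and so on) is hypothetical.

```python
from typing import Any, Dict

# Hypothetical configs, shown as the dictionaries a YAML loader might produce.
CONFIGS: Dict[str, Dict[str, Any]] = {
    "base_demand": {
        "parent": None,
        "features": ["past_performance", "genre"],
        "algorithm": "regression",
    },
    "horror_demand": {
        "parent": "base_demand",
        "features": ["box_office"],
    },
    "series_demand": {
        "parent": "base_demand",
        "features": ["talent"],
        "algorithm": "gradient_boosted_trees",
    },
}


def resolve_config(name: str, configs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build the effective config for a model by walking up its parent chain.

    Features accumulate from the base model downwards; the algorithm is
    inherited unless the sub-model overrides it.
    """
    config = configs[name]
    parent = config.get("parent")
    if parent is None:
        resolved: Dict[str, Any] = {"features": [], "algorithm": None}
    else:
        resolved = resolve_config(parent, configs)
    for feature in config.get("features", []):
        if feature not in resolved["features"]:
            resolved["features"].append(feature)
    if "algorithm" in config:
        resolved["algorithm"] = config["algorithm"]
    return resolved


horror = resolve_config("horror_demand", CONFIGS)
series = resolve_config("series_demand", CONFIGS)
```

Because sub-models are resolved on demand, appending a feature to `base_demand` makes it appear in every sub-model the next time they are resolved, with no edits to the sub-model configs themselves.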
I know we’re hoping to open source it at some point, but it is not there yet.
Can Data Science Create Content?
Do we have an algorithm that just spits out scripts, or tells us yes or no on whether a script is good? For example, there was an AI screenwriter who wrote a script that got made into a film.
It’s available online. It’s entertaining, but you wouldn’t exactly call it great content.
Is Netflix doing this? No, as I mentioned before, the way that we handle content prediction is that that’s where the conversation starts. That’s not where the conversation ends.
We really do pride ourselves on giving a lot of creative freedom to creative people and getting out of their way to let them do their job. But data can definitely help in choosing content. When we think about both what we’re doing and what’s next in terms of using data science to create content we’re thinking about how we can use data science to optimize the catalog.
Additionally, we are also thinking about, how do you identify valuable content earlier in the pipeline? This is really about giving our creatives better tools and letting them do their job.
More about Dr. Becky Tucker
Dr. Becky Tucker is a senior data scientist at Netflix based in Los Gatos, California.
She works on the content science and algorithms team, which is located in Lausanne. She holds a PhD in physics from Caltech. At Netflix, Becky works on models that predict the demand for TV shows and movies.
I’m intensely curious and always learning. I’m passionate about both data science and the entertainment industry. My philosophy is that data science should result in high-impact, actionable results that are clearly communicated to decision-makers and stakeholders.
I did my PhD in observational cosmology at Caltech, where I built microwave telescopes to study the cosmic microwave background.