It’s that most magical time of the year. Cheeks are 23% rosier week over week, sleigh-bell tinkling is up a remarkable 285% over the previous month, and Santa’s elves are busily vacuuming their Christmas-wish databases. And all of us are trying to find that perfect holiday gift.
While some people may go by “intuition” or “gut feel”, we data scientists know that gift-giving is a precise analytical task best attacked with loads of data and powerful algorithms. Of course, data science is not magic, so these tools do come with their drawbacks. In this article, we’ll present some of the best algorithms for gift selection, along with a few words of caution.
Linear Models
Linear models are a go-to for starting any data science project. They are simple and train quickly. Despite this simplicity, they can be fairly powerful, especially when provided with a large number of features. There is relatively little to tune: perhaps a regularization parameter or two. Linear models make for nice baselines, letting you make guesses about how more complex models may perform.
Linear models are also uniquely interpretable. Not only is the relative importance of each feature easy to see, but the impact of each feature is also independent of the others. When your loved ones look at your gift and ask, “Why?”, you can answer with confidence and confidence intervals.
Downside: Linear models struggle to extrapolate outside of their training data.
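To make the extrapolation warning concrete, here is a minimal sketch of a one-feature linear model with a single regularization parameter. The gift prices, the joy scores, and the function names are all invented for illustration; this is a hand-rolled ridge fit, not any particular library's implementation.

```python
def fit_ridge_1d(xs, ys, alpha=0.0):
    """Fit y ~ intercept + slope * x, penalizing slope**2 by alpha.

    The intercept is left unpenalized, so alpha=0 gives ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: gift price in dollars vs. joy on a 0-10 scale.
prices = [10, 20, 30, 40, 50]
joy = [2.0, 3.1, 3.9, 5.2, 5.8]

slope, intercept = fit_ridge_1d(prices, joy)
shrunk_slope, _ = fit_ridge_1d(prices, joy, alpha=1000.0)


def predict(price):
    """Predicted joy for a gift of the given price."""
    return intercept + slope * price


print(f"joy per dollar: {slope:.3f} (regularized: {shrunk_slope:.4f})")
print(f"predicted joy for a $35 gift:   {predict(35):.2f}")
print(f"predicted joy for a $5000 gift: {predict(5000):.2f}")
```

Inside the $10 to $50 range the predictions stay sensible. At $5000 the model happily reports a joy score far beyond the top of the scale, which is exactly the extrapolation problem noted above.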
Boosted Decision Trees
Linear models not enough to untangle the complicated psyches of your loved ones? Boosted decision trees are powerful and can handle a wide variety of data. The nonlinear behavior of decision trees allows them to notice interactions between features, and the gradient boosting technique allows models to fit complex patterns.
For these reasons, XGBoost dominates Kaggle competitions. These models have been well tuned for performance, and tricks like early stopping mean training can be remarkably efficient, even with the additional predictive power. Even with all of this, simple heuristics can still provide some sense of feature importance. Boosted decision trees are good all-around models, so pull one out this holiday.
Downside: Unless care is taken, boosted decision trees have a tendency to overfit.
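For the curious, the boosting-plus-early-stopping idea can be sketched in plain Python. This is a toy version under stated assumptions: one feature, depth-one trees (stumps), squared-error loss, and synthetic data. It is not how XGBoost is implemented, and names such as `fit_stump`, `boost`, and `LEARNING_RATE` are made up for this example.

```python
import math
import random

LEARNING_RATE = 0.3  # hypothetical shrinkage applied to every stump


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict(stumps, base, x):
    """Sum the base prediction and the shrunken output of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += LEARNING_RATE * (left_value if x <= threshold else right_value)
    return total


def mse(stumps, base, xs, ys):
    return sum((predict(stumps, base, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def boost(train, valid, max_rounds=200, patience=5):
    """Fit stumps to residuals; stop once validation error stops improving."""
    xs, ys = train
    base = sum(ys) / len(ys)
    stumps = []
    best_error, best_rounds = mse(stumps, base, *valid), 0
    for _ in range(max_rounds):
        # For squared error, the residuals are the negative gradient of the loss.
        residuals = [y - predict(stumps, base, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        error = mse(stumps, base, *valid)
        if error < best_error:
            best_error, best_rounds = error, len(stumps)
        elif len(stumps) - best_rounds >= patience:
            break
    return base, stumps[:best_rounds]  # keep only the rounds that helped


rng = random.Random(0)


def make_data(n):
    """Synthetic stand-in for gift data: a noisy nonlinear relationship."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


train, valid = make_data(80), make_data(40)
base, stumps = boost(train, valid)
baseline_error = mse([], base, *valid)
boosted_error = mse(stumps, base, *valid)
print(f"kept {len(stumps)} stumps; validation MSE {baseline_error:.3f} -> {boosted_error:.3f}")
```

The held-out validation set is what guards against the overfitting mentioned above: once extra stumps stop helping on data the model has not seen, training stops and the unhelpful rounds are discarded.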
Neural Networks
Need even more predictive power? Neural networks are the leading tool for squeezing every last drop of accuracy out of your data. With designs loosely inspired by the structure of the human brain, they may be the best tool to figure out what’s going on in your loved ones’ heads.
Yes, neural networks need a lot of resources, but modern GPUs are quite powerful. Your friends and family are worth the investment of time and energy to produce the best presents possible. Yes, some people may worry about black-box models, but you’ll be wrapping them no matter their color.
Downside: It can be difficult to understand why a neural network made a specific prediction.
Time Series Analysis
As data scientists, we know that it’s not enough to just choose a model. We also have to choose the appropriate data analysis techniques for the problem in question. In the case of holiday presents, a time series analysis is appropriate. “’Tis the seasonality,” as they say!
Of course, a first step in this analysis is to remove the seasonal terms. To our surprise, we’ve found that, once this is done, there is very little gift-giving left in the residual. Given this, tools like ARIMA should be able to fit the remaining gift-giving and the reactions thereto.
Downside: ARIMA can predict recovery from shocks, but not the shocks themselves.
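The "remove the seasonal terms" step can be sketched as follows. This assumes the simplest possible approach, subtracting each calendar month's average, and uses invented monthly gift counts. It illustrates the deseasonalizing step only and does not fit an ARIMA model.

```python
# Hypothetical monthly gift counts for three years, January first.
gifts = [
    1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 40,
    1, 2, 1, 2, 2, 1, 1, 1, 2, 2, 3, 42,
    2, 2, 1, 1, 2, 1, 2, 1, 1, 2, 4, 44,
]
PERIOD = 12  # months per seasonal cycle


def seasonal_means(series, period):
    """Average value at each position in the cycle (one mean per month)."""
    return [sum(series[m::period]) / len(series[m::period]) for m in range(period)]


def remove_seasonality(series, period):
    """Subtract each month's average, leaving the residual."""
    means = seasonal_means(series, period)
    return [value - means[i % period] for i, value in enumerate(series)]


residual = remove_seasonality(gifts, PERIOD)
print("largest raw monthly count:", max(gifts))
print("largest residual magnitude:", max(abs(r) for r in residual))
```

With these made-up numbers, the December spike of forty-odd gifts shrinks to a residual of at most two, which is the "very little gift-giving left" effect described above.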
A/B Testing
Despite our pleading, most of our friends and family refuse to live their lives in sterile sensory-deprivation chambers. As such, it is difficult to tell whether a change in their happiness was due to the gift we gave or to some other factor.
The gold standard for teasing causality out of correlation is A/B testing. The approach is relatively simple: Divide your recipients into two groups, identical in every way. (You may need to acquire additional recipients in order to ensure statistical significance.) Then provide different gifts to the two groups. Differences in their reaction are therefore due solely to your gifts. Enjoy the certitude of knowing exactly how much you matter!
Downside: You may need to keep the two groups from finding out about each other.
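As a rough sketch of the procedure, the snippet below randomly assigns recipients to two groups and then uses a permutation test to check whether the difference in average joy could plausibly be chance. The recipients, the joy scores, and the choice of a permutation test are all assumptions made for illustration; the approach described above does not prescribe a particular significance test.

```python
import random


def mean(values):
    return sum(values) / len(values)


def assign_groups(recipients, rng):
    """Randomly split recipients into two groups of (nearly) equal size."""
    shuffled = list(recipients)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(group_a, group_b, trials=10_000, seed=0):
    """Share of random relabelings with a mean gap at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            extreme += 1
    return extreme / trials


rng = random.Random(2023)
recipients = [f"recipient_{i}" for i in range(16)]  # hypothetical loved ones
socks_group, puppy_group = assign_groups(recipients, rng)

# Hypothetical joy scores (0-10) recorded after the gifts were opened.
socks_joy = [4, 5, 3, 4, 5, 4, 3, 4]
puppy_joy = [9, 8, 10, 9, 9, 8, 10, 9]

lift = mean(puppy_joy) - mean(socks_joy)
p_value = permutation_p_value(socks_joy, puppy_joy)
print(f"puppies beat socks by {lift:.1f} joy points (p = {p_value:.4f})")
```

Random assignment is what makes the two groups comparable on average, so a small p-value here points at the gift rather than at who happened to receive it.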
From The Data Incubator family to yours, may your holidays be at least one standard deviation above your baseline festivity and brightness.
About our Authors
Data Scientist in Residence at The Data Incubator
Robert studied squishy physics in Chicago, Amherst, and Santiago, Chile, before uniting his love of computers, teaching, and making pretty graphs at The Data Incubator. In his free time, he plays tuba and right field, usually not simultaneously.
Data Scientist in Residence at The Data Incubator
Rich moved from particle physics to data science when he left academia, and is excited to be joining his interests in data and programming with his love of teaching. In his spare time, he’s a fan of science, speculative fiction, board games, and hiking.