Real data scientists have a rare hybrid of skill sets: Here’s what to look for

On July 18th, 2015, an article Michael wrote was featured on VentureBeat. The full text can be found below.

Over the course of the last year I’ve spoken with hundreds of employers interested in hiring data scientists, in particular data scientists with advanced degrees. Many employers and hiring managers have heard that big data is the “hot new thing.” But as with all “hot new things,” there is as much misinformation about data science as there is fact. Here are three misconceptions about big data and data science that I often encounter:


1. Big data is statistics and business intelligence with more data. There’s nothing new here.

This is a view often held by those with limited or no software development experience, and it is plainly false. The perfect analogy is ice. Ice is just cold water, right? There’s nothing new here. However, cooling water doesn’t just change a quantitative property (temperature); it drastically changes its qualitative properties (transforming a liquid into a solid). The same can be said of more data. Big data strains and ultimately breaks the old paradigms of computation. With big data, the data no longer fits into RAM, and traditional BI calculations would take years to complete. Parallelization and distributed computation are the obvious answers to scaling, but they are not always easy: even a simple statistical tool like logistic regression does not easily parallelize. Distributed statistical computation is as different from traditional business analytics as ice is from water.
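To make the logistic regression point concrete, here is a minimal sketch in plain Python, assuming the simplest possible setup: gradient descent on data split into shards. The function names (`shard_gradient`, `distributed_step`) and the toy data are hypothetical, and real distributed systems use more sophisticated schemes. The gradient itself splits cleanly across shards, but every step has to wait for all shards to report back before the weights can move, and fitting the model takes many such steps.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shard_gradient(weights, shard):
    """Sum of per-row log-loss gradients for one shard of (features, label) rows."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad


def distributed_step(weights, shards, learning_rate):
    """One gradient-descent step over all shards."""
    # Map: each shard's gradient could be computed on a separate machine.
    partials = [shard_gradient(weights, shard) for shard in shards]
    # Reduce: a synchronization point. No shard can start the next step
    # until every partial gradient has been combined.
    total = [sum(parts) for parts in zip(*partials)]
    n_rows = sum(len(shard) for shard in shards)
    return [w - learning_rate * g / n_rows for w, g in zip(weights, total)]


# Hypothetical toy data: features are [intercept, x]; label is 1 when x is large.
shards = [
    [([1.0, 0.5], 1), ([1.0, -1.5], 0)],
    [([1.0, 2.0], 1), ([1.0, -0.5], 0)],
]

weights = [0.0, 0.0]
for _ in range(100):  # 100 steps means 100 rounds of waiting on every shard
    weights = distributed_step(weights, shards, learning_rate=0.5)
```

The map step is easy to parallelize. The cost is in the loop: each iteration depends on the previous one, so the communication overhead repeats at every step.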


2. Data scientists are just rebranded software engineers.

Sometimes engineers with strong software development backgrounds will rebrand as data scientists for the salary premium. This can lead to subpar results. At the simplest level, debugging statistical bugs becomes much harder. Engineers are trained to spot and solve programming bugs. But without a solid background in probability and statistics, they often have a hard time solving statistical bugs. Your code might be just fine, but if you didn’t reweight your training examples correctly, your predictions will be off.
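As an illustration of such a statistical bug, here is a small sketch with entirely hypothetical numbers. It assumes a common scenario: the training sample keeps every positive example but only one in ten negatives. The code runs without error either way; only the reweighted estimate matches the population.

```python
def positive_rate(labels, weights=None):
    """Weighted share of positive labels; unweighted if no weights are given."""
    if weights is None:
        weights = [1.0] * len(labels)
    return sum(w * y for w, y in zip(weights, labels)) / sum(weights)


# Hypothetical population: 10 positives and 990 negatives (1% positive).
# The training sample keeps all positives but only 1 in 10 negatives.
keep_every = 10
sample_labels = [1] * 10 + [0] * 99

# The "bug": no crash and no warning, but the estimate is about 9x too high.
naive = positive_rate(sample_labels)  # 10 / 109

# The fix: each kept negative stands in for `keep_every` original negatives.
weights = [1.0 if y == 1 else float(keep_every) for y in sample_labels]
corrected = positive_rate(sample_labels, weights)  # 10 / 1000
```

A model trained on the unweighted sample inherits the same distortion in its predicted probabilities, which is why this kind of bug never shows up in a stack trace.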

At a higher level, engineers are well trained to build simple, discrete, rule-based models. But these models are ill-suited to deriving the more subtle insights from continuous-valued data, and they leave money on the table. Solid statistical chops are necessary to overcome these challenges and build the next generation of scalable predictive models.


3. Data scientists don’t need to understand the business; the data will tell them everything.

People with machine-learning backgrounds often succumb to this one, in part because machine learning is so powerful. But it is not omnipotent. Searching for all possible correlations is time-consuming, not to mention statistically problematic. Data scientists need to be guided by business intuition to help them distinguish between spurious correlations and real ones. Lack of domain expertise can lead to ill-founded conclusions (“more police officers lead to higher crime rates”) that prompt bad policy recommendations (“cut the policing staff in high-crime neighborhoods”). Finally, business intuition is also important for convincing key stakeholders. These stakeholders might not be data scientists, but they are usually domain experts: talking about your correlations in a language they can understand is key to getting the kind of institutional buy-in that data science needs to achieve its promise.
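The statistical problem with searching every possible correlation can be sketched in a few lines. The setup below is an assumption chosen for illustration: a target and 1,000 candidate features that are all pure random noise, with only 10 observations each. None of the features has any real relationship to the target, yet the best one found by brute-force search looks strongly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows, n_features = 10, 1000  # hypothetical sizes

# Everything here is independent noise; there is nothing real to find.
target = [rng.gauss(0, 1) for _ in range(n_rows)]
features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

# Brute-force search for the feature most correlated with the target.
best = max(abs(pearson(column, target)) for column in features)
print(f"strongest correlation found in pure noise: {best:.2f}")
```

With this many candidates and so few rows, a large value of `best` is expected by chance alone. Domain knowledge is what tells you which of the "discoveries" deserve a second look.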

Big data and data science are about building the right model, one that combines the right engineering, statistical, and business skills. Without all three, your data scientists will not be able to achieve everything they set out to do.

For more information on hiring a data scientist or becoming a data scientist, visit The Data Incubator’s website.
