
Spark comparison: AWS vs. GCP
This post was written jointly by Michael Li and Ariel M’ndange-Pfupfu. The original version can be found at O’Reilly. There’s little doubt that cloud computing
StatsModels and Scikit-learn are two popular Python packages for statistics and machine learning. Learn more about each from The Data Incubator.
SQLite and pandas are two common data manipulation tools, but SQLite selects and filters data faster while pandas joins and loads data faster.
It’s that most magical time of the year. Cheeks are 23% rosier week over week, sleigh-bell tinkling is up a remarkable 285% over the previous month, and Santa’s elves are busily vacuuming their Christmas-wish databases. And all of us are trying to find that perfect holiday gift.
Did you know that Python has two ways of measuring “sameness”? Read on to learn how equality differs from identity and which one to use.
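As a quick illustration of the two kinds of “sameness” (a minimal sketch, not taken from the post itself; the variable names are made up for this example): `==` asks whether two objects hold equal values, while `is` asks whether two names refer to the very same object.

```python
# Two separate list objects that happen to hold the same values.
a = [1, 2, 3]
b = [1, 2, 3]
# A second name bound to the same object as `a`.
c = a

# Equality (==) compares values.
equal_values = (a == b)

# Identity (is) compares object identity, i.e. whether both
# names point at one and the same object in memory.
same_object = (a is b)
alias_identity = (c is a)

print(equal_values)    # True: the contents match
print(same_object)     # False: two distinct list objects
print(alias_identity)  # True: c is just another name for a

# Mutating through one alias is visible through the other,
# because there is only one underlying object.
c.append(4)
print(a)               # [1, 2, 3, 4]
```

As a rule of thumb, `==` is the right choice for comparing values, and `is` is best reserved for singletons such as `None` (for example, `if result is None:`).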
The shape of code, or what we can learn from indentation. As a TDI data scientist in residence, I have learned to judge code quality at a quick glance by looking at indentation. The rule of thumb is: good code changes indentation frequently but is never deeply indented.
Picture this: You’ve been working hard on a project at work. You’ve run several algorithms, tuned the necessary hyperparameters, performed cross validation and exhausted the checks required to ensure you’re not overfitting.
It’s 2020 and the world has changed remarkably, including in how companies screen data science candidates. While many things have changed, there is one change that stands out above the rest. At The Data Incubator, we run a data science fellowship and are responsible for hundreds of data science hires each year.
The concept of “edge computing” has been around since the late 1990s, and typically refers to systems that process data where it is collected, instead of storing it and pushing it to a centralized location for offline processing.
© Copyright 2021, The Data Incubator