Our founder, Michael Li, recently collaborated with his colleague Raymond Perkins, a researcher and PhD candidate at Princeton University, on this piece about big data and polling. You can find the original article at Data Driven Journalism.
The recent presidential inauguration and the momentous election that preceded it have prompted renewed debate about the accuracy of polling and big data. The US election results, together with those of the Brexit vote and the Colombian referendum, have left many people confused. Statisticians, however, understand the many sampling biases and statistical errors that can arise when the data involve human beings.
“Though big data has the potential to virtually eliminate statistical error, it unfortunately provides no protection against sampling bias and, as we’ve seen, may even compound the problem. This is not to say big data has no place in modern polling; in fact, it may provide alternative means to predict election results. However, as we move forward we must consider the limitations of big data and our overconfidence in it as a polling panacea.”
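The distinction between statistical error and sampling bias can be made concrete with a small simulation. This is a minimal sketch, not the authors' method: it assumes a hypothetical candidate with a true vote share of 48%, and hypothetical response rates in which supporters answer the pollster 50% of the time while non-supporters answer 60% of the time. As the number of people contacted grows, the standard error shrinks toward zero, but the estimate converges to the wrong value.

```python
import math
import random


def expected_share(true_share, rate_yes, rate_no):
    """Share of supporters among respondents when response rates differ by vote."""
    yes = true_share * rate_yes
    return yes / (yes + (1 - true_share) * rate_no)


def biased_poll(n_contacted, true_share, rate_yes, rate_no, rng):
    """Contact n random people; each responds with a probability that depends on their vote.

    Returns the estimated support share among respondents and its naive standard error.
    """
    yes = responses = 0
    for _ in range(n_contacted):
        supports = rng.random() < true_share
        rate = rate_yes if supports else rate_no
        if rng.random() < rate:  # non-response is correlated with the vote
            responses += 1
            yes += supports
    if responses == 0:
        raise ValueError("no one responded; contact more people")
    share = yes / responses
    std_err = math.sqrt(share * (1 - share) / responses)
    return share, std_err


if __name__ == "__main__":
    rng = random.Random(42)
    TRUE_SHARE = 0.48  # hypothetical true support
    RATE_YES, RATE_NO = 0.5, 0.6  # hypothetical response rates
    print(f"true share: {TRUE_SHARE:.3f}, "
          f"biased limit: {expected_share(TRUE_SHARE, RATE_YES, RATE_NO):.3f}")
    for n in (1_000, 10_000, 100_000, 1_000_000):
        share, se = biased_poll(n, TRUE_SHARE, RATE_YES, RATE_NO, rng)
        print(f"n={n:>9,}  estimate={share:.3f}  std. error={se:.4f}")
```

Under these assumptions the estimate settles near 43.5% rather than 48%, with a standard error well under a tenth of a percentage point at a million contacts. More data makes the poll more confident, not more correct.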
At The Data Incubator, this central misconception about big data is one of the core lessons we try to impart to our students. Apply to be a Fellow today!
Editor’s Note: The Data Incubator is a data science education company. We offer a free eight-week fellowship helping candidates with PhDs and master’s degrees enter data science careers. Companies can hire talented data scientists or enroll employees in our data science corporate training.