2018 Data Sources for Cool Data Science Projects, provided by Thinknum

Links to our previous “Data Sources for Cool Data Science Projects” posts:
Part 1, Part 2, Part 3, Part 4, Part 5

At The Data Incubator, we run a Data Science Fellowship program for Master’s and PhD graduates looking to transition to a career in industry. Our admissions team, as well as our hiring partners, love Fellows who don’t mind getting their hands dirty with data. That’s why our applicants submit ideas for capstone projects they’ll work on throughout the Fellowship to showcase their data science skills. One of the biggest obstacles to creating and completing successful projects has been getting access to interesting data.

Today, we’re excited to announce a partnership with Thinknum, a leading provider of alternative, web-crawled data. Thinknum has been the principal provider of web-crawled data to the finance community for over 3 years. Its client list includes more than 150 elite hedge funds and a majority of investment banks, which use the data to experiment with ever more innovative and differentiated ways of producing investment ideas across all sectors and multiple asset classes. More recently, Thinknum’s data has been in high demand from some of the largest and most innovative corporate customers for internal strategic decision making. The data is also heavily used by journalists, especially those reporting on the financial sector, with media outlets like CNN, Business Insider and CNBC all using Thinknum resources in their stories. This partnership will give Fellows and Fellowship applicants access to some of the data that finance industry experts and corporate leaders use on a daily basis.

Business, economic and social activity is continually moving online. This increasing digital activity leaves behind data trails that, with proper organization, can reveal otherwise invisible trends, shifts and movements. Thinknum clients, and now The Data Incubator Fellows and applicants, can use this data for investing, gaining a deeper understanding of businesses, or telling a story about an industry trend. Thinknum trawls the internet every day to collect data on over 400,000 public and private companies across the globe. Its intuitive web-based tool will allow Fellows to navigate huge volumes of data, gather insights, find correlations, and generate visualisations to share with other Fellows in seconds.

Thinknum Data

Thinknum tracks thousands of websites, capturing vast amounts of public data, indexing it and mapping it back to individual companies. The full Thinknum library contains over 20 datasets, each with dozens of metrics updated daily.

3 Datasets

Thinknum is providing The Data Incubator with access to three real-world datasets for our Fellows to analyze and explore. The potential projects for each dataset are virtually limitless, and most of them haven’t been worked through. A look at the number of columns in each dataset gives a sense of just how many questions one can ask. A few initial suggestions are included below.

Job Postings:

This database tracks individual job postings on corporate websites, allowing researchers and data scientists to view the overall hiring plans of a company over time. As well as viewing historical data, users can explore in great detail what types of positions a company is looking to fill, where a company is looking to grow geographically, and in which specific product or business lines the company is looking to expand the most.

Using this database, Thinknum Media journalists were able to show that the number of job listings at Apple’s new headquarters containing the word “Siri” had spiked in recent weeks. They also saw that of the 161 jobs related to Siri, almost all (154) were in software engineering. From their findings, Thinknum journalists were able to predict Apple’s efforts to concentrate on Siri development an entire week before the plans were officially announced by Apple.

Project suggestions:

    • In which geographies are tech companies hiring the most engineers, blockchain developers, etc.?
    • Using job openings data, explore how banks are shifting their strategy toward a heavier reliance on technology or in response to a heavier regulatory burden.
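
As a starting point for the suggestions above, here is a minimal sketch of keyword counting over job postings, similar in spirit to the “Siri” example. The rows and the field names (`title`, `location`, `category`) are made up for illustration and are not the actual Thinknum schema; the helper `count_by` is likewise hypothetical.

```python
from collections import Counter

# Illustrative rows only. The field names ("title", "location", "category")
# are assumptions, not the actual Thinknum schema.
postings = [
    {"title": "Siri Software Engineer", "location": "Cupertino, CA", "category": "Software Engineering"},
    {"title": "Siri Language Engineer", "location": "Cupertino, CA", "category": "Software Engineering"},
    {"title": "Retail Specialist", "location": "Austin, TX", "category": "Retail"},
    {"title": "Blockchain Developer", "location": "New York, NY", "category": "Software Engineering"},
]


def count_by(rows, keyword, field):
    """Count rows whose title contains `keyword` (case-insensitive), grouped by `field`."""
    needle = keyword.lower()
    return Counter(row[field] for row in rows if needle in row["title"].lower())


# Where are "engineer" roles concentrated, and in which categories do "Siri" roles fall?
engineer_counts = count_by(postings, "engineer", "location")
siri_counts = count_by(postings, "siri", "category")
print(engineer_counts.most_common())
print(siri_counts.most_common())
```

With real data, the same grouping could be repeated per week to see whether a keyword is spiking over time.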


LinkedIn Profiles:

This database tracks and records the number of employees across companies on a daily basis and provides real-time insight into how aggressively a company is growing, both against its own plans and within its industry.

Here, Thinknum Media looked at the LinkedIn profile data for Vox and Buzzfeed employees, as well as job listings data. The journalists also looked at company survey data from Glassdoor, and found that the number of Vox employees who had a positive outlook on the future of their company had fallen almost 20%. Combining these datasets, they found that the number of open job listings was falling and that the number of people reporting to be employees of the companies had fallen. Coupled with the findings from the Glassdoor surveys, this painted a picture of slowing company growth for both Vox and Buzzfeed.

Project suggestions:

    • Which companies have delivered on their strategic expansion plans (filled the most job openings that showed up on LinkedIn)?
    • Find companies where hiring is most predictive of stock prices.
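
One rough way to approach the “predictive of stock prices” question is a lagged correlation: compare headcount growth in one period with stock returns a few periods later. The sketch below uses made-up numbers and hypothetical helper names (`pearson`, `lagged_correlation`); it illustrates the idea only and is not a method used by Thinknum. A high correlation on historical data does not by itself establish predictive power.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, target, lag):
    """Correlate signal[t] with target[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, target)
    return pearson(signal[:-lag], target[lag:])


# Made-up monthly series for a single hypothetical company.
headcount_growth = [1.0, 3.0, 2.0, 5.0, 4.0]
stock_return = [9.0, 2.0, 6.0, 4.0, 10.0]

for lag in range(3):
    print(lag, lagged_correlation(headcount_growth, stock_return, lag))
```

Running this per company and comparing the lagged correlations would give a first ranking to investigate further.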


Facebook Followers:

Social media platforms like Facebook provide a myriad of data points about companies such as customer traction, foot traffic, and brand awareness among others.

By analyzing Facebook ‘check-in’ data, investment bank Cowen tracked foot traffic to Chipotle starting in 2017 and used it to predict falls in Chipotle’s stock performance. Analyzing footfall this way became a staple for fast food restaurant research analysts, as discussed in a Yahoo Finance article.

Project suggestions:

    • Compare the companies with the highest volatility in “talking about count”, identify who they are, and use any information online to see if this metric overlaps with highly publicized events and marketing campaigns.
    • Use Facebook check-ins as a metric for foot traffic for restaurant, hospitality and retail businesses. Who are the winners in attracting customers to physical locations?
    • Use Facebook followers to find which companies are the most successful at growing social media traction.
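
For the first suggestion above, one simple definition of volatility is the standard deviation of day-over-day percentage changes in the metric. The sketch below assumes that definition; the company names, the numbers and the helpers `daily_changes` and `volatility` are invented for illustration.

```python
from statistics import pstdev

# Made-up daily "talking about count" values for two hypothetical companies.
talking_about = {
    "SteadyCo": [100, 102, 104, 106, 108],
    "SpikyCo": [100, 180, 90, 200, 95],
}


def daily_changes(counts):
    """Day-over-day fractional changes; assumes every count is non-zero."""
    return [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]


def volatility(counts):
    """Population standard deviation of the day-over-day changes."""
    return pstdev(daily_changes(counts))


# Rank companies from most to least volatile.
ranking = sorted(talking_about, key=lambda name: volatility(talking_about[name]), reverse=True)
for name in ranking:
    print(name, round(volatility(talking_about[name]), 3))
```

The dates of the largest daily changes for the top-ranked companies can then be compared against news about publicized events and marketing campaigns.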


While building your own project cannot replicate the experience of the Fellowship at The Data Incubator (our Fellows get amazing access to hiring managers and to nonpublic data sources), we hope this will get you excited about working in data science. And when you are ready, you can apply to be a Fellow!

Got any more data sources? Let us know and we’ll add them to the list!
