At The Data Incubator, we strive to provide the most up-to-date data science curriculum available. Using feedback from our corporate and government partners, we deliver training on the most sought-after data science tools and techniques in industry. We wanted to take a more data-driven approach to developing our curriculum.
Results and Discussion
d3.js is at least four standard deviations above the mean on all calculated metrics. d3.js offers users full control over all aspects of their data visualizations. With this power comes a trade-off: d3.js does not come with built-in charts, and making even a simple bar graph can be quite time-consuming. For this reason, dozens of reusable charting packages have been built on top of d3.js. D3.js derivatives with pre-made components make up six of the top 20 packages on our list: plottable (4), plotly.js (5), britecharts (7), c3 (9), recharts (15), and dc.js (18). These derivatives tend to provide charting options for bar, line, and scatter plots. For more specialized visualizations, such as maps and networks, additional packages are necessary.
leaflet.js is the most popular map visualization package
leaflet.js (6) is the only dedicated mapping package to break into the top 20 on our list, with scores above the mean on all of our metrics. In addition to specializing in interactive maps, leaflet.js is lightweight (38 KB of JS) and mobile-friendly. cesium (27) is the highest-ranking package to offer 3D globes and maps. cartodb (29), rickshaw (37), and datamaps (46) also offer powerful geospatial/mapping visualizations.
sigma.js beats cytoscape as the most popular graph/network visualization package
britecharts has the largest growth rate for 2017
With so many data visualization options (we ranked 110), one might think it would be hard for a new charting package to gain a following. britecharts, a reusable charting library based on d3.js and created by Eventbrite, was first made publicly available less than two years ago, yet it earned the number 7 spot in our overall rankings and the highest compound monthly growth rate (110%) over the last six months. The closest runner-up is graphael, with a 56% growth rate.
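To make the compound monthly growth rate quoted above concrete, here is a minimal sketch of the standard formula, which solves start × (1 + r)^n = end for r. It assumes growth is measured from the first month's download count to the last month's; the original analysis does not spell out that detail, and the function name and sample counts are hypothetical.

```python
from __future__ import annotations


def compound_monthly_growth_rate(monthly_downloads: list[float]) -> float:
    """Return the compound monthly growth rate as a fraction (1.1 == 110%).

    Assumption: growth is measured from the first month's download count to
    the last month's and compounded over the month-to-month intervals
    between them. The original analysis does not state this detail.
    """
    if len(monthly_downloads) < 2:
        raise ValueError("need at least two months of download counts")
    start, end = monthly_downloads[0], monthly_downloads[-1]
    if start <= 0:
        raise ValueError("first month must have a positive download count")
    intervals = len(monthly_downloads) - 1
    # Solve start * (1 + r) ** intervals == end for r.
    return (end / start) ** (1 / intervals) - 1


# Hypothetical counts: downloads multiply by 2.1 each month.
print(compound_monthly_growth_rate([1000, 2100, 4410]))
```

A package growing at 110% per month multiplies its downloads by 2.1 each month, so the call above returns approximately 1.1.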
There’s a place near the top for both flot and flotr2
Naturally, packages that have been around longer tend to have higher metrics, and therefore higher rankings. This is not adjusted for in the Stack Overflow or Github metrics; only the download metrics are restricted to the past six months.
The data presented a few difficulties:
- plottable has an inflated Stack Overflow (SO) question metric, since its name is a common word.
- SO data for plotly may also be inflated, as plotly is also available as R and Python packages.
All source code and data is on our Github Page.
We first generated a list of 141 Data Science packages from these four sources, and then collected metrics for all of them to come up with the ranking. Github data is based on both stars and forks, while Stack Overflow data is based on tags and questions containing the package name. Downloads data is from npmjs. Downloads were totaled over a six-month period, and the compound monthly growth rate was calculated over the same period. After scraping other sites for JS visualization package names, we had gathered over 200 package names, many of which were aliases for the same package (e.g., d3 and D3JS). If the first result of a Github search returned the same repo as another package, we treated the two as the same package, but saved the aliases for searching Stack Overflow questions.
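The alias-merging rule described above can be sketched as grouping names by their top Github search hit. This is a minimal illustration, not the original code: it assumes the first search result for each name has already been fetched into a dictionary (the fetching step is out of scope), and the function name and sample repos are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def merge_aliases(first_search_result: dict[str, str]) -> dict[str, list[str]]:
    """Group package names whose first Github search hit is the same repo.

    first_search_result maps each scraped package name to the "owner/repo"
    returned as its first Github search result (assumed to be pre-fetched).
    Returns {repo: [aliases, ...]}; every alias is kept so it can still be
    used when searching Stack Overflow questions.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, repo in first_search_result.items():
        # Github owner/repo names are case-insensitive, so normalize case.
        groups[repo.lower()].append(name)
    return dict(groups)


# Hypothetical search results for three scraped names.
print(merge_aliases({"d3": "d3/d3", "D3JS": "d3/d3", "leaflet": "Leaflet/Leaflet"}))
```

Keying on the repository rather than the package name is what lets spellings like d3 and D3JS collapse into one entry while their aliases survive for the Stack Overflow search.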
A few other notes:
- Any unavailable Stack Overflow counts were set to zero.
- Counts were standardized to mean 0 and standard deviation 1, then averaged to get the Github, Stack Overflow, and Download scores, which were combined to get the Overall score.
- Github repository locations were confirmed with some manual checks.
- 191 d3 modules were removed; data collection, analysis, and ranking for the d3 modules were done separately.
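The scoring steps listed above (zero-filling missing counts, standardizing, averaging within each source, then combining) can be sketched as follows. Several details are assumptions not stated in the original: population standard deviation, equal weights for metrics within a source, and an overall score that is the plain mean of the source scores. The function names, source keys, and sample data are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Every package has the same count, so the metric carries no signal.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def score_packages(
    names: list[str],
    metrics: dict[str, dict[str, list[float | None]]],
) -> dict[str, dict[str, float]]:
    """Compute per-source and overall scores for each package.

    metrics maps a source (e.g. "github", "stackoverflow", "npm") to its raw
    metrics, each a list of counts aligned with `names`. None marks an
    unavailable count and is treated as zero.
    """
    source_scores: dict[str, list[float]] = {}
    for source, by_metric in metrics.items():
        standardized = [
            standardize([0.0 if v is None else float(v) for v in values])
            for values in by_metric.values()
        ]
        # Average each package's standardized metrics within this source.
        source_scores[source] = [mean(col) for col in zip(*standardized)]

    results: dict[str, dict[str, float]] = {}
    for i, name in enumerate(names):
        row = {source: scores[i] for source, scores in source_scores.items()}
        # Assumption: the overall score is the plain mean of the source scores.
        row["overall"] = mean(list(row.values()))
        results[name] = row
    return results


scores = score_packages(
    ["pkg-a", "pkg-b", "pkg-c"],  # hypothetical packages
    {
        "github": {"stars": [10, 20, 30], "forks": [1, 2, 3]},
        "stackoverflow": {"questions": [None, 5, 10]},
        "npm": {"downloads": [100, 100, 400]},
    },
)
ranking = sorted(scores, key=lambda n: scores[n]["overall"], reverse=True)
print(ranking)  # ['pkg-c', 'pkg-b', 'pkg-a']
```

On this scale, a package like d3.js that sits "at least four standard deviations above the mean" would have standardized values of 4 or more on every metric.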
All data was downloaded on August 6, 2017.