Christian Moscardi is the Director of Technology at The Data Incubator. This was originally posted on his blog.
Jupyter is a fantastic tool that we use at The Data Incubator for instructional purposes. In particular, we like to keep our curriculum compartmentalized via Jupyter notebooks. It allows us to test our code samples across any language there’s a Jupyter kernel for* and keep things in one place, so our Fellows don’t have to rifle through a wide variety of file formats before getting to the information they need.
One area where we only recently integrated Jupyter was frontend web visualization. Our previous structure involved a notebook, possibly with code snippets, that contained links to various HTML files. We expected our Fellows to dig through the code to:
- Look at the HTML source for the basic layout.
- View the styles making everything pretty.
Oh, and any data processing code was separate/output to a file. Obviously not ideal. We knew IPython had
Conveniently, Jupyter already uses require.js, and it works great! Thanks to this blog post (explaining a slightly more cumbersome way to embed D3) for the tipoff.
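As a rough illustration of loading D3 through require.js, here is a minimal sketch in Python that builds the JavaScript for a cell. The helper name d3_loader_js and the CDN URL are assumptions made for this example, not taken from the original post.

```python
import json

# Assumed CDN location for D3 (not from the original post).
# require.js path entries omit the ".js" suffix.
D3_URL = "https://d3js.org/d3.v3.min"


def d3_loader_js(body, d3_url=D3_URL):
    """Wrap the JavaScript in `body` so it runs once require.js has loaded D3."""
    config = json.dumps({"paths": {"d3": d3_url}})
    return (
        "require.config(%s);\n"
        "require(['d3'], function(d3) {\n%s\n});" % (config, body)
    )


# Print the generated JavaScript so it can be inspected outside a notebook.
print(d3_loader_js("console.log(d3.version);"))
```

Inside a notebook, the returned string could be passed to IPython.display.Javascript and displayed, or pasted into a JavaScript cell directly.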
The Big Discovery:
In a %%javascript cell, Jupyter provides a variable, element, that refers to the output cell. This is obviously very convenient: we now have a way to create arbitrary DOM elements in our cell. In particular, we can create SVG canvases and add SVG shapes to that canvas… see where this is going?
This means we can write code like this:
and we’ve created a nice div in our output!
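A minimal sketch of what such cell code might look like, written as Python helpers that generate the JavaScript. It assumes the classic notebook, where element is a jQuery-wrapped node and $ is available globally; the helper names and the d3-demo class are hypothetical.

```python
import json


def append_div_js(text, css_class="d3-demo"):
    """Build JS that appends a <div> holding `text` to the cell's output area."""
    # json.dumps produces valid JavaScript string literals, escaping any quotes.
    return "element.append($('<div>').addClass(%s).text(%s));" % (
        json.dumps(css_class),
        json.dumps(text),
    )


def append_svg_circle_js(radius=20):
    """Build JS that draws a single SVG circle in the output area using D3."""
    # Assumes `d3` is already in scope, e.g. inside a require(['d3'], ...) callback.
    # element.get(0) unwraps the jQuery object to the underlying DOM node.
    size = 2 * radius
    return (
        "d3.select(element.get(0)).append('svg')"
        ".attr('width', %d).attr('height', %d)"
        ".append('circle')"
        ".attr('cx', %d).attr('cy', %d).attr('r', %d);"
        % (size, size, radius, radius, radius)
    )


print(append_div_js("Hello from the output cell"))
print(append_svg_circle_js())
```

The second helper shows the same idea one step further: once element is unwrapped to a plain DOM node, D3 can select it and append an SVG canvas.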
Data Conversion: Pandas Dataframe → JSON
This one’s a bit of a hack. Basically, since the JavaScript runs in the browser, window is set. So we do what every JS developer has (maybe shamefully) done at some point in their career, and bind data to window so that it’s globally accessible.
But wait, it gets better: Pandas dataframe objects have a to_json function! The only trick now is managing to execute some JS code that loads the JSON dump we can get for free from Pandas. Here’s a snippet that does just that, invoking some of IPython’s backend display logic:
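A minimal sketch of that idea, under stated assumptions: the helper bind_to_window_js and the global name vizData are hypothetical, and the sample string stands in for the kind of output df.to_json() produces.

```python
import json


def bind_to_window_js(name, json_text):
    """Build JS that assigns already-serialized JSON to window[name]."""
    # Parse once so invalid JSON fails here rather than silently in the browser.
    json.loads(json_text)
    # JSON text is also a valid JavaScript expression, so it can be inlined as-is.
    return "window[%s] = %s;" % (json.dumps(name), json_text)


# Stand-in for df.to_json() on a small two-column dataframe.
sample = '{"x":{"0":1,"1":2},"y":{"0":3,"1":4}}'
js = bind_to_window_js("vizData", sample)
print(js)

# In a notebook, the generated code could then be executed with:
#   from IPython.display import Javascript, display
#   display(Javascript(js))
```

After that cell runs, later JavaScript cells could read the data from window.vizData.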
As it turns out, Pandas dumps its dataframes in a way that isn’t exactly what D3 is looking for by default. You may want to restructure your data for D3, and you now have the freedom to do that in JS or Python: as long as you can call json.dumps on it in Python, you can bind it to D3. You could also call Pandas’ to_csv, bind that string to the client side, and load it into D3 using the d3.csv convenience function.
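To make the restructuring concrete: by default, to_json emits a column-oriented object ({column: {index: value}}), whereas D3 data joins usually expect an array of row objects. Here is a minimal sketch of that conversion in plain Python; the helper name columns_to_records is hypothetical.

```python
import json


def columns_to_records(column_json):
    """Turn column-oriented JSON ({col: {index: value}}) into a list of row dicts."""
    columns = json.loads(column_json)
    # Every column of a dataframe shares one index, so read the labels from the first.
    first_column = next(iter(columns.values()), {})
    return [
        {col: values.get(label) for col, values in columns.items()}
        for label in first_column
    ]


# Stand-in for df.to_json() on a small two-column dataframe.
sample = '{"x":{"0":1,"1":2},"y":{"0":3,"1":4}}'
records = columns_to_records(sample)
print(json.dumps(records))
```

With Pandas itself, df.to_json(orient="records") produces the row-oriented form directly, so a helper like this is only needed when the data is restructured after the dump.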
With all this taken care of, there isn’t really much else to do! We now have interactive visualizations! After this, it’s up to you to write whatever you want in D3. This sort of embedding might be useful if you want to pass around analysis and visualization source code all-in-one, so that a collaborator can immediately reproduce a given result and help tweak your visualizations. If you’re giving a talk, it’s very useful for instructive purposes!
Here’s a sample, very basic, D3 visualization. You can see the IPython notebook here.
Unfortunately, GitHub doesn’t render JS (how cool would that be…), so you can clone the repo to play with it yourself. But that notebook gives the gist of it.
Other Useful Tools
mpld3 – cross-compiling matplotlib into D3.js code, plays nice with IPython notebooks.
*If you’re interested in Jupyter notebook testing, let me know – seems like good fodder for another post!
Editor’s Note: The Data Incubator is a data science education company. We offer a free eight-week Fellowship helping candidates with PhDs and master’s degrees enter data science careers. Companies can hire talented data scientists or enroll employees in our data science corporate training.