Bringing D3.JS to Jupyter Notebook with Py-D3

09/12/2016

D3 is a well-known and well-loved JavaScript library for data visualization and document object manipulation, which makes it possible to express even extremely complex visual ideas simply, using an intuitive grammar. Jupyter is a browser-hosted Python execution environment which provides an intuitive data science interface.

These tools are cornerstones of web-based data visualization and web-based data science, respectively, and most of my projects begin in one and end in the other.

However, they don't work together very well.

The trouble is that whilst it's possible to run arbitrary JavaScript code inside of Jupyter Notebook cells (via the %%javascript or %%html cell magics), Jupyter places no restrictions on the elements of the page—the so-called document object model, or DOM for short—the executing code has access to.

Since the Jupyter Notebook interface itself is a part of the page, this makes it extremely easy to accidentally do grievous harm to your environment. For instance, after importing D3 the following one-liner is sufficient to destroy your entire display:

%%javascript
d3.selectAll("div").remove();

Even subtle changes will have side-effects on your interface—not really tenable.

This bothered me. I would do all of the data munging I needed for a project in the Notebook, then swap over to an entirely different environment—a JavaScript IDE—to prototype a full-scope visualization. Moving between interfaces like this slows work down and breaks the link between preparation and presentation which is integral to why Jupyter notebooks are so revolutionary in the first place.

A few shims trying to address this problem already exist. I'm personally familiar with two: one a hack put together by folks at the Data Incubator, the other a more polished API from StitchFix. Neither is ideal: the former really is just a hack, while the latter requires some pretty complex template substitutions to work. It shouldn't have to be that hard.

py-d3 builds on these previous efforts by trying to provide a truly self-contained D3 environment. It achieves this by overwriting the default d3.select() and d3.selectAll() selector methods with cell-specific versions, ones that, at runtime, can only see objects created inside of the currently executing cell (an approach suggested to me by Mike Bostock himself).
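To make the idea concrete, here is a minimal sketch of what cell-scoped selectors could look like in plain JavaScript. This is not py-d3's actual code: scopeD3 and cellRoot are hypothetical names I'm using for illustration, and the only D3 calls it relies on are d3.select(node), selection.select(selector) and selection.selectAll(selector).

```javascript
// Sketch only: wrap D3's top-level selectors so they search inside one element.
// `scopeD3` and `cellRoot` are hypothetical names, not part of py-d3 or D3.
function scopeD3(d3, cellRoot) {
  // Inherit everything else on d3 through the prototype chain and shadow
  // only the two top-level selector methods.
  const scoped = Object.create(d3);

  scoped.select = (target) =>
    typeof target === "string"
      ? d3.select(cellRoot).select(target) // first match among cellRoot's descendants
      : d3.select(target); // a DOM node was passed directly: wrap it as usual

  scoped.selectAll = (target) =>
    typeof target === "string"
      ? d3.select(cellRoot).selectAll(target) // all matches among cellRoot's descendants
      : d3.selectAll(target); // nodes were passed directly: wrap them as usual

  return scoped;
}

// Hypothetical usage inside a notebook page, assuming `outputElement` is the
// DOM node that holds the current cell's output:
//   const d3cell = scopeD3(d3, outputElement);
//   d3cell.selectAll("div").remove(); // touches only divs inside outputElement
```

With a wrapper like this, the destructive one-liner from earlier would only remove elements inside the cell's own output rather than the whole Notebook interface. Code that calls the wrapped selectors with a node instead of a selector string still reaches that node directly, so the protection only covers string selectors.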

All you have to do is declare %%d3 at the top of your cell, and then everything should just work!

This unlocks a wealth of complex visualizations that would be hard to do in Python alone. A radial Reingold-Tilford tree, for example:

Data.

An interactive treemap:

Data.

Or even the entire D3 show reel animation:

Data.

It's important to remember that D3 is a high-complexity, high-fidelity approach to the problem. If you just need to get a point across or need to explore a data facet quickly, ipywidgets are an easier approach.

But if you absolutely need full control—here's a new option to consider.

Head over to the GitHub repository to learn more.

— Aleksey