Rapid Data Viz

Andrew Montalenti, CTO

What do we do?


Parse.ly customers


Is online media special?

Websites have a variety of interesting "first-party" metrics:

Third-party metrics emerging


What about online journalism?


Time series data

_images/sparklines_multiple.png _images/sparklines_stacked.png

Summary breakdowns


Benchmark statistics


Information radiators


Contextual overlays


How do we do it?


Parse.ly careers



Data Visualization Theory

Three people:

Edward Tufte
Mike Bostock
Ben Fry


Tufte: Do Whatever It Takes


data-ink ratio, cognitive style, chartjunk

Bostock: Embrace Standards


not just charts, data-document joins

Fry: It's a Process

_images/process_01.png _images/process_02.png

multi-disciplinary process, feedback loops, iteration

Chart Types (1)

_images/elements_01.png _images/elements_05.png _images/elements_06.png

Chart Types (2)

Paradox of choice?

_images/elements_02.png _images/elements_03.png _images/elements_04.png

Encoding Guide (1)


Encoding Guide (2)


Dense Displays


How to iterate?


Tools for everything, but no dataviz REPL.

Or is there? Enter IPython Notebook, Pandas, the web.


Let's take a look at "pulse traffic time series".



Data in my browser!

CONUNDRUM: Once I have clean time series (or other) data rendering nicely in the IPython Notebook, how do I get it rendering just as nicely in the browser?


d3-oriented Approach






Data-Driven Documents


d3 scales

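var data = [1, 2, 3, 4, 5];

var width = 200;
var height = 200;

// ordinal scale: one equal-width band per datum
var x = d3.scale.ordinal()
            .domain(data)
            .rangeBands([0, width]);
// linear scale: data value -> bar height in px
var y = d3.scale.linear()
            .domain([0, d3.max(data)])
            .range([0, height]);
// linear scale: data value -> opacity
var pct = d3.scale.linear()
            .domain([0, d3.max(data)])
            .range([0.4, 1]);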

d3 scaling

y(1.7) // -> 68px
pct(1.7) // -> 0.604 (60.4% opacity)
y(4.5) // -> 180px
pct(4.5) // -> 0.94 (94% opacity)
x(5) // -> 160px
x.rangeBand() // -> 40px
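The numbers above are plain interpolation. As a dependency-free sketch (not d3's implementation; `linearScale` and `bandScale` are hypothetical helpers written for illustration), a linear scale maps a value's fractional position in the domain onto the range, and a band scale with no padding splits the range into equal-width bands, one per domain value:

```javascript
// Hypothetical helpers mimicking d3 v3's linear and ordinal/rangeBands
// scales for the simple case on the slides (no clamping, no padding).

function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  // Map v's fractional position within [d0, d1] onto [r0, r1].
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

function bandScale(domain, extent) {
  const [r0, r1] = extent;
  const band = (r1 - r0) / domain.length; // equal-width bands, no padding
  // Unlike d3, an unknown value is not added to the domain here.
  const scale = (v) => r0 + domain.indexOf(v) * band;
  scale.rangeBand = () => band;
  return scale;
}

const data = [1, 2, 3, 4, 5];
const width = 200;
const height = 200;
const max = Math.max(...data);

const x = bandScale(data, [0, width]);
const y = linearScale([0, max], [0, height]);
const pct = linearScale([0, max], [0.4, 1]);

console.log(y(1.7), pct(1.7), y(4.5), pct(4.5), x(5), x.rangeBand());
```

For example, y(1.7) is 1.7 / 5 of the way through the domain, so it lands 0.34 × 200 = 68px into the range; pct(1.7) is 0.4 + 0.34 × 0.6 = 0.604.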

d3 drawing

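// create the SVG canvas inside the container
var chart = d3.select("#container")
    .append("svg")
    .attr("class", "chart")
    .attr("fill", "steelblue")
    .attr("width", width)
    .attr("height", height);

// data join: one <rect> per datum, attributes driven by the scales
chart.selectAll("rect")
    .data(data)
  .enter().append("rect")
    .attr("x", x)
    .attr("height", y)
    .attr("opacity", pct)
    .attr("y", function(d, i) { return height - y(d); })
    .attr("width", x.rangeBand());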
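To see what the enter() selection produces without a browser, here is a DOM-free sketch: one rect per datum, with attributes computed from that datum. The scale math is inlined, and bands are positioned by index, which matches the ordinal scale here only because the data values are distinct. `barRects` and `toSvg` are hypothetical names used for illustration, not d3 API.

```javascript
// Compute the attributes d3 would set on each <rect> for the bar chart.
function barRects(data, width, height) {
  const max = Math.max(...data);
  const band = width / data.length;
  return data.map((d, i) => {
    const h = (d / max) * height;       // y(d): bar height in px
    return {
      x: i * band,                      // x(d): left edge of the band
      y: height - h,                    // SVG y grows downward, so flip
      width: band,                      // x.rangeBand()
      height: h,
      opacity: 0.4 + (d / max) * 0.6,   // pct(d)
    };
  });
}

// Serialize the rects as SVG markup, mirroring the attributes set above.
function toSvg(rects, width, height) {
  const body = rects
    .map((r) => `<rect x="${r.x}" y="${r.y}" width="${r.width}" ` +
                `height="${r.height}" opacity="${r.opacity}"/>`)
    .join("\n  ");
  return `<svg class="chart" fill="steelblue" width="${width}" height="${height}">\n  ${body}\n</svg>`;
}

const rects = barRects([1, 2, 3, 4, 5], 200, 200);
console.log(toSvg(rects, 200, 200));
```

The `height - h` flip is why the d3 code sets `y` with a function rather than with the scale directly: every bar must end at the bottom edge of the canvas.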

Prototyping with d3

I built a tool called "webrepl" for this.

What about my data?

You need to convert a pandas DataFrame to JSON of some sort.

Typically: data and labels.

Typically also a pain in the butt!
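One way to reduce the pain, sketched under assumptions: on the Python side, pandas' `DataFrame.to_json(orient="split")` emits an object with `columns`, `index`, and row-major `data` arrays; a small browser-side function (the hypothetical `splitToNvd3`) can then reshape it into one nvd3-style series per column. The sample payload is made up for illustration.

```javascript
// Reshape pandas' DataFrame.to_json(orient="split") output into the
// [{key, values: [{x, y}, ...]}, ...] series array nvd3 charts consume.
// `splitToNvd3` is a hypothetical helper, not part of pandas or nvd3.
function splitToNvd3(split) {
  return split.columns.map((col, j) => ({
    key: String(col),
    // pair each index label with that row's value in column j
    values: split.index.map((idx, i) => ({ x: idx, y: split.data[i][j] })),
  }));
}

// Illustrative payload shaped like df.to_json(orient="split") for a
// DataFrame with a millisecond-timestamp index and two pageview columns.
const payload = JSON.parse(`{
  "columns": ["site_a", "site_b"],
  "index": [1356998400000, 1357084800000, 1357171200000],
  "data": [[120, 80], [135, 95], [150, 70]]
}`);

const series = splitToNvd3(payload);
console.log(JSON.stringify(series, null, 2));
```

With values shaped as `{x, y}`, chart accessors such as `.x(function(d) { return d.x; })` can read them directly.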

nvd3 add-on

nvd3 concepts

nvd3 graphs


nvd3 approach

nvd3 assumes a certain data format: typically an array of objects, one per series, each with a key and its values.

var data = [
    {"key": "data",
     "values": [
        1, 2, 3, 4, 5
     ]}
];

The values array becomes your chart's series data; you can use your own structure there.

A model is basically a preset bundle of d3 scales, axes, labels, and data joins.

nvd3 model

nv.addGraph(function() {
    // build nvd3 chart model
    var chart = nv.models.discreteBarChart()
        .x(function(d, i) { return i })
        .y(function(d) { return d });

    // plain d3 code to do data-document binding
    d3.select('#chart svg')
        .datum(data)
        .call(chart);

    // nv utility for refreshing graph based on window size
    nv.utils.windowResize(chart.update);

    return chart;
});

nvd3 benefit

Still supports the full power of d3, but gives you a starting point.


What is Vega?


Vega bar example (1)

var spec = {
    "width": 200,
    "height": 200,
    "data": [
        {
            "name": "table",
            "values": [
                {"x":"A", "y":1}, {"x":"B", "y":2}, {"x":"C", "y":3},
                {"x":"D", "y":4}, {"x":"E", "y":5}
            ]
        }
    ],
    // ...

Vega bar example (2)

"scales": [
    {"name": "x",
     "type": "ordinal",
     "range": "width",
     "domain": {"data":"table", "field":"data.x"} },
    {"name": "y",
     "range": "height",
     "nice": true,
     "domain": {"data": "table", "field": "data.y"} },
    {"name": "pct",
     "range": [0.4, 1],
     "nice": true,
     "domain": {"data": "table", "field": "data.y"} }
],
// ...

Vega bar example (3)

"marks": [
    {
        "type": "rect",
        "from": {"data": "table"},
        "properties": {
            "enter": {
                "x": {"scale": "x", "field": "data.x"},
                "width": {"scale": "x", "band": true, "offset": -1},
                "y": {"scale": "y", "field": "data.y"},
                "y2": {"scale": "y", "value": 0},
                "opacity": {"scale": "pct", "field": "data.y"}
            },
            "update": {
                "fill": {"value": "steelblue"}
            }
        }
    }
]

How does Vega work?
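Conceptually, Vega reads the spec, builds each named scale from its `domain` and `range`, then evaluates every mark's property encodings once per data tuple. The toy evaluator below sketches that idea for the bar spec above. It is not Vega's implementation and handles only the `scale`/`field`/`value`/`band`/`offset` forms used here. It assumes that linear domains start at zero, that `"height"` means the flipped range [height, 0] (which the spec's `"y2": {"scale": "y", "value": 0}` baseline implies), and that raw values sit under a `data` key, consistent with the spec's `"data.y"` field paths.

```javascript
// Toy evaluator for the bar spec above: NOT Vega's implementation.
// Assumptions: linear domains start at 0, "width" -> [0, width],
// "height" -> [height, 0], raw values wrapped as { data: value }.

const spec = {
  width: 200,
  height: 200,
  data: [{ name: "table", values: [
    { x: "A", y: 1 }, { x: "B", y: 2 }, { x: "C", y: 3 },
    { x: "D", y: 4 }, { x: "E", y: 5 } ] }],
  scales: [
    { name: "x", type: "ordinal", range: "width", domain: { data: "table", field: "data.x" } },
    { name: "y", range: "height", domain: { data: "table", field: "data.y" } },
    { name: "pct", range: [0.4, 1], domain: { data: "table", field: "data.y" } },
  ],
  marks: [{ type: "rect", from: { data: "table" }, properties: {
    enter: {
      x: { scale: "x", field: "data.x" },
      width: { scale: "x", band: true, offset: -1 },
      y: { scale: "y", field: "data.y" },
      y2: { scale: "y", value: 0 },
      opacity: { scale: "pct", field: "data.y" },
    },
    update: { fill: { value: "steelblue" } },
  } }],
};

// Follow a dotted path such as "data.y" into a tuple.
const get = (obj, path) => path.split(".").reduce((o, k) => o[k], obj);

function buildScales(spec, tuples) {
  const scales = {};
  for (const s of spec.scales) {
    const values = tuples[s.domain.data].map((t) => get(t, s.domain.field));
    const range = s.range === "width" ? [0, spec.width]
      : s.range === "height" ? [spec.height, 0] : s.range;
    if (s.type === "ordinal") {
      // equal bands, one per (assumed distinct) domain value
      const band = (range[1] - range[0]) / values.length;
      const f = (v) => range[0] + values.indexOf(v) * band;
      f.band = band;
      scales[s.name] = f;
    } else {
      // linear scale over [0, max]
      const max = Math.max(...values);
      scales[s.name] = (v) => range[0] + (v / max) * (range[1] - range[0]);
    }
  }
  return scales;
}

// Evaluate one property encoding against one tuple.
function encode(rule, tuple, scales) {
  const scale = scales[rule.scale];
  let v;
  if (rule.band) {
    v = scale.band;
  } else {
    const raw = rule.field !== undefined ? get(tuple, rule.field) : rule.value;
    v = scale ? scale(raw) : raw;
  }
  return rule.offset ? v + rule.offset : v;
}

function render(spec) {
  // Wrap each raw value so "data.<field>" paths resolve.
  const tuples = {};
  for (const d of spec.data) tuples[d.name] = d.values.map((v) => ({ data: v }));
  const scales = buildScales(spec, tuples);
  return spec.marks.flatMap((mark) =>
    tuples[mark.from.data].map((t) => {
      const item = {};
      for (const set of ["enter", "update"]) {
        for (const [prop, rule] of Object.entries(mark.properties[set] || {})) {
          item[prop] = encode(rule, t, scales);
        }
      }
      return item;
    }));
}

console.log(render(spec));
```

The output is one plain object per bar, much like the attributes the d3 version sets by hand; the difference is that the spec only declares the mappings.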

What is Vincent?

Vincent Graphs



vincent example

import vincent

# df: a pandas DataFrame (assumed here: date index, one pageview column per site)
site_stack = vincent.StackedArea(df)
site_stack.axis_titles(x='Date', y='Pageviews')

My Tools

Step        Tools
----------  ----------------------------------
acquire     pymongo, Solr, Apache Pig
parse       Python stdlib, custom tools
filter      IPython Notebook, listcomps
mine        pandas
represent   matplotlib, vincent, nvd3
refine      d3, Chrome inspector
interact    d3

Offline: I use PhantomJS to run the full stack, including d3.

Why is IPyNB so exciting?

New IPyNB dataviz utilities

Future Nirvana

My Use Cases

Authority Report


Extra Time?

Talk about new IPyNB comm capabilities.

Type Into Browser


  • parse.ly/jobs
  • parse.ly/authority


  • @amontalenti / @parsely

Questions? Tweet me!

This deck

Other resources