Rapid Data Viz
Andrew Montalenti, CTO
Websites have a variety of interesting "first-party" metrics:
Three people:
data-ink ratio, cognitive style, chartjunk
not just charts, data-document joins
multidisciplinary process, feedback loops, iteration
Paradox of choice?
Tools for everything, but no dataviz REPL.
Or is there? Enter IPython Notebook, Pandas, the web.
Let's take a look at "pulse traffic time series".
CONUNDRUM: Once I have nice, clean time series (or other) data rendering well in the IPython Notebook, how do I get it rendering just as nicely in the browser?
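// toy data set and SVG canvas size (px)
var data = [1, 2, 3, 4, 5];
var width = 200;
var height = 200;
// ordinal x scale: one equal-width band per datum across the canvas
var x = d3.scale
.ordinal()
.domain(data)
.rangeBands([0, width]);
// linear y scale: data value -> bar height in px
var y = d3.scale
.linear()
.domain([0, d3.max(data)])
.range([0, height]);
// linear opacity scale: small values fade toward 0.4, the max is fully opaque
var pct = d3.scale
.linear()
.domain([0, d3.max(data)])
.range([0.4, 1]);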
y(1.7) // -> 68px
pct(1.7) // -> 60.4%
y(4.5) // -> 180px
pct(4.5) // -> 94%
x(5) // -> 160px
x.rangeBand() // -> 40px
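// create the SVG canvas (the bars inherit its steelblue fill) and a group for the bars
var chart = d3.select("#container")
.append("svg")
.attr("class", "chart")
.attr("fill", "steelblue")
.attr("width", width)
.attr("height", height)
.append("svg:g");
// data join: enter() yields one placeholder per datum, and each becomes a rect
chart.selectAll("rect")
.data(data)
.enter()
.append("svg:rect")
.attr("x", x)
.attr("height", y)
.attr("opacity", pct)
// SVG y grows downward, so push each bar down until it sits on the bottom edge
.attr("y", function(d, i) { return height - y(d); })
.attr("width", x.rangeBand());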
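The pixel and opacity values annotated in the d3 code above (68px, 60.4%, 180px, 94%, 160px, 40px) are plain interpolation. As a sanity check, here is a minimal Python sketch of the same arithmetic, assuming d3 v3 semantics with no band padding. `linear_scale` and `ordinal_bands` are hypothetical helpers written for this illustration, not part of d3 or any Python library.

```python
# Re-implements the scale arithmetic from the d3 example (illustration only, not d3).

def linear_scale(domain, range_):
    """Map a value linearly from domain to range, like d3.scale.linear()."""
    (d0, d1), (r0, r1) = domain, range_

    def scale(v):
        return r0 + (v - d0) / (d1 - d0) * (r1 - r0)

    return scale


def ordinal_bands(domain, width):
    """Split [0, width] into equal bands, like rangeBands([0, width]) with no padding."""
    band = width / len(domain)
    positions = {d: i * band for i, d in enumerate(domain)}
    return positions.__getitem__, band


data = [1, 2, 3, 4, 5]
width, height = 200, 200

x, band_width = ordinal_bands(data, width)
y = linear_scale((0, max(data)), (0, height))
pct = linear_scale((0, max(data)), (0.4, 1))

# One bar per datum: (x, y, width, height, opacity), mirroring the .attr() calls
bars = [(x(d), height - y(d), band_width, y(d), pct(d)) for d in data]

print(round(y(1.7), 6), round(pct(1.7), 6))  # 68.0 0.604
```

The `height - y(d)` term is the same flip the d3 code performs, because SVG's y axis points down.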
I built a tool called "webrepl" for this.
You need to convert a Pandas DataFrame to JSON of some sort.
Typically: data and labels.
Typically also a pain in the butt!
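One common source of that pain is dates: the stdlib `json` module refuses `datetime` objects. Below is a minimal sketch of a `default=` hook that encodes timezone-aware datetimes as epoch milliseconds, which is what JavaScript's `new Date(ms)` accepts. The hook name `to_js_time` and the sample rows are hypothetical. If you go through pandas' own `DataFrame.to_json`, it converts datetimes to epoch milliseconds by default.

```python
import json
from datetime import datetime, timezone


def to_js_time(obj):
    """json.dumps fallback: encode aware datetimes as epoch milliseconds."""
    if isinstance(obj, datetime):
        if obj.tzinfo is None:
            # Naive datetimes are ambiguous; refuse rather than guess a timezone.
            raise ValueError("naive datetime; attach a timezone first")
        return int(obj.timestamp() * 1000)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


# Hypothetical sample rows, e.g. from df.to_dict(orient="records")
rows = [
    {"date": datetime(2013, 1, 1, tzinfo=timezone.utc), "pageviews": 120},
    {"date": datetime(2013, 1, 2, tzinfo=timezone.utc), "pageviews": 95},
]

# json.dumps(rows) alone raises TypeError; the default= hook handles the datetimes.
payload = json.dumps(rows, default=to_js_time)
print(payload)
```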
NVD3 assumes a certain data format, typically an array of dictionaries (one per series):
var data = [
{"key": "data",
"values": [
1, 2, 3, 4, 5
]
}
];
The values array becomes your chart's series data; you can use your own structure there.
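Producing that structure from Python is a small reshaping step. Here is a minimal sketch, assuming you start from column-oriented data shaped like the output of pandas' `DataFrame.to_dict(orient="list")` (a dict of column name to list of values). The function `to_nvd3_series` is hypothetical, and the `{"x": ..., "y": ...}` point shape is an assumption meant to pair with accessors like `function(d) { return d.x }`.

```python
import json


def to_nvd3_series(columns, index=None):
    """Turn {series_name: [values]} into a [{"key": ..., "values": [...]}] list.

    If `index` is given, each value becomes an {"x": ..., "y": ...} point;
    otherwise the raw values are kept, as in the `data` array above.
    """
    series = []
    for key, values in columns.items():
        if index is None:
            points = list(values)
        else:
            points = [{"x": x, "y": y} for x, y in zip(index, values)]
        series.append({"key": key, "values": points})
    return series


data = to_nvd3_series({"data": [1, 2, 3, 4, 5]})
print(json.dumps(data))
```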
An NVD3 model is basically a preset bundle of d3 scales, axes, labels, and data joins.
nv.addGraph(function() {
// build nvd3 chart model
var chart = nv.models.discreteBarChart()
.x(function(d, i) { return i })
.y(function(d) { return d })
.tooltips(true).showValues(true);
// plain d3 code to do data-document binding
d3.select('#chart svg').datum(data)
.transition().duration(500)
.call(chart);
// nv utility for refreshing graph based on window size
nv.utils.windowResize(chart.update);
return chart;
});
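In the NVD3 code above, the accessor pair `.x(function(d, i) { return i })` and `.y(function(d) { return d })` tells the model how to read each entry of a series' `values`. A minimal Python sketch of what those two accessors produce for the sample series follows. `apply_accessors` is a hypothetical helper; it mirrors the idea, not NVD3's internals.

```python
def apply_accessors(values, x, y):
    """Resolve (x, y) points the way the chart accessors do: x gets (d, i), y gets d."""
    return [(x(d, i), y(d)) for i, d in enumerate(values)]


series = [{"key": "data", "values": [1, 2, 3, 4, 5]}]

# x returns the index, y returns the raw value, as in the NVD3 model config
points = apply_accessors(series[0]["values"], x=lambda d, i: i, y=lambda d: d)
print(points)  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

This is why a bare array of numbers works as `values` here: the index supplies the bar position and the number itself supplies the height.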
It still supports the full power of d3, but gives you a starting point.
var spec = {
"width": 200,
"height": 200,
"data": [
{
"name": "table",
"values": [
{"x":"A", "y":1}, {"x":"B", "y":2}, {"x":"C", "y":3},
{"x":"D", "y":4}, {"x":"E", "y":5}
]
}
],
// ...
"scales": [
{"name": "x",
"type": "ordinal",
"range": "width",
"domain": {"data":"table", "field":"data.x"} },
{"name": "y",
"range": "height",
"nice": true,
"domain": {"data": "table", "field": "data.y"} },
{"name": "pct",
"range": [0.4, 1],
"nice": true,
"domain": {"data": "table", "field": "data.y"} }
],
// ...
"marks": [
{
"type": "rect",
"from": {"data": "table"},
"properties": {
"enter": {
"x": {"scale": "x", "field": "data.x"},
"width": {"scale":"x", "band": true, "offset": -1},
"y": {"scale": "y", "field": "data.y"},
"y2": {"scale": "y", "value": 0},
"opacity": {"scale": "pct", "field": "data.y"}
},
"update": {
"fill": {"value": "steelblue"}
}
}
}
]
};
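The spec above appears to be a Vega (1.x-era) specification. Because it is plain JSON, it can be generated from Python instead of written by hand; that is the idea vincent builds on. Below is a minimal stdlib sketch that rebuilds only the parts shown (the elided `// ...` sections are left out) from a list of (label, value) pairs. `build_bar_spec` is a hypothetical name, not a vincent API.

```python
import json


def build_bar_spec(rows, width=200, height=200):
    """Build the bar chart spec shown above from (label, value) pairs."""
    return {
        "width": width,
        "height": height,
        "data": [{"name": "table", "values": [{"x": k, "y": v} for k, v in rows]}],
        "scales": [
            {"name": "x", "type": "ordinal", "range": "width",
             "domain": {"data": "table", "field": "data.x"}},
            {"name": "y", "range": "height", "nice": True,
             "domain": {"data": "table", "field": "data.y"}},
            {"name": "pct", "range": [0.4, 1], "nice": True,
             "domain": {"data": "table", "field": "data.y"}},
        ],
        "marks": [{
            "type": "rect",
            "from": {"data": "table"},
            "properties": {
                "enter": {
                    "x": {"scale": "x", "field": "data.x"},
                    "width": {"scale": "x", "band": True, "offset": -1},
                    "y": {"scale": "y", "field": "data.y"},
                    "y2": {"scale": "y", "value": 0},
                    "opacity": {"scale": "pct", "field": "data.y"},
                },
                "update": {"fill": {"value": "steelblue"}},
            },
        }],
    }


spec = build_bar_spec(zip("ABCDE", [1, 2, 3, 4, 5]))
print(json.dumps(spec, indent=2))
```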
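import vincent

# df: a pandas DataFrame of pageviews, indexed by date, one column per site (assumed shape)
site_stack = vincent.StackedArea(df)
site_stack.axis_titles(x='Date', y='Pageviews')
site_stack.legend(title='Sites')
site_stack.display()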
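`StackedArea` draws each site's pageviews on top of the sites before it. The arithmetic behind any stacked area layout is a running total per date: each layer's baseline (y0) is the previous layer's top (y1). Here is a minimal stdlib sketch of that computation. It is not vincent's or Vega's code, and the site names are hypothetical.

```python
from itertools import accumulate


def stack(series):
    """Compute (y0, y1) bands per point for a stacked area chart.

    `series` maps name -> list of values, all aligned on the same dates.
    Each layer starts where the previous layer ended (baseline = running total).
    """
    names = list(series)
    n_points = len(series[names[0]])
    layers = {name: [] for name in names}
    for i in range(n_points):
        tops = list(accumulate(series[name][i] for name in names))
        bottoms = [0] + tops[:-1]
        for name, y0, y1 in zip(names, bottoms, tops):
            layers[name].append((y0, y1))
    return layers


bands = stack({"site_a": [10, 12], "site_b": [5, 7]})
print(bands)  # {'site_a': [(0, 10), (0, 12)], 'site_b': [(10, 15), (12, 19)]}
```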
Step        Tools
acquire     pymongo, solr, apache pig
parse       python stdlib, custom tools
filter      ipython notebook, listcomps
mine        pandas
represent   matplotlib, vincent, nvd3
refine      d3, chrome inspector
interact    d3
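To make the parse, filter, and mine steps in the step/tools breakdown above concrete, here is a toy stdlib sketch. It assumes a hypothetical tab-separated pageview log held in a list (standing in for the acquire step, which in the real stack goes through pymongo, solr, or pig). The filter step uses a list comprehension, and the mine step counts pageviews per hour; the result is the kind of series you would then hand to pandas and vincent or NVD3 to represent.

```python
from collections import Counter

# Hypothetical raw pageview log lines: "ISO-timestamp<TAB>site<TAB>url"
raw = [
    "2013-03-01T10:05:00\tsite_a\t/post/1",
    "2013-03-01T10:17:00\tsite_b\t/",
    "2013-03-01T11:02:00\tsite_a\t/post/2",
    "bad line",
]

# parse: split each line into fields, skipping malformed rows
parsed = [line.split("\t") for line in raw]
records = [fields for fields in parsed if len(fields) == 3]

# filter: keep only one site, with a list comprehension
site_a = [(ts, url) for ts, site, url in records if site == "site_a"]

# mine: pageviews per hour, keyed by the "YYYY-MM-DDTHH" timestamp prefix
per_hour = Counter(ts[:13] for ts, _ in site_a)
print(sorted(per_hour.items()))  # [('2013-03-01T10', 1), ('2013-03-01T11', 1)]
```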
Offline: I use PhantomJS to run the full stack, including d3.
Talk about the new IPython Notebook comm capabilities.
Links:
- parse.ly/jobs
- parse.ly/authority
Contacts:
- @amontalenti / @parsely
Questions? Tweet me!