==============
Rapid Data Viz
==============
Andrew Montalenti, CTO
.. rst-class:: logo
.. image:: ./_static/parsely.png
:width: 40%
:align: right
What do we do?
==============
.. image:: ./_static/parsely.png
:width: 90%
:align: center
Parse.ly customers
==================
.. figure:: /_static/logos.png
:width: 90%
:align: center
Is online media special?
========================
Websites have a variety of interesting "first-party" metrics:
* pageviews
* unique visitors
* sessions and paths
* time spent
* page engagement (scroll, copy/paste)
* referrers
* search keywords
Third-party metrics emerging
============================
* **Comments**: Disqus, Livefyre, WordPress
* **Shares**: Twitter, Google+, LinkedIn, Facebook
* **Pins and Saves**: Pinterest, Delicious
* **Upvotes and Likes**: Reddit, Digg
* **Queues**: Instapaper, Readability
.. image:: ./_static/social_icons.png
:width: 60%
:align: center
What about online journalism?
=============================
* **Short Shelf Life**: average content shelf-life <48 hours
* **High Frequency Publishing**: 1000s of posts per day
* **Unclear Conversion Goals**: nothing to buy
.. image:: ./_static/pulse.png
:width: 60%
:align: center
Time series data
================
.. image:: ./_static/sparklines_multiple.png
:align: center
.. image:: ./_static/sparklines_stacked.png
:align: center
Summary breakdowns
==================
.. rst-class:: spaced
.. image:: ./_static/summary_viz.png
:align: center
Benchmark statistics
====================
.. rst-class:: spaced
.. image:: ./_static/benchmarked_viz.png
:align: center
Information radiators
=====================
.. rst-class:: spaced
.. image:: ./_static/glimpse.png
:width: 100%
:align: center
Contextual overlays
===================
.. rst-class:: spaced
.. image:: ./_static/extension.png
:width: 100%
:align: center
How do we do it?
================
.. image:: ./_static/oss_logos.png
:width: 90%
:align: center
Parse.ly careers
================
.. figure:: /_static/team_jobs.png
:width: 70%
:align: center
Agenda
======
* Data Visualization Theory
* **webrepl**: d3 for browser dataviz
* **pyrepl**: Pandas for data mining
* **vizrepl**: IPython Notebook 2.0-dev
Data Visualization Theory
=========================
Three people:
* Edward Tufte
* Mike Bostock
* Benjamin Fry
Edward Tufte
============
.. rst-class:: spaced
.. image:: ./_static/et_dash.jpg
:width: 80%
:align: center
Tufte: Do Whatever It Takes
===========================
.. rst-class:: spaced
.. image:: ./_static/minard.png
:width: 100%
:align: center
data-ink ratio, cognitive style, chartjunk
Bostock: Embrace Standards
==========================
.. rst-class:: spaced
.. image:: ./_static/data_join.png
:width: 70%
:align: center
not just charts, data-document joins
Fry: It's a Process
===================
.. rst-class:: spaced
.. image:: ./_static/process_01.png
:width: 100%
:align: center
.. image:: ./_static/process_02.png
:width: 100%
:align: center
multi-disciplinary process, feedback loops, iteration
Chart Types (1)
===============
.. rst-class:: spaced
.. image:: ./_static/elements_01.png
.. image:: ./_static/elements_05.png
.. image:: ./_static/elements_06.png
Chart Types (2)
===============
Paradox of choice?
.. rst-class:: spaced
.. image:: ./_static/elements_02.png
.. image:: ./_static/elements_03.png
.. image:: ./_static/elements_04.png
Encoding Guide (1)
==================
.. rst-class:: spaced
.. image:: ./_static/viz_elements.png
:width: 80%
:align: center
Encoding Guide (2)
==================
.. rst-class:: spaced
.. image:: ./_static/elements_table.png
:width: 80%
:align: center
Dense Displays
==============
.. rst-class:: spaced
.. image:: ./_static/more_data.png
:width: 80%
:align: center
How to iterate?
===============
.. image:: ./_static/process_03.png
:width: 100%
:align: center
Tools for everything, but no **dataviz REPL**.
Or is there? Enter IPython Notebook, Pandas, the web.
pyrepl
======
Let's take a look at "pulse traffic time series".
.. image:: ./_static/pulse.png
:width: 60%
:align: center
pandas
======
* dataframes
* loading
* aggregates
* grouping
* sorting
* serializing
* matplotlib
* but, dataviz isn't "product ready"!
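
pandas sketch
=============

A minimal sketch of the operations listed above, assuming a hypothetical
pageview table with ``site``, ``hour`` and ``pageviews`` columns. The column
names and numbers are invented for illustration, not Parse.ly data.

.. sourcecode:: python

    import io
    import pandas as pd

    # Hypothetical hourly pageviews for two made-up sites (illustrative numbers).
    csv_text = """site,hour,pageviews
    a.example,2014-01-01 00:00,120
    a.example,2014-01-01 01:00,80
    b.example,2014-01-01 00:00,300
    b.example,2014-01-01 01:00,250
    """

    # loading: parse the CSV into a dataframe, reading "hour" as timestamps
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["hour"])

    # grouping + aggregates: total pageviews per site
    totals = df.groupby("site")["pageviews"].sum()

    # sorting: busiest site first
    ranked = totals.sort_values(ascending=False)

    # serializing: JSON records, ready to hand to browser code
    payload = ranked.reset_index().to_json(orient="records")
    print(payload)

With matplotlib installed, ``ranked.plot(kind="bar")`` would cover the last
bullet.
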
Data my browser!
================
CONUNDRUM: Once I have some nice, clean, time series (or other) data rendering
nicely in the IPython Notebook, how do I get it rendering nicely in the
browser?
Options
=======
* d3 bespoke viz: hardest, most flexible
* nvd3 chart models: slightly easier, still very flexible
* vincent/vega: easiest, relatively inflexible
* (these aren't the only options, but IMO they're the best ones)
d3-oriented Approach
====================
* iterate with Pandas and matplotlib
* convert dataframe to JSON
* load JSON with d3
* use d3 for final cleaning
* build scales / axes / labels from scratch
* build interaction layer from scratch
* for offline, use PhantomJS render
d3
==
* selections
* svg
* scales
* axes
* joins
Data
====
.. rst-class:: spaced
.. image:: ./_static/data_set.png
:width: 80%
:align: center
Documents
=========
.. rst-class:: spaced
.. image:: ./_static/data_values.png
:width: 80%
:align: center
Data-Driven Documents
=====================
.. rst-class:: spaced
.. image:: ./_static/data_highlights.png
:width: 80%
:align: center
d3 scales
=========
.. sourcecode:: javascript

    var data = [1, 2, 3, 4, 5];
    var width = 200;
    var height = 200;

    // ordinal scale: one equal-width band per datum
    var x = d3.scale
        .ordinal()
        .domain(data)
        .rangeBands([0, width]);

    // linear scale: value -> bar height in pixels
    var y = d3.scale
        .linear()
        .domain([0, d3.max(data)])
        .range([0, height]);

    // linear scale: value -> opacity between 40% and 100%
    var pct = d3.scale
        .linear()
        .domain([0, d3.max(data)])
        .range([0.4, 1]);

d3 scaling
==========
.. sourcecode:: javascript

    y(1.7)          // -> 68px
    pct(1.7)        // -> 60.4%

    y(4.5)          // -> 180px
    pct(4.5)        // -> 94%

    x(5)            // -> 160px
    x.rangeBand()   // -> 40px

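
d3 scales by hand
=================

To see where those numbers come from, here is a rough Python re-derivation of
the arithmetic. It is a sketch of what a linear scale and an unpadded band
layout compute, not d3's actual implementation; ``linear_scale`` and
``band_scale`` are names made up for this slide.

.. sourcecode:: python

    def linear_scale(domain, rng):
        """Return f(v) mapping domain [d0, d1] linearly onto range [r0, r1]."""
        (d0, d1), (r0, r1) = domain, rng
        return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)


    def band_scale(values, rng):
        """Split range [r0, r1] into equal bands, one per value (no padding)."""
        r0, r1 = rng
        band = (r1 - r0) / len(values)
        starts = {v: r0 + i * band for i, v in enumerate(values)}
        return starts, band


    data = [1, 2, 3, 4, 5]
    y = linear_scale((0, max(data)), (0, 200))
    pct = linear_scale((0, max(data)), (0.4, 1))
    x, band = band_scale(data, (0, 200))

    print(round(y(1.7), 3), round(pct(1.7), 3))  # 68.0 0.604
    print(round(y(4.5), 3), round(pct(4.5), 3))  # 180.0 0.94
    print(x[5], band)                            # 160.0 40.0
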
d3 drawing
==========
.. sourcecode:: javascript

    // create the svg canvas and a group to draw into
    var chart = d3.select("#container")
        .append("svg")
            .attr("class", "chart")
            .attr("fill", "steelblue")
            .attr("width", width)
            .attr("height", height)
        .append("svg:g");

    // data join: one rect per datum, positioned by the scales
    chart.selectAll("rect")
        .data(data)
        .enter()
        .append("svg:rect")
            .attr("x", x)
            .attr("height", y)
            .attr("opacity", pct)
            // svg y grows downward, so offset bars from the bottom edge
            .attr("y", function(d, i) { return height - y(d); })
            .attr("width", x.rangeBand());

Prototyping with d3
===================
I built a tool called "webrepl" for this.
* HTML page with codemirror + emmet
* shortcut that installs jquery, bootstrap, d3 on page
* renders JavaScript code into preview iframe
* Browser inspector lets me look into that frame
What about my data?
===================
Need to convert Pandas DataFrame to JSON format of some sort.
Typically: data and labels.
Typically also a pain in the butt!
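
DataFrame to JSON
=================

One way to take some of the pain out, sketched with pandas' own ``to_json``:
the ``split`` orientation keeps the labels (``columns``, ``index``) apart from
the values (``data``). The dataframe below is a made-up stand-in with invented
site names and counts.

.. sourcecode:: python

    import json
    import pandas as pd

    # Hypothetical pageviews per hour for two made-up sites.
    df = pd.DataFrame(
        {"site_a": [120, 80, 95], "site_b": [300, 250, 270]},
        index=pd.to_datetime(["2014-01-01 00:00", "2014-01-01 01:00",
                              "2014-01-01 02:00"]),
    )

    # orient="split" emits {"columns": [...], "index": [...], "data": [[...], ...]};
    # date_format="iso" writes the timestamps as ISO 8601 strings.
    payload = df.to_json(orient="split", date_format="iso")

    parsed = json.loads(payload)
    print(parsed["columns"])  # series labels
    print(parsed["data"])     # one row of values per timestamp

A payload like this, saved to a file, can then be loaded in the browser with
``d3.json``.
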
nvd3 add-on
===========
* use canned nvd3 chart type
* customize interaction layer atop
nvd3 concepts
=============
* models
* charts
* tooltips
* utilities
nvd3 graphs
===========
.. figure:: /_static/nvd3_graphs.png
:width: 90%
:align: center
nvd3 approach
=============
nvd3 assumes a certain data format, typically an array of dictionaries (one
per series):

.. sourcecode:: javascript

    var data = [
        {"key": "data",
         "values": [
            1, 2, 3, 4, 5
         ]
        }
    ];

The ``values`` array becomes your chart series data; you can use your own
structure there.

A model is basically a preset bundle of d3 scales, axes, labels, and data
joins.

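
nvd3 series from Python
=======================

A small sketch of producing that shape from Python with only the standard
library. ``to_nvd3_series`` is a hypothetical helper name, not part of nvd3 or
pandas.

.. sourcecode:: python

    import json


    def to_nvd3_series(columns):
        """Turn {name: [values...]} into nvd3's [{"key": ..., "values": [...]}]."""
        return [{"key": name, "values": list(values)}
                for name, values in columns.items()]


    series = to_nvd3_series({"data": [1, 2, 3, 4, 5]})
    print(json.dumps(series))
    # [{"key": "data", "values": [1, 2, 3, 4, 5]}]
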
nvd3 model
==========
.. sourcecode:: javascript

    nv.addGraph(function() {
        // build nvd3 chart model
        var chart = nv.models.discreteBarChart()
            .x(function(d, i) { return i })
            .y(function(d) { return d })
            .tooltips(true).showValues(true);

        // plain d3 code to do data-document binding
        d3.select('#chart svg').datum(data)
            .transition().duration(500)
            .call(chart);

        // nv utility for refreshing graph based on window size
        nv.utils.windowResize(chart.update);
        return chart;
    });

nvd3 benefit
============
Still supports full power of d3, but gives you a starting point
.. figure:: /_static/nvd3_bar.png
:width: 90%
:align: center
What is Vega?
=============
* Vega is a **declarative** abstraction for dataviz.
* Essentially, a domain-specific language written in JSON.
* Renders via d3 (SVG) and also to HTML5 Canvas.
.. figure:: /_static/vega_website.png
:width: 60%
:align: center
Vega bar example (1)
====================
.. sourcecode:: javascript

    var spec = {
      "width": 200,
      "height": 200,
      "data": [
        {
          "name": "table",
          "values": [
            {"x":"A", "y":1}, {"x":"B", "y":2}, {"x":"C", "y":3},
            {"x":"D", "y":4}, {"x":"E", "y":5}
          ]
        }
      ],
      // ...

Vega bar example (2)
====================
.. sourcecode:: javascript

      "scales": [
        {"name": "x",
         "type": "ordinal",
         "range": "width",
         "domain": {"data":"table", "field":"data.x"} },
        {"name": "y",
         "range": "height",
         "nice": true,
         "domain": {"data": "table", "field": "data.y"} },
        {"name": "pct",
         "range": [0.4, 1],
         "nice": true,
         "domain": {"data": "table", "field": "data.y"} }
      ],
      // ...

Vega bar example (3)
====================
.. sourcecode:: javascript

      "marks": [
        {
          "type": "rect",
          "from": {"data": "table"},
          "properties": {
            "enter": {
              "x": {"scale": "x", "field": "data.x"},
              "width": {"scale":"x", "band": true, "offset": -1},
              "y": {"scale": "y", "field": "data.y"},
              "y2": {"scale": "y", "value": 0},
              "opacity": {"scale": "pct", "field": "data.y"}
            },
            "update": {
              "fill": {"value": "steelblue"}
            }
          }
        }
      ]
    };  // closes the spec object opened in example (1)

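
Vega spec from Python
=====================

Because the spec is plain JSON, any language can generate it. A hedged sketch
that reassembles the three fragments above as a Python dict and serializes it.
``bar_spec`` is a made-up helper for this slide; vincent is the real tool for
this job.

.. sourcecode:: python

    import json


    def bar_spec(values, width=200, height=200):
        """Build the bar-chart spec from the previous slides as a dict."""
        table = [{"x": label, "y": y} for label, y in values]
        y_domain = {"data": "table", "field": "data.y"}
        return {
            "width": width,
            "height": height,
            "data": [{"name": "table", "values": table}],
            "scales": [
                {"name": "x", "type": "ordinal", "range": "width",
                 "domain": {"data": "table", "field": "data.x"}},
                {"name": "y", "range": "height", "nice": True,
                 "domain": y_domain},
                {"name": "pct", "range": [0.4, 1], "nice": True,
                 "domain": y_domain},
            ],
            "marks": [{
                "type": "rect",
                "from": {"data": "table"},
                "properties": {
                    "enter": {
                        "x": {"scale": "x", "field": "data.x"},
                        "width": {"scale": "x", "band": True, "offset": -1},
                        "y": {"scale": "y", "field": "data.y"},
                        "y2": {"scale": "y", "value": 0},
                        "opacity": {"scale": "pct", "field": "data.y"},
                    },
                    "update": {"fill": {"value": "steelblue"}},
                },
            }],
        }


    spec = bar_spec(zip("ABCDE", [1, 2, 3, 4, 5]))
    spec_json = json.dumps(spec, indent=2)  # text to hand to the Vega runtime
    print(spec_json)
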
How does Vega work?
===================
* vega runtime generates d3 instructions
* for offline mode, use vg2png/vg2svg
What is Vincent?
================
* vincent is a Python library that "humanizes" vega.
* use vincent inside IPyNB
* export vega JSON from vincent objects
* run vega JS library to parse JSON
Vincent Graphs
==============
.. figure:: /_static/vincent_ipynb.png
:width: 100%
:align: center
vincent
=======
* vega (JSON)
* declarative visualizations
* HTML canvas
vincent example
===============
.. sourcecode:: python

    import vincent

    # df: a pandas DataFrame of pageviews, one column per site (assumed)
    site_stack = vincent.StackedArea(df)
    site_stack.axis_titles(x='Date', y='Pageviews')
    site_stack.legend(title='Sites')
    site_stack.display()

.. figure:: /_static/vincent_stacked.png
My Tools
========
=========== ===================================
Step        Tools
=========== ===================================
acquire     pymongo, solr, apache pig
parse       python stdlib, custom tools
filter      ipython notebook, listcomps
mine        pandas
represent   matplotlib, vincent, nvd3
refine      d3, chrome inspector
interact    d3
=========== ===================================

Offline: I use PhantomJS to run the full stack, including d3.
Why is IPyNB so exciting?
=========================
* execution
* display
* saving / sharing
* platform unification
New IPyNB dataviz utilities
===========================
* IPython cell magics (``%%html``, ``%%javascript``)
* display framework
* ``ipython locate profile`` for custom CSS/JS
Future Nirvana
==============
* edit data with Pandas in IPyNB
* snapshot data as JSON cell
* edit d3 / nvd3 code in ``%%javascript`` cell
* use ``IPython.display`` to show d3 rendering result
* vincent example leads the way here
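
Nirvana sketch
==============

A rough sketch of the "snapshot data as JSON, render with d3" idea. It only
builds an HTML string and hands it to ``IPython.display`` when available. The
``d3_snippet`` helper, the ``bars`` element id, and the assumption that d3 is
already loaded in the notebook page are all hypothetical.

.. sourcecode:: python

    import json


    def d3_snippet(data, element_id="bars"):
        """Return HTML that embeds `data` as JSON and draws one div per value."""
        return """
    <div id="{el}"></div>
    <script>
      // assumes a global `d3` is already loaded on the page
      var data = {data};
      d3.select("#{el}").selectAll("div")
          .data(data)
        .enter().append("div")
          .style("background", "steelblue")
          .style("margin", "1px")
          .style("width", function(d) {{ return d * 40 + "px"; }})
          .text(function(d) {{ return d; }});
    </script>
    """.format(el=element_id, data=json.dumps(data))


    html = d3_snippet([1, 2, 3, 4, 5])

    try:
        # Inside IPython Notebook, show the snippet in the output cell.
        from IPython.display import HTML, display
        display(HTML(html))
    except ImportError:
        # Outside the notebook, just print the generated markup.
        print(html)
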
My Use Cases
============
* mine network referrers for trends
* compare real-time traffic between publishers
Authority Report
================
.. rst-class:: spaced
.. image:: ./_static/authority_report.png
:width: 80%
:align: center
Extra Time?
===========
Talk about new IPyNB comm capabilities.
* Widget framework?
* Python-to-JavaScript bridge via ``IPython.kernel.comm``?
* IPython JavaScript API for cell reading?
Type Into Browser
=================
.. rst-class:: bigger
**Links:**
- parse.ly/jobs
- parse.ly/authority
**Contacts:**
- @amontalenti / @parsely
Questions? `Tweet me`_!
.. _Tweet me: http://twitter.com/amontalenti
This deck
=========
* `slides`_
* `notes`_
* `code`_
.. _slides: http://pixelmonkey.org/pub/dataviz-elements
.. _notes: http://pixelmonkey.org/pub/dataviz-elements/notes
.. _code: http://bit.ly/dataviz-elements-code
Other resources
===============
* `d3.js library`_
* `nvd3 library`_
* `nvd3 live code examples`_
* `trifacta's vega library`_
* `vega live editor`_
* `vincent library`_
* `tributary simple bars`_
* `codemirror`_
* `emmet`_
.. _d3.js library: http://d3js.org
.. _nvd3 library: http://nvd3.org/
.. _nvd3 live code examples: http://nvd3.org/livecode/
.. _trifacta's vega library: https://github.com/trifacta/vega
.. _vega live editor: http://trifacta.github.io/vega/editor/
.. _vincent library: https://github.com/wrobstory/vincent
.. _tributary simple bars: http://tributary.io/inlet/7376344
.. _codemirror: http://codemirror.net/
.. _emmet: http://emmet.io/