Responsive graphs and custom elements

A few weeks ago I published a statistical analysis of one-day cricket – specifically a verification of the old “double the 30-over score” trope.

This blog post is not about cricket – you can read the link above for that. This is a post about the technology behind that analysis. Ben Hosken from Flink Labs says around 60% of a data visualisation project is just collecting and cleaning the data – and so it was with this analysis. What was originally a “quick distraction” became my coding-on-the-train hobby project for 3 weeks. Here’s how I built it.

Data

All data came from Cricinfo, thanks to their ball-by-ball statistics. I won’t go into much detail about how I got the data because I have don’t know what their data usage guidelines are (I looked). Suffice it to say that I ended up with over 7000 JSON files on my machine.

From these I wrote a series of Python scripts to parse, verify, clean up, analyse and collate the data for each innings. The final output was a single JSON file containing only the relevant data I needed.

Custom element

In order to fit the graphs neatly into the context of a written article, I wanted to be able to embed them as part of the writing process. The article (source on GitHub) was written using Markdown with some custom additions (which are described later). Markdown was used despite my personal annoyances with some aspects of it, mainly because it’s widely supported.

The easiest way to put a custom graph into the article was to plonk some HTML in the Markdown source and let it be copied across to the rendered output – but what HTML? I needed containers for SVG graphs that would be rendered later via JavaScript. Given that there would be a few graphs, the containers needed to define customisations per graph, preferably as part of the article source (for better context when writing).

The usual choice for something like this would be a plain old <div> with lots of data- attributes, one for each option. Unfortunately this didn’t let me easily tweak the options after rendering, which I consider important for data explorations so that you can get the best possible view of the data.

Given the one-off nature of the article, I had the opportunity to use “new shiny” tech in the form of a custom element, part of the new Web Components specification. Obviously proper custom elements aren’t widely supported across browsers yet, so I used the excellent Skate library as a helper (disclaimer: I work on the same team as Skate’s author).

HTML API

Using a custom element meant that my in-context API was simple HTML attributes. I quickly wrote a potential API in a comment before I’d written any code for it.

/**
 * <odi-graph> custom element.
 * ___________________________
 *
 * Basic usage:
 *
 *   <odi-graph>Text description as a fallback</odi-graph>
 *
 * Attributes (all are optional):
 *
 *   <odi-graph graph-title="This is a graph"></odi-graph>
 *     Add a title to the graph
 *
 *   <odi-graph rolling-average="true"></odi-graph>
 *     Include a rolling average line (off by default)
 *
 * [... etc.]
 */

This allowed me to write the article with in-context HTML like so:

In order to find the answer, I graphed out a 100-innings rolling average, to give a better indication of trends over time.

<odi-graph graph-title="Overall vs rolling average"
    rolling-average="true" innings-points="false">
    IMAGE: A graph showing a rolling average halfway mark as described in the next paragraph.
</odi-graph>

Thanks to Skate, the initial definition of the custom element and its attributes was very simple.

skate('odi-graph', {
    attached: function (elem) {
        // Create internal DOM structure
        // [... SNIP: nothing but basic DOM element manipulation here ...]

        // Create the graph
        elem.graph = new ODIGraph();
        queue(elem, function () {
            elem.graph.init(cloneData(stats.data), configMapper(elem));
            onceVisible(elem, function () {
                elem.graph.render(elem);
            });
        });
    },
    attributes: {
        'graph-title': attrSetter,
        'rolling-average': attrSetter,
        'innings-points': attrSetter,
        'ybounds': attrSetter,
        'date-start': attrSetter,
        'date-end': attrSetter,
        'filter': attrSetter,
        'highlight': attrSetter,
        'reset-highlight-averages': attrSetter
    }
});

function attrSetter(elem, data) {
    if (data.newValue !== data.oldValue) {
        if (elem.graph && elem.graph.inited) {
            elem.graph.config(configMapper(elem));
        }
    }
}

The queue and onceVisible calls were just making sure that the graph wasn’t rendered too early, before the data finished loading. The key part is that it just calls out to a separate ODIGraph instance (described later on in this post) for separation of responsibilities. The attrSetter method made sure that the graph was re-rendered if any HTML attribute changed, after running the attributes through configMapper which just parsed the attribute strings into a JS config object.

This gave me the enormous benefit of being able to just tweak attributes for a graph in browser devtools and see the graph instantly update.

Accessibility

As shown above, all graphs were written with some text inside the custom element. During the initialisation of the custom element, a <figure> element is created inside it, then the custom element’s original text context is moved to a hidden <figcaption>. This provides a description that is read out by screenreaders for those users who can’t see the graph visuals.

An additional aspect of providing descriptive captions was making sure that the article text immediately after the graph described an interpretation of the graph data. Doing this ensured that no information was presented only via visuals.

For the finishing touches I added the aria-hidden="true" attribute to the axis and legend text in the graph. Since the graph was built with SVG, all the text elements would be read by screenreaders but without context. (After all, what does “15 20 25 30” mean without the visual positioning context of being on an axis?)

Responsive graph

For the actual graphs, I wanted them to be interactive, mobile-friendly, and (hopefully) not much work.

C3

I started off by using the C3 library for ease of development. It initially ticked a lot of boxes:

I could quickly get up and running with my existing data set.
It was responsive and auto-expanded to whatever size container it was put into.
Auto-scaling and highlighting of data points (with a vertical hover marker) came for free.

I was able to look at my data set in a few different ways with just some simple config tweaks. However, as I dug deeper into what I wanted the graphs to do, I found myself fighting against the library instead of working with it.

As an example, I wanted to combine a scatter plot for individual innings with one or more line plots for the averages. While this was possible with C3, it started to become an either/or situation. C3 gave me a vertical marker line when hovering anywhere on the graph (which is what I wanted), but only when I had just line plots – as soon as the scatter plot was added, the nice hover behaviour disappeared. The way the hover marker was implemented also started to fall down – it worked fine with around 100 data points, but as soon as the full data set of over 1800 points was added, the hovering stopped working.

Added to this was a performance problem of C3 with the size of my data set. In order to hide data point markers on the line plots, C3 adds an SVG DOM node per point then hides them via setting their opacity to 0. It does this so that they can be faded in with an animation (if they’re set to visible via an API call). In my case, this meant over 3500 DOM nodes were being created just to be hidden.

In the end I decided that while C3 is a nice charting library, it wasn’t set up for my use case. I wrote out a list of all the things I could think I would possibly want the graphs to do:

Combine scatter and line plots on the same graph.
Vertical marker to highlight data points when hovering, auto-snapping the line to the nearest data point to the mouse cursor.
Fully custom rendering of a tooltip when highlighting a data point.
Resize when the graph’s container size changes.
Plot data along the X axis by dates (C3 has an API for this, but it didn’t work with my data).
Define a title
Optionally add background highlight regions for grouping data points, with each region having its own title.
Show a horizontal background line marking 30 overs as a reference.
Have good performance – create only as many DOM nodes as necessary for displaying the data.

After quickly looking at some other charting libraries, I bit the bullet and accepted that I was going to have to roll my own.

Custom rendering

Note: All the code for the graphs can be found on GitHub. It changed structure many times as extra requirements turned up, so it’s far from perfect. I didn’t see the need for lots of refactoring, given its one-time-use purpose.

I won’t go into a super-detailed explanation of the code, as most of it is just calling D3 APIs. Instead I’ll detail the high-level concepts.

Most of the contents can be broken down into three main areas of responsibility: setup, sizing, and display.

First is setup. This is only called once per graph on initialisation. The only thing setup code is concerned with is setting up D3 axis components, creating placeholder DOM nodes and adding event listeners.

Second is sizing. This takes DOM nodes created in the setup phase and sets their positioning and dimensions based on the current size of the graph container. The ranges of D3’s axis and scale components are also defined here.

Finally there’s display. This is where the main rendering takes place. The data array that’s passed in is rendered according to a few constraints:

The options already set – filtering, highlight regions, etc.
The dimensions and positioning defined in the sizing phase.

This kind of structure means that when a graph is first created, it calls the setup, sizing and display phases in succession. From then on, resizing the browser window calls straight into sizing and display, meaning the graphs will always fit into the current window (after a short delay due to debouncing). Finally, if graph.data(someNewData) is called, only the display phase is called because the sizing hasn’t changed.

Tweaks for better responsiveness

Although the main data rendering was now responsive to any window size, I realised that some of the text elements didn’t work at small screen sizes. The different types of text required different solutions.

Graph titles could be highly variable in width, as the text is completely different per graph. I ended up with a solution that allowed me to specify different titles in the one attribute – a “primary” title and a shorter variant to be used only if the primary one didn’t fit.

I picked a couple of arbitrary unicode characters that were not going to appear anywhere else, and could be easily typed on my Mac keyboard. One character (· or Option+Shift+9) was used to define a cut-off point in a single title – if the full title didn’t fit in the graph, then only the text up to the cut-off point would be used. The other character (¬ or Option+l (lowercase L)) marked two completely different titles.

Examples from the code comments give a better idea of how they were used:

/**
 * Formatting for titles
 *
 *   `graph-title` attribute and highlight region names can indicate short vs long content.
 *   The short content will be used if the long version can't fit in the space available.
 *   « = short version; » = long version
 *
 *   "This is a title"
 *     « This is a title
 *     » This is a title
 *
 *   "Short title· with some extra"
 *     « Short title
 *     » Short title with some extra
 *
 *   "Short title¬Completely different title"
 *     « Short title
 *     » Completely different title
 */

The other text that didn’t work at small sizes was the legend at the bottom of the graphs. Intially I had added the legend as part of the same SVG element as the graph to keep everything together. I played with some different ways of trying to make the parts of the legend wrap nicely at smaller sizes, but this was one area where SVG worked against me. In the end I realised I was trying to replicate with SVG groups and transforms what CSS gave me for free. I ended up creating one SVG element per legend type, setting display: inline-block on them and letting CSS handle the rest.

Here is a comparison of how the title and legend worked at different graph sizes.

Final touches

With the core functionality done and the graphs finally doing what I wanted, it was time to add in some polish.

Only render when visible

There ended up being 7 graphs in the article. Rendering them all on page load would be a noticeable drag on performance, with over 5000 DOM nodes being inserted into the page.

Instead I used the Verge library to detect when a graph’s custom element became visible for the first time (using a debounced scroll event listener). Once the element becomes visible, the graph setup/sizing/render methods are called for the first time. This makes the initial page load snappy, and also stops the browser from having to render any graph that is never viewed.

Customised Markdown

The article was written using basic Markdown syntax, but with a couple of custom additions. One addition was for adding numeric callouts that appear on the side (using a syntax of ++1234++). The other addition was a helper for easily adding links to specific matches on Cricinfo. The Markdown source was read by a quick-and-dirty Python script, run through a Markdown renderer, then run through extra string replacements for the custom additions. Finally the HTML output was inserted into a basic page template to produce the final article.

So that’s how it was made. Enjoy.

Shoehorn With Teeth

’Cos he knows there’s no such thing