Mythbusters: Should you start your axes at zero?

I’ve written before about the problem of “rules” and “laws” in data visualization. A classic one is “Thou must start your axes at zero.” (If you’re reading my Brinton blog, go see what he had to say about it.)

In this post I want to dispel this myth. It’s a myth that’s close to my heart. In August 2009 I went on the record (by commenting on The Guardian’s Datablog) that I disliked one of their charts because their axes didn’t start at zero.

andy comment

Let’s take the data from that Guardian article in order to investigate this Rule. Here’s how it looks with zero included:

with zero

This is bad for at least 4 reasons:

  1. It doesn’t really expose the change of the record over time.
  2. It especially doesn’t highlight the impact Usain Bolt had on the record.
  3. It doesn’t make great use of the space – there’s lots of dead space.
  4. It’s boring.

What happens when you break the rule?

without zero

All the problems are removed. It’s engaging, Usain Bolt’s impact is clear and it makes great use of space.

There’s one final reason not to include zero in this case. I do not know what the fastest time a human being will ever run 100m will be, but I can guarantee it isn’t zero seconds. My point is that not all measures you are charting have real zeros. In this case, the “zero” might be 9 seconds or so.
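To put a rough number on the “dead space” point, here is a minimal Python sketch. It is an illustration only, not how the charts above were made: the function names `axis_limits` and `data_share`, the 5% padding, and the sample times are all assumptions, and the times are an illustrative list of 100m marks rather than the Guardian’s dataset.

```python
def axis_limits(values, include_zero=False, pad_fraction=0.05):
    """Return (lower, upper) limits for a value axis."""
    lowest, highest = min(values), max(values)
    pad = (highest - lowest) * pad_fraction
    # A zero-based axis ignores the data's own minimum entirely.
    lower = 0.0 if include_zero else lowest - pad
    return lower, highest + pad


def data_share(values, limits):
    """Fraction of the axis length that the data actually spans."""
    lower, upper = limits
    return (max(values) - min(values)) / (upper - lower)


# Illustrative 100m times in seconds (assumed sample, not the article's data).
times = [9.95, 9.93, 9.92, 9.86, 9.85, 9.84, 9.79, 9.77, 9.74, 9.72, 9.69, 9.58]

with_zero = data_share(times, axis_limits(times, include_zero=True))
without_zero = data_share(times, axis_limits(times))
print(f"zero-based axis: data fills {with_zero:.0%} of the height")
print(f"truncated axis:  data fills {without_zero:.0%} of the height")
```

With these sample values the zero-based axis leaves the data squeezed into a thin sliver at the top, while the truncated axis gives it most of the height. That is the trade-off the two charts above show.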

Once you learn the guidelines, you’ll be able to fine tune your charts by bending or breaking them according to your use case and objective. Sticking to the rules means you will satisfy the 5 criteria Alberto Cairo defines for a successful chart (he discussed these at his 2014 Tapestry Keynote):

  1. Truthful
  2. Functional
  3. Beautiful
  4. Insightful
  5. Enlightening

Controlling your message with title, colour and orientation

Consider these charts. They have completely different messages and yet the only difference is the title, the colour and the orientation of the bars.

Creating a totally different message with just a title, axis and colour change.

Watch the video below for an explanation, or go check out my slides from an older presentation, “Drive the message home with the right dashboard”. I also make this point in my Brinton talks, since he covered it too.

Manual data collection

2014-10-14 09.32.18
My log of books/films/gigs

In the 21st century, the age of big data and the Internet of Things, it’s easy to get carried away logging everything you do in databases. I find there’s a charm and happiness in doing some data logging the old-fashioned way: with pen and paper.

What’s the log book above? I write down every book/film/gig/concert/play I read or see. I add a date and a score out of 5. I started the log in 2010. I was complaining to a friend, “Oh, I wish I’d kept a record of every book and gig I’d been to in my life.” It was probably the third time I’d had this conversation with her.

She replied, “Andy, quit bitching and just start one now.”

Good point. I did.

I love the physical object, and the easy nature of browsing back a few years. Sure, I could log this on Goodreads.com and equivalents, but it’s not the same. And part of me thinks it might be something I can share more easily with my kids one day. It’s also an anti-Tableau thing. I love Tableau but, you know what, sometimes I want my data to stay away from the screen. Unlike my music habits of course.

I wanted to share another one we received at work recently – this is data collected about Premier League players by someone when they were 8 years old:

IMG_0918

Do you collect data? Post a link in the comments below or on Twitter. Let’s share our manual data logs. Geeks of the world: Unite!

100yrs of Data Visualisation: my talk from #data14

I had an amazing time delivering my session about Willard Brinton at the Tableau Conference. Today we made the recordings of the session available to people who attended the conference. If you couldn’t make the session, you can go here to watch the recording*. (note – only conference attendees can access this content – sorry!)

Along with the conference session, I am on a mission to bring Brinton the fame he deserves, and am also cataloguing the amazing things he talked about in his book over on my tumblr, http://100yrsofbrinton.tumblr.com/.

If you couldn’t make it, I’ve uploaded the slides for you – you can see them below. Without the talk track, I fear they are not much more than just pretty pictures. Nice pictures, though.

Have you heard of Brinton before? Will you help me make him famous?

* Listening to the recording makes me realise I talk very fast!

Pie or stacked bar?

I published a post on my Brinton blog about how he disliked pie charts. In his 1914 book, he suggested stacked horizontal bars as a much better alternative. A horizontal stacked bar “gives all the data without detracting from the ease of reading the chart itself.”

I put together a quick example in Tableau. Try answering the question: Which region has the highest consumer sales? Is it easier to answer the question with the pie or the stacked bar?

Which version is easier to compare
Download the workbook here

What do you think? The horizontal stacked bar has advantages:

  • It’s definitely easier to answer this specific question on sales/region with the stacked bar
  • Comparison across Regions is much easier
  • If you labelled the marks, they’d be more readable in the bar chart

But even the horizontal bar is not infallible. For example, which is the biggest blue (Consumer) segment in the Regions? It’s actually easier to answer that with the pies, because the blue segment is first in the pie charts.

But I cannot deny the pies have some strengths, too. While I used to abhor the use of pies, I have certainly softened my view on pies since I last blogged about them. The debate will continue for a long time. The reason is two-fold:

  1. Some people use pie charts without thinking, or because they don’t know the appropriate times to use them.
  2. Because “it depends”. I’ve shown one specific scenario where stacked bars are better than pies. That does not mean there are not scenarios where pies are better than bars. There are.

I’m interested in your thoughts: comment below or let me know on Twitter.

I’ll leave you with Brinton’s favoured way of showing part-to-whole relationships:

Click here to see this in Brinton’s book

How to make a slope chart in Tableau

A dynamic slope chart (click to see interactive version)

Slope charts are cool. They emphasise change between an end date and a start date by removing the noise in between. For a much more detailed explanation and justification, go read Andy Kirk’s homage to slope charts.

This post is going to show how to build a slope chart in Tableau. It’s not the first tutorial on this: there are others by Ben Jones and Andy Kriebel. The issue with those examples is that they all start with data that has just two time points. What happens if you have lots of time data and just want to show the start/end points?

I’m building this using the Superstore sample data.

Start with a time series.

slope start

First use a standard time series chart. In my example, I have Sales by Year for each Container. It doesn’t really matter whether you have a continuous or discrete date pill on your Columns shelf.

Keep only the first and last values

I want this to be a dynamic slope chart; if I filter the range of dates, the chart should continue to show only the start and end values.

To do this, I create a simple calculated field [First or Last]:

First or last

This calculation returns TRUE for the first and last marks on each Container’s line, and FALSE for everything in between. Here’s what happens if I put the calculated field onto the Size shelf:

First or last on size

Show only the ends

Why put the calculation on the Size shelf? Because we need to hide every mark that isn’t a first or last value: in other words, everything our calculation returns as False. Dropping a pill onto the Size shelf reveals a legend:

Size legend

Click on False. Then right-click and choose Hide. Voila! We have our basic slope chart:

slope basic
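For readers who think in code rather than pills and shelves, the same “keep only the ends” idea can be sketched in Python. This is not the Tableau calculated field (that formula is only shown in the screenshot above and is not reproduced here); the function names `first_or_last` and `slope_points` and the sales figures are made up for illustration.

```python
from collections import defaultdict


def first_or_last(rows):
    """Flag the earliest and latest row within each group.

    rows: iterable of (group, period, value) tuples.
    Returns (group, period, value, is_end) tuples sorted by group, then period.
    """
    by_group = defaultdict(list)
    for group, period, value in rows:
        by_group[group].append((period, value))

    flagged = []
    for group in sorted(by_group):
        points = sorted(by_group[group])
        last_index = len(points) - 1
        for index, (period, value) in enumerate(points):
            is_end = index == 0 or index == last_index
            flagged.append((group, period, value, is_end))
    return flagged


def slope_points(rows, start=None, end=None):
    """Filter to a period range first, then keep only each group's end points."""
    in_range = [
        r for r in rows
        if (start is None or r[1] >= start) and (end is None or r[1] <= end)
    ]
    return [r[:3] for r in first_or_last(in_range) if r[3]]


# Made-up sales figures, for illustration only.
sales = [
    ("Small Box", 2010, 120), ("Small Box", 2011, 90),
    ("Small Box", 2012, 150), ("Small Box", 2013, 170),
    ("Jumbo Drum", 2010, 300), ("Jumbo Drum", 2011, 310),
    ("Jumbo Drum", 2012, 280), ("Jumbo Drum", 2013, 260),
]
print(slope_points(sales))              # ends of the full range
print(slope_points(sales, start=2011))  # ends move when the filter changes
```

The important design choice is the order: the date filter is applied before the first/last test, which is what makes the slope chart “dynamic”. Change the range and the end points move with it.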

Improving the basic slope chart

It’s easy to take this much further.

The first thing you can do is move Container from the Colour shelf to the Label shelf. I like this for two reasons:

  1. It’s easier on the eye to have fewer colours
  2. The labels are right on the line so the viewer doesn’t have to move their eyes around the canvas too much to identify which line is which

You can also switch the time pill to something else if you wish. In the example below, I’ve switched to a continuous month and added a quick filter for Month:

no colour just label

We’re almost done.

I don’t like the labels, though. They’re above the lines rather than next to them and they’re only at one end.

To make space on the axis for labels alongside the lines and get the alignment correct, I need to:

  1. duplicate the SUM(Sales) pill on the Rows shelf
  2. create a dual axis chart
  3. change the continuous Month to a discrete Month, as shown below:
    continuous month discrete

And we’re done (click here for an interactive version and to download my example):

final product

We now have a completed slope chart. It looks great and you can change the filters in order to clearly show the change in Container sales between any two dates.

100yrs of data viz: and we’re still making mistakes – #data14 preview

Don’t go overboard on your infographics

I’m off to Seattle next week for my 11th Tableau Conference. I’ve spoken at each one I’ve been to but have rarely been as excited about the talk I’m giving, and a new blog that’s going to follow.

The session: 100 yrs of data visualisation and we still make the same mistakes (Thu, 10.45am, room 6B)


Answer me this: who wrote the definitive book about effective data visualisation? Alberto Cairo? Stephen Few? Edward Tufte? They certainly wrote best-selling books. But they were late to the party.

In fact, the first, and I will argue the best, book on effective data visualisation is 100 yrs old this year. That’s right. 1914. The year of the first commercial airline flight, the first skyscraper in Seattle, and the first official Mother’s Day in the US.

We had all the answers 100 years ago.

Graphic Methods for Presenting Facts was written by Willard C Brinton.

My session is based around that book. We’re going to look at:

  • how his guidelines for best practice are just as valid today
  • how technology has changed but is always one step behind our imagination
  • how society has changed but the need to share data hasn’t

By the end of the session I hope everyone will be inspired to ensure that, in 100 more years, our great great grandchildren are visually literate.

The blog: #100yrsOfBrinton (over on tumblr)

Which map pin should you use?

In just one hour I can’t possibly pack in everything there is to know, so today I am launching a new tumblr, 100 Years Of Brinton. Over the next few months, I’ll be posting snippets from the book. My hope is these will inspire and entertain you. The ultimate goal? For everyone, not just dataviz nerds like me, to know about Willard C Brinton’s amazing book.

I hope you come along to the session. If not, follow the blog, and let me know what you think using the hashtag #100yrsOfBrinton.

7 learning points from The Graphical Web

Last week I attended The Graphical Web in Winchester. Tableau were sponsors and I was lucky enough to get to spend time with the people at the cutting edge of open source web graphics. Here are 7 things I learnt:

1. Google’s maps are like leaves

http://commons.wikimedia.org/wiki/File:Grapevine_leaf.jpg
Leaf with “hierarchy” of veins (Wikipedia)

One key theme from all cartography sessions was that effective cartography (and, by extension, data visualization) is about taking out as much information as possible. Ed Parsons showed the iterations of Google’s maps as an example. In the past, Google fell into a typical cartographer’s trap of trying to show all the info.

Now, when you search, you get much less information. The colours are more subtle. Ed explained how they took the simplicity of a leaf as inspiration for their newer road network palettes.

2. Circular diagrams can work

Click to see interactive version

I’ve never really liked chord diagrams, thinking there is always a better way to show the data. Nikola Sander changed my opinion in her explanation of the migration data she works with. Not only was the transition from a table of numbers to a chord diagram visually appealing, I came to realize that I’m not sure there is a better way of looking at this kind of data.

Chord diagrams cannot generally be digested quickly, but once the user has trained themselves to use them, they are an effective method for complex data.

3. You don’t need to be a proficient public speaker to be engaging on stage

jason davies

Jason Davies, co-author of D3, led the afternoon keynote. From a public speaking perspective, the session was not great: not much structure, a bit hesitant, and Jason doesn’t always project his voice well.

So how come this was one of the best sessions of the conference?

Answer: because his work speaks for itself. What Jason has done is push interactive graphics forward with great humility. A showreel of his work is enough to engage an audience as he walks through one amazing piece of geometric madness after another.

4. Twitter’s Visualisation Lab doesn’t sit still

Click to see the slide deck

Nicolas Garcia Belmonte led a fantastic review of the interactive work at Twitter (slides here). What amazes me is that his team comes up with such varied ideas. We often see teams have one or two great ideas and then overwork the same idea until it is no longer inspiring. One look at Twitter’s interactives page tells you this is a team with inspiration.

I get the feeling that these visualisations don’t get enough exposure outside the field of data visualisation. Do you agree? What can we do to change this?

5. “It depends” is the only right answer.

The path to a successful design? (Tim Brennan)

Scott Murray’s talk “The Keys to a Successful Data Design Process” (slides) generated a good debate. How should you go about your design process? It depends. How should you design your chart? It depends. What data should I use? It depends.

This is something I’ve touched on before in my post on data storytelling: you can’t have rules or laws in Data Visualisation – everything depends on something else.

6. Weather guarantees viral content

Earth: click the image to see the real thing

My favourite session was Cameron Beccario’s story of how he created Earth, the amazing live wind map of the globe. I love this kind of story: a project of passion that uncovered many, many side projects and problems. It’s a story of data hunting, skill learning, and serendipity that ends in huge success.

7. I had dinner sat next to a cannon.

The Gun Deck on HMS Warrior

That’s cool. Alan Smith and the Office for National Statistics did an amazing job organizing a great conference. I had been suspicious of the need to drive to Portsmouth for a good evening reception, but a tour around the harbor followed by dinner on the gun deck of HMS Warrior was amazing.

Why Twitter’s decision on Scraperwiki is bad for data democracy

Let’s say I need some basic Twitter data. What’s the difference between these two scenarios?

Scenario 1

scenario 1 developer

I pay a developer to write a script using Twitter’s free API and they push the data into a spreadsheet for me.

Scenario 2

scenario 2 scraper

I connect to Scraperwiki, who have automated the work of the developer, and use their tools to push tweets into a spreadsheet.

Scraperwiki is a great company. They have been able to provide Twitter data for the non-programmer. I’ve used their tools to do effective analysis that highlights the power of Twitter data. I am very disappointed that Twitter has forced them to turn off their Twitter tools. I appreciate Twitter is a business and has to make money, but this decision does not aid data democracy.

The decision favours developers and hinders data enthusiasts. Why? What makes developers a better class of person in Twitter’s eyes? Why can developers with their fancy pants Python, PHP and Ruby skills get their hands on the data, but the rest of the world can’t?

Scraperwiki’s service, in my eyes, removed the middle man for those who don’t know how to code.

Apart from keeping developers in business, I don’t see the difference between the scenarios. Feel free to explain it to me in the comments.

Twitter, if you’re listening, why don’t you support data democracy?

Goooaaaalllll! How we made the World Cup goal analytics viz

Goooaaaaal (static)

Before the World Cup, I was talking with Datasift about how to come up with charts that proved the business value of Datasift/Tableau. That was a fine conversation, but when I turned to crazy South American commentaries, things got more exciting. Who hasn’t loved hearing the commentators go nuts after a goal? How does that look on Twitter, we wondered.

And so we built the viz above.

(We did do lots of other business-y stuff which you can find on http://agameof2halves.tumblr.com/ but this was the one that got the most engagement, by a mile.)

How did we build it?

Getting the data

We used Datasift to collect all tweets that included a World Cup hashtag (#WorldCup, #BRAvsCHI, etc) and the word goal. How do you filter for the word goal? Simple. Datasift allows you to filter Tweets by a regular expression, so we used the following:

"\\bgo{1,}a{1,}l{1,}\\b" for goal

"\\bgo{1,}l{1,}\\b" for gol
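To see what those two patterns do and don’t catch, here is a small Python sketch. It only exercises the regular expressions themselves, not Datasift’s filtering; the case-insensitive flag, the helper name `mentions_goal` and the sample tweets are assumptions made for illustration.

```python
import re

# The two patterns from the post, written as Python raw strings
# (single backslashes, because raw strings need no extra escaping).
# IGNORECASE is an assumption, so that "GOOOAL" and "goooal" both match.
GOAL = re.compile(r"\bgo{1,}a{1,}l{1,}\b", re.IGNORECASE)
GOL = re.compile(r"\bgo{1,}l{1,}\b", re.IGNORECASE)


def mentions_goal(text):
    """Return True if the text contains a (possibly stretched) 'goal' or 'gol'."""
    return bool(GOAL.search(text) or GOL.search(text))


# Made-up example tweets.
samples = [
    "GOOOOOAAAALLLL #WorldCup",
    "Gooool do Brasil!",
    "What a goal",
    "The goalie saved it",
    "Golf is on later",
]
for tweet in samples:
    print(mentions_goal(tweet), tweet)
```

The `\b` word boundaries are what keep “goalie” and “Golf” out, while `{1,}` (one or more) lets each letter stretch as far as the commentator’s lungs allow.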

When the second round games had finished, we pulled the data into a MySQL instance.

Measuring the length of the word “goal”

How do you work out how long the word is? For a while I was stumped until a colleague suggested using SOUNDEX, a SQL function that identifies words based on their sound. SOUNDEX is one of those functions many database programmers learn about in the early days and mentally file in the “I’ll never need to use that but it’s interesting” area of the brain. That’s certainly what I did.

Realising this would work out, I dug around for a function I could use, and found this excellent example. I am grateful to Imranul Hoque for this. I made a couple of tiny tweaks, and now I had a nice query I could use to return the length of the word “goal” or “gol” in every tweet.
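Why does a sound-based code help here? The sketch below is a plain Python version of the classic American Soundex scheme, used to show that every stretched spelling of “goal” or “gol” collapses to the same code. It is not the SQL function linked above, and database implementations of SOUNDEX can differ in detail; the helper `goal_word_lengths` is hypothetical.

```python
import re

# Soundex digit groups; vowels, h, w and y carry no digit.
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(word):
    """Return a four-character American Soundex code, e.g. 'G400'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        if c not in "hw":  # h and w do not break a run of equal codes
            previous = digit
    return (code + "000")[:4]


def goal_word_lengths(tweet):
    """Lengths of the words in a tweet that sound like 'goal' (hypothetical helper)."""
    target = soundex("goal")
    words = re.findall(r"[A-Za-z]+", tweet)
    return [len(w) for w in words if soundex(w) == target]


for word in ["goal", "gol", "goooooaaaal", "GOOOOLLLL"]:
    print(word, soundex(word), len(word))
print(goal_word_lengths("GOOOOAL! what a goal"))
```

Soundex on its own is a loose match (other words beginning with g and built around an l sound share the code), so in this sketch it only makes sense on tweets that have already passed the regular-expression filter. Once the word is identified, its length is simply the character count.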

Breaking it down by gender in the semi final

Visualising the data

At this point I pointed Tableau at the data to find a story and a good way to visualise it. I tried lots of views, and it took a while to find the right one. A line chart like the one below was an early attempt, but not interesting enough.

An early draft view

I also tried grouping tweets by “lots of letters” and “not lots of letters” but that ended up with a really dull bar chart:

Yawn

I moved on to a box plot.

Interesting. Not amazing.

The boxplot itself wasn’t special but this led directly to the end view.

A voice spoke to me. “Jitter, Andy, Jitter,” it said.

It was the distant calling of Tableau Zen Masters. I’d read about jittering on Steve Wexler’s blog, but never used the technique. Now I had the perfect data. The boxplot I had made stirred a memory of a similar image in one of his posts. Whatever the trigger, jittering was the answer to making a visually exciting display of the data.

The mental trigger that inspired my jitter viz
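If jittering is new to you, the idea fits in a few lines of Python. This is a generic sketch of the technique, not how the viz was built in Tableau; the function name `jitter`, the width of 0.4 and the word lengths are all assumptions.

```python
import random


def jitter(count, width=0.4, seed=None):
    """Return `count` random horizontal offsets in the range [-width, width]."""
    rng = random.Random(seed)
    return [rng.uniform(-width, width) for _ in range(count)]


# Hypothetical word lengths, one per tweet; many tweets share the same length.
lengths = [4, 4, 4, 5, 5, 7, 7, 7, 7, 12]

# Without jitter every mark would sit at x = 0 and equal lengths would overlap.
# With jitter each mark keeps its true y value but gets its own x position.
points = list(zip(jitter(len(lengths), seed=1), lengths))
for x, y in points:
    print(f"x={x:+.2f}  length={y}")
```

The measured value (the length) is never altered. Only the meaningless axis is randomised, which is why a jittered view can show thousands of overlapping tweets without distorting the data.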

The final trick was to add a hook

A viz by itself isn’t enough. I knew I wanted a good image that would work inline in Twitter. For this I needed the right size – approx 880×440 – and a good headline. Charts with a good title get more engagement than those without, so I chose my title carefully.

The end result?

Simple: my most retweeted piece of content ever: