100yrs of Data Visualisation: my talk from #data14

I had an amazing time delivering my session about Willard Brinton at the Tableau Conference. Today we made the recordings of the session available to people who attended the conference. If you couldn’t make the session, you can go here to watch the recording*. (note – only conference attendees can access this content – sorry!)

Along with the conference session, I am on a mission to bring Brinton the fame he deserves, and am also cataloguing the amazing things he talked about in his book over on my tumblr, http://100yrsofbrinton.tumblr.com/.

If you couldn’t make it, I’ve uploaded the slides for you – you can see them below. Without the talk track, I fear they are not much more than just pretty pictures. Nice pictures, though.

Have you heard of Brinton before? Will you help me make him famous?

* Listening to the recording makes me realise I talk very fast!

Pie or stacked bar?

I published a post on my Brinton blog about how he disliked pie charts. In his 1914 book, he suggested stacked horizontal bars as a much better alternative. A horizontal stacked bar “gives all the data without detracting from the ease of reading the chart itself.”

I put together a quick example in Tableau. Try answering the question: Which region has the highest consumer sales? Is it easier to answer the question with the pie or the stacked bar?

What do you think? The horizontal stacked bar has advantages:

  • It’s definitely easier to answer this specific question on sales/region with the stacked bar
  • Comparison across Regions is much easier
  • If you labelled the marks, they’d be more readable in the bar chart

But even the horizontal bar is not infallible. For example, which is the biggest blue (Consumer) segment in the Regions? It’s actually easier to answer that with the pies, because the blue segment is first in the pie charts.

But I cannot deny the pies have some strengths, too. While I used to abhor the use of pies, I have certainly softened my view on pies since I last blogged about them. The debate will continue for a long time. The reason is two-fold:

  1. Some people use pie charts without thinking, or because they don’t know the appropriate times to use them.
  2. Because “it depends”. I’ve shown one specific scenario where stacked bars are better than pies. That does not mean there are not scenarios where pies are better than bars. There are.

I’m interested in your thoughts: comment below or let me know on Twitter.

I’ll leave you with Brinton’s favoured way of showing part-to-whole relationships:

 

How to make a slope chart in Tableau

A dynamic slope chart (click to see interactive version)

Slope charts are cool. They emphasise change between an end date and a start date by removing the noise in between. For a much more detailed explanation and justification, go read Andy Kirk’s homage to slope charts.

This post is going to show how to build a slope chart in Tableau. It’s not the first tutorial on this: there are others by Ben Jones and Andy Kriebel. The issue with those examples is that they all start with data that has just two time points. What happens if you have lots of time data and just want to show the start/end points?

I’m building this using the Superstore sample data.

Start with a time series.

slope start

First use a standard time series chart. In my example, I have Sales by Year for each Container. It doesn’t really matter whether you have a continuous or discrete date pill on your column shelf.

Keep only the first and last values

I want this to be a dynamic slope chart; if I filter the range of dates, the chart should continue to show only the start and end values.

To do this, I create a simple calculated field [First or Last]:

First or last

This calculation returns TRUE for the first and last marks on the line for each Container. Here’s what happens if I put the calculated field onto the Size shelf:

First or last on size

Show only the ends

Why put the calculation on the Size shelf? Because we need to Hide everything that isn’t a first or last value: in other words, everything that our calculation returns as False. Dropping a pill onto the Size shelf reveals a legend:

Size legend

Click on False. Then right-click and choose Hide. Voila! We have our basic slope chart:

slope basic
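For readers who think in code, the same first-or-last logic can be sketched outside Tableau. This is only an illustration in Python, not the calculated field from the screenshot: the function names and sample rows are made up, and inside Tableau the table calculation functions FIRST() and LAST() are one likely way to express the same test.

```python
from collections import defaultdict


def flag_first_or_last(rows):
    """Add a first_or_last flag: True for each container's earliest and latest year."""
    years_by_container = defaultdict(list)
    for row in rows:
        years_by_container[row["container"]].append(row["year"])

    flagged = []
    for row in rows:
        years = years_by_container[row["container"]]
        is_end = row["year"] in (min(years), max(years))
        flagged.append({**row, "first_or_last": is_end})
    return flagged


def slope_points(rows, start_year, end_year):
    """Apply the date filter first, then hide everything flagged False."""
    in_range = [r for r in rows if start_year <= r["year"] <= end_year]
    return [r for r in flag_first_or_last(in_range) if r["first_or_last"]]


# Made-up sales figures, purely for illustration (not Superstore data).
SAMPLE_ROWS = [
    {"container": "Small Box", "year": 2010, "sales": 100},
    {"container": "Small Box", "year": 2011, "sales": 140},
    {"container": "Small Box", "year": 2012, "sales": 90},
    {"container": "Small Box", "year": 2013, "sales": 180},
    {"container": "Jumbo Drum", "year": 2010, "sales": 300},
    {"container": "Jumbo Drum", "year": 2011, "sales": 260},
    {"container": "Jumbo Drum", "year": 2012, "sales": 310},
    {"container": "Jumbo Drum", "year": 2013, "sales": 240},
]
```

Because the filter runs before the flag is computed, narrowing the date range moves the end points with it, which is the dynamic behaviour described above.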

Improving the basic slope chart

It’s easy to take this much further.

The first thing you can do is move Container from the Colour shelf to the label shelf. I like this for two reasons:

  1. It’s easier on the eye to have fewer colours
  2. The labels are right on the line so the viewer doesn’t have to move their eyes around the canvas too much to identify which line is which

You can also switch the time pill to something else if you wish. In the example below, I’ve switched to a continuous month and added a quick filter for Month:

no colour just label

We’re almost done.

I don’t like the labels, though. They’re above the lines rather than next to them and they’re only at one end.

To make space on the axis for labels alongside the lines and get the alignment correct, I need to:

  1. duplicate the SUM(Sales) pill on the Row shelf
  2. create a dual axis chart
  3. change the continuous Month to a discrete Month, as shown below:
    continuous month discrete

And we’re done (click here for an interactive version and to download my example):

final product

We now have a completed slope chart. It looks great and you can change the filters in order to clearly show the change in Container sales between any two dates.

100yrs of data viz: and we’re still making mistakes – #data14 preview

Don’t go overboard on your infographics

I’m off to Seattle next week for my 11th Tableau Conference. I’ve spoken at each one I’ve been to but have rarely been as excited about the talk I’m giving, and a new blog that’s going to follow.

The session: 100 yrs of data visualisation and we still make the same mistakes (Thu, 10.45am, room 6B)


Answer me this: who wrote the definitive book about effective data visualisation? Alberto Cairo? Stephen Few? Edward Tufte? They certainly wrote best-selling books. But they were late to the party.

In fact, the first, and I will argue, the best, book on effective data visualisation is 100 yrs old this year. That’s right. 1914. The year of the first commercial airline flight, the first skyscraper in Seattle, and the first official Mother’s Day in the US.

 

We had all the answers 100 years ago.

 

Graphic Methods for Presenting Facts was written by Willard C Brinton.

My session is based around that book. We’re going to look at

  • how his guidelines for effective best practice are just as valid today
  • how technology has changed but is always one step behind our imagination
  • how society has changed but the need to share data hasn’t

By the end of the session I hope everyone will be inspired to ensure that, in 100 more years, our great great grandchildren are visually literate.

The blog: #100yrsOfBrinton (over on tumblr)

Which map pin should you use?

In just one hour I won’t possibly be able to pack in everything there is to know. Today I am launching a new tumblr, 100 Years Of Brinton. Over the next few months, I’ll be posting snippets from the book. My hope is these will inspire and entertain you. The ultimate goal? For everyone, not just dataviz nerds like me, to know about Willard C Brinton’s amazing book.

I hope you come along to the session. If not, follow the blog, and let me know what you think using the hashtag #100yrsOfBrinton.

 

7 learning points from The Graphical Web

Last week I attended The Graphical Web in Winchester. Tableau were sponsors and I was lucky enough to get to spend time with the people at the cutting edge of open source web graphics. Here are 7 things I learnt:

1.     Google’s maps are like leaves

http://commons.wikimedia.org/wiki/File:Grapevine_leaf.jpg

Leaf with “hierarchy” of veins (Wikipedia)

One key theme from all cartography sessions was that effective cartography (and, by extension, data visualization) is about taking out as much information as possible. Ed Parsons showed the iterations of Google’s maps as an example. In the past, Google fell into a typical cartographer’s trap of trying to show all the info.

Now, when you search, you get much less information. The colours are more subtle. Ed explained how they took the simplicity of a leaf as inspiration for their newer road network palettes.

2.     Circular diagrams can work

Click to see interactive version

I’ve never really liked chord diagrams, thinking there is always a better way to show the data. Nikola Sander changed my opinion in her explanation of the migration data she works with. Not only was the transition from a table of numbers to a chord diagram visually appealing, I came to realize that I’m not sure there is a better way of looking at this kind of data.

Chord diagrams cannot generally be digested quickly, but once the user has trained themselves to use them, they are an effective method for complex data.

3.     You don’t need to be a proficient public speaker to be engaging on stage

jason davies

Jason Davies, co-author of D3, led the afternoon keynote. From a public speaking perspective, the session was not great: not much structure, a bit hesitant, and Jason doesn’t always project his voice well.

So how come this was one of the best sessions of the conference?

Answer: because his work speaks for itself. What Jason has done is push interactive graphics forward with great humility. A showreel of his work is enough to engage an audience as he walks through one after another amazing piece of geometric madness.

4.     Twitter’s Visualisation Lab doesn’t sit still

Click to see the slide deck

Nicolas Garcia Belmonte led a fantastic review of the interactive work at Twitter (slides here). What amazes me is that his team comes up with such varied ideas. We often see teams have one or two great ideas and then overwork the same idea until it is no longer inspiring. One look at Twitter’s interactives page tells you this is a team with inspiration.

I get the feeling that these visualisations don’t get enough exposure outside the field of data visualisation. Do you agree? What can we do to change this?

5.     “It depends” is the only right answer.

The path to a successful design? (Tim Brennan)

Scott Murray’s talk “The Keys to a Successful Data Design Process” (slides) generated a good debate. How should you go about your design process? It depends. How should you design your chart? It depends. What data should I use? It depends.

This is something I’ve touched on before in my post on data storytelling: you can’t have rules or laws in Data Visualisation – everything depends on something else.

6.     Weather guarantees viral content

Earth: click the image to see the real thing

My favourite session was Cameron Beccario’s story of how he created Earth, the amazing live wind map of the globe. I love this kind of story: a project of passion that uncovered many many side projects and problems. It’s a story of data hunting, skill learning, and serendipity that ends in huge success.

7.     I had dinner sat next to a cannon.

The Gun Deck on HMS Warrior

That’s cool. Alan Smith and the Office for National Statistics did an amazing job organizing a great conference. I had been suspicious of the need to drive to Portsmouth for a good evening reception, but a tour around the harbor followed by dinner on the gun deck of HMS Warrior was amazing.

Why Twitter’s decision on Scraperwiki is bad for data democracy

Let’s say I need some basic Twitter data. What’s the difference between these two scenarios?

Scenario 1

scenario 1 developer

I pay a developer to write a script using Twitter’s free API and they push the data into a spreadsheet for me.

Scenario 2

scenario 2 scraper

I connect to Scraperwiki, who have automated the work of the developer, and use their tools to push tweets into a spreadsheet.

Scraperwiki is a great company. They have been able to provide Twitter data for the non-programmer. I’ve used their tools to do effective analysis that highlights the power of Twitter data. I am very disappointed that Twitter has forced them to turn off their Twitter tools. I appreciate Twitter is a business and has to make money, but this decision does not aid data democracy.

The decision favours developers and hinders data enthusiasts. Why? What makes developers a better class of person in Twitter’s eyes? Why can developers with their fancy pants Python, PHP and Ruby skills get their hands on the data, but the rest of the world can’t?

Scraperwiki’s service, in my eyes, removed the middle man for those who don’t know how to code.

Apart from keeping developers in business, I don’t see the difference between the scenarios. Feel free to explain it to me in the comments.

Twitter, if you’re listening, why don’t you support data democracy?

Goooaaaalllll! How we made the World Cup goal analytics viz

Goooaaaaal (static)

Before the World Cup, I was talking with Datasift about how to come up with charts that proved the business value of Datasift/Tableau. That was a fine conversation, but when I turned to crazy South American commentaries, things got more exciting. Who hasn’t loved hearing the commentators go nuts after a goal? How does that look on Twitter, we thought?

And so we built the viz above.

(We did do lots of other business-y stuff which you can find on http://agameof2halves.tumblr.com/ but this was the one that got the most engagement, by a mile.)

How did we build it?

Getting the data

We used Datasift to collect all tweets that included a World Cup hashtag (#WorldCup, #BRAvsCHI, etc) and the word goal. How do you filter for the word goal? Simple. Datasift allows you to filter Tweets by a regular expression, so we used the following:

“\\bgo{1,}a{1,}l{1,}\\b” for goal

“\\bgo{1,}l{1,}\\b” for gol
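To see what those two patterns accept, here is a small sketch using Python’s re module. The doubled backslashes above look like string escaping, so the sketch uses single backslashes in raw strings. The case-insensitive flag, the function name and the sample tweets are assumptions for illustration, not part of the Datasift filter.

```python
import re

# Same patterns as above, written as Python raw strings.
# re.IGNORECASE is an assumption; the original filter may have handled case differently.
GOAL = re.compile(r"\bgo{1,}a{1,}l{1,}\b", re.IGNORECASE)
GOL = re.compile(r"\bgo{1,}l{1,}\b", re.IGNORECASE)


def mentions_goal(text):
    """True if the text contains a (possibly stretched) 'goal' or 'gol' as a whole word."""
    return bool(GOAL.search(text) or GOL.search(text))


# Made-up sample tweets.
SAMPLES = [
    "GOOOAAALLL #WorldCup",
    "What a goooool from Neymar",
    "Off to play golf",
    "Great goalkeeper tonight",
]
matches = [text for text in SAMPLES if mentions_goal(text)]
```

The `\b` word boundaries are what keep words such as “golf” and “goalkeeper” out of the results.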

When the second round games had finished, we pulled the data into a MySQL instance.

Measuring the length of the word “goal”

How do you work out how long the word is? For a while I was stumped until a colleague suggested using SOUNDEX, a SQL function that identifies words based on their sound. SOUNDEX is one of those functions many database programmers learn about in the early days and mentally file in the “I’ll never need to use that but it’s interesting” area of the brain. That’s certainly what I did.

Realising this would work out, I dug around for a function I could use, and found this excellent example. I am grateful to Imranul Hoque for this. I made a couple of tiny tweaks, and now I had a nice query I could use to return the length of the word “goal” or “gol” in every tweet.
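The linked function isn’t reproduced here, so what follows is not that query. It is a minimal Python sketch of the idea under stated assumptions: a hand-written version of the classic American Soundex code (MySQL’s SOUNDEX() may differ in details) shows why every stretched spelling of “goal” or “gol” collapses to the same code, and the length is then just the character count of the matching word. The function names and the G400 target are illustrative.

```python
import re

# Consonant groups used by the classic American Soundex code.
_CODES = {}
for _digit, _letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(word):
    """Return the four-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate repeated codes
        digit = _CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # vowels reset this to "", so repeats across vowels count
    return (code + "000")[:4]


def goal_word_length(tweet):
    """Length of the first word that starts with 'go' and sounds like 'goal', else 0."""
    for word in re.findall(r"[A-Za-z]+", tweet):
        if word.lower().startswith("go") and soundex(word) == "G400":
            return len(word)
    return 0
```

This sketch assumes the tweets were already filtered by the regular expressions shown earlier; on unfiltered text a word such as “golly” would also produce G400.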

Breaking it down by gender in the semi final

Visualising the data

At this point I pointed Tableau at the data to find a story and a good way to visualise it. I did try lots of views, but it took a while to find the right one. A line chart like the one below was an early attempt, but not interesting enough.

An early draft view

I also tried grouping tweets by “lots of letters” and “not lots of letters” but that ended up with a really dull bar chart:

Yawn

I moved on to a box plot.

Interesting. Not amazing.

The boxplot itself wasn’t special but it led directly to the end view.

A voice spoke to me. “Jitter, Andy, Jitter,” it said.

It was the distant calling of Tableau Zen Masters. I’d read about jittering on Steve Wexler’s blog, but never used the technique. Now I had the perfect data. The boxplot I had made stirred a memory of a similar image on one of his posts, Whatever. But jittering was the answer to making a visually exciting display of the data.

The mental trigger that inspired my jitter viz
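Jittering just means giving each mark a small random horizontal offset so that points sharing a value stop sitting on top of each other. A minimal sketch of the idea in Python follows. It is not how the viz above was built in Tableau, and the function name, jitter width, seed and sample numbers are illustrative assumptions.

```python
import random


def jitter_positions(values, width=0.4, seed=14):
    """Pair each value with a random x offset in [-width, width]."""
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    return [(rng.uniform(-width, width), value) for value in values]


# Made-up word lengths for a handful of tweets.
lengths = [4, 4, 4, 7, 7, 12, 4, 25]
points = jitter_positions(lengths)
```

Plotting each pair as (x, y) spreads the many 4-letter “goal” tweets sideways instead of stacking them on one spot, while the vertical position still shows the true word length.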

The final trick was to add a hook

A viz itself isn’t enough. I knew I wanted a good image that would work inline in Twitter. For this I needed the right size – approx 880×440 – and a good headline. Charts with a good title have more engagement than those without so I chose my title carefully.

The end result?

Simple: my most retweeted piece of content ever:

The Big Data Debate

I was delighted to be asked to do a talk at import.io’s Big Data Debate, part of London Tech Week. I’m a big fan of import.io and enjoy using their platform to get hold of vitally important data (such as The Eurovision).

Here are the slides I used for the session.

What did you think? Let me know if you enjoyed the content and if you agree or not.

Vizzing the Eurovision

Click for an interactive version

If you’re reading this you’re either:

  • an attendee at the Tableau Conference in the Hague
  • a fan of the Eurovision Song Contest

Every one of you should be at least one of those!

In the keynote, Tableau EMEA VP James Eiloart showed a dashboard investigating an important issue of the day: how Austria beat everyone else to win the 2014 Eurovision Song Contest. The graphic above shows the story in a more focussed way. What’s going on?

The red dots show how the TV audience across Europe voted for each entry. The orange ones show how the judges scored them. They are sorted left-to-right by the score given by each country (ie the ones who gave them 12 points are at the left and the null points countries are at the right).

What’s super-clear to me is that while all the countries’ judges gave both countries (Austria and the Netherlands) high scores, the TV audience scores for the Netherlands really drop off. That’s the reason. I’ve highlighted that below.

tv scores

 

There are some other interesting things that can be found in the scoring data. For example, which countries saw the TV audience and the judges disagreeing? Have a look at the chart below. Poland and Azerbaijan saw the biggest differences but there are a few others.

judge v tv
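The disagreement in that chart boils down to a subtraction: TV audience score minus judges’ score for each entry, ranked by the size of the gap. A hedged Python sketch with placeholder numbers follows. These are not the real 2014 scores, and the field and function names are invented for illustration.

```python
def biggest_disagreements(scores, top=3):
    """Rank entries by the absolute gap between TV audience and jury points."""
    gaps = [
        {"entry": s["entry"], "gap": s["televote"] - s["jury"]}
        for s in scores
    ]
    return sorted(gaps, key=lambda g: abs(g["gap"]), reverse=True)[:top]


# Placeholder values only, not real Eurovision results.
PLACEHOLDER_SCORES = [
    {"entry": "Entry A", "televote": 120, "jury": 30},
    {"entry": "Entry B", "televote": 40, "jury": 95},
    {"entry": "Entry C", "televote": 80, "jury": 78},
]
```

A positive gap means the TV audience liked the entry more than the judges did; a negative gap means the reverse.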

 

In fact, if you look at how individual countries voted, you can see this in even more stark detail. Check out the UK – the TV Audience loved Poland, but the judges hated them:

tv judge UK

Getting the data

How did I get the data for this? It’s all available on the Eurovision site. It’s not in an easy to gather form (data never is!). So I used Import.io. I set up a crawler and you can use it too. (note – there are a few gotchas in the Tables being scraped as they are inconsistently formatted on the Eurovision site. Import.io did its best guess but I did have to do some manual fixing once I’d downloaded it).

Getting the workbook

Feel free to download the workbook if you want. It’s a bit flaky – it was designed primarily for one slide in the Keynote.

Data storytelling: one size does not fit all

If you haven’t listened to the latest episode of Datastories you should. Enrico and Moritz discuss data storytelling with Robert Kosara and Alberto Cairo. It’s a great discussion and worthy of 80 minutes of your time. Of the group, Moritz seems the most unsure/cynical about what storytelling is. He made this clear in the podcast, and also in his “Look ma, no story” post. There are two reasons he is missing the point. In this post I’ll explain why.

“Yes, but what if…?”

For each example of how data stories could work, Moritz suggested a use case where a data story isn’t necessary. We go round in these circles all the time in data visualisation and it appears we may end up in the same vortex with data storytelling. Here are 2 examples of the circles we experience in other areas of data viz:

“Law” 1: Thou must not use pie charts

This Law of Data Viz says, “Pie charts are bad and should therefore never be used”. But there are perfectly legitimate times to use them (FlowingData has some good examples). Even so, there’s lots of anger against them (even I did a session entitled Pie Chart Are Evil in my days before chilling out about it).

“Law” 2: Bars must point upwards

Bars MUST NOT point down. Ever. Ok?

The Law of Data Viz says, “Bars should point upwards.” That law has been invoked recently with the gun deaths in Florida debate. Of course – the author took inspiration from an amazing example where the law was broken because it was appropriate to evoke a particular emotion.

Bars CAN point down

And the roundabout goes on and on. Yes pie. No pie. Bars up. Bars down. The answer is: IT DEPENDS.

It depends on your audience, on your data, on your objective.

The same applies to storytelling and yet Moritz seems to base his criticism on the fact that for every example of where a story works, he came up with a superficially similar use case where it won’t. In their podcast they touch upon the MarketWatch treemap.

No need for a story?

Moritz explains that a financial expert doesn’t need a story based around that chart. And he uses that example in his blog post as his criticism of data storytelling. But – hello? – of course the data expert doesn’t need a story about it.

It doesn’t negate the fact that other people might need a story about it. Alberto did make this point in the podcast, fortunately. An executive wanting a financial market report might benefit from some annotation. And a separate audience, for example newspaper readers, would love a story – possibly even an anthropomorphized one – about, say, the battle between Google, Samsung and Apple.

Arguing over validity based on wildly different use cases doesn’t get us anywhere.

Semantics

tomato

Alberto’s trinity of annotation, narration and story is perfect and addresses another problem I’ve struggled with around data storytelling: worrying about the definition of story.

Moritz falls into this trap in the podcast and Robert has done in the past, too (“Stories don’t tell themselves”). If we have an arrangement of data/charts/annotations in front of us, is it a vignette, anecdote, story, annotation, or narrative?

Frankly I don’t care.

We shouldn’t fret about taking words that evolved to describe verbal and written methods of communication and making sure we get the term perfectly aligned to something in data visualisation. We can argue all day about whether something is a story or an annotation. In fact: I am sure we will debate it for years. Whatever word you choose, I hope we all agree that there’s a difference between a basic unannotated bar chart, and an annotated, sequentially arranged chart/charts.

If you say annotation and I say story, well, hey, that’s just fine.

Moritz told us to “make worlds, not stories”, which is even more of a semantic challenge; I don’t know what that even means.

I do suspect storytelling will be overused by marketing departments and the media to cover all manner of things. This will be a shame and dangerous in the long run. See the backlash against big data as an example.

It depends…

The longer I spend in this field, the more I realise the only answer to a question about what’s the right term, or the right thing to do is “It depends….” What I heard in the Datastories podcast was a little too much rigidity from Moritz. There are, at best, guidelines about what’s right and wrong, and any work you produce should be measured by its effectiveness, not its conformance to a specific term.

Via http://cheezburger.com/330730240