Goooaaaalllll! How we made the World Cup goal analytics viz

Goooaaaaal (static)

Before the World Cup, I was talking with Datasift about how to come up with charts that proved the business value of Datasift/Tableau. That was a fine conversation, but when I turned to crazy South American commentaries, things got more exciting. Who hasn’t loved hearing the commentators go nuts after a goal? How does that look on Twitter, we wondered.

And so we built the viz above.

(We did do lots of other business-y stuff which you can find on http://agameof2halves.tumblr.com/ but this was the one that got the most engagement, by a mile.)

How did we build it?

Getting the data

We used Datasift to collect all tweets that included a World Cup hashtag (#WorldCup, #BRAvsCHI, etc) and the word goal. How do you filter for the word goal? Simple. Datasift allows you to filter Tweets by a regular expression, so we used the following:

“\\bgo{1,}a{1,}l{1,}\\b” for goal

“\\bgo{1,}l{1,}\\b” for gol
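As a rough illustration of what those two patterns accept, here is a minimal Python sketch. It is not Datasift's filter syntax: the doubled backslashes above are presumably string escaping, so the sketch uses single backslashes in raw strings. The case-insensitive flag, the function name and the sample tweets are all assumptions made for the example.

```python
import re

# The same two patterns as above, written as Python raw strings.
# Case-insensitive matching is an assumption for this sketch.
GOAL = re.compile(r"\bgo{1,}a{1,}l{1,}\b", re.IGNORECASE)
GOL = re.compile(r"\bgo{1,}l{1,}\b", re.IGNORECASE)


def mentions_goal(text):
    """Return True if the text contains a (possibly stretched) goal or gol."""
    return bool(GOAL.search(text) or GOL.search(text))


# Invented sample tweets, purely for illustration.
samples = [
    "GOOOOAAAALLLL!!! #WorldCup",
    "Gooool de Brasil #BRAvsCHI",
    "Our goalkeeper was brilliant #WorldCup",
    "Golf is on the other channel",
]
for s in samples:
    print(mentions_goal(s), s)
```

The word boundaries (\b) are what keep words such as goalkeeper and golf out of the results.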

When the second round games had finished, we pulled the data into a MySQL instance.

Measuring the length of the word “goal”

How do you work out how long the word is? For a while I was stumped until a colleague suggested using SOUNDEX, a SQL function that identifies words based on their sound. SOUNDEX is one of those functions many database programmers learn about in the early days and mentally file in the “I’ll never need to use that but it’s interesting” area of the brain. That’s certainly what I did.

Realising this would work out, I dug around for a function I could use, and found this excellent example. I am grateful to Imranul Hoque for this. I made a couple of tiny tweaks, and now I had a nice query I could use to return the length of the word “goal” or “gol” in every tweet.
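If you just want to see the idea of measuring the word, here is a minimal Python sketch. To be clear, this is not the SOUNDEX-based MySQL query described above; it swaps in a plain regular expression, and the pattern, function name and sample tweets are assumptions made for illustration.

```python
import re

# One pattern covering both spellings: the run of "a" is optional so that
# "gol" matches as well as "goal". Case-insensitivity is an assumption.
STRETCHED = re.compile(r"\bgo+a*l+\b", re.IGNORECASE)


def goal_length(tweet):
    """Length of the longest goal/gol word in the tweet, or None if absent."""
    lengths = [len(m.group(0)) for m in STRETCHED.finditer(tweet)]
    return max(lengths) if lengths else None


# Invented sample tweets, purely for illustration.
print(goal_length("Goooaaaalllll! #WorldCup"))
print(goal_length("gol gol GOOOOOL #BRAvsCHI"))
print(goal_length("no football here"))
```

Taking the longest match per tweet is a design choice for the sketch; a tweet with several shouts could equally be counted once per shout.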

Breaking it down by gender in the semi final

Visualising the data

At this point I pointed Tableau at the data to find a story and a good way to visualise it. I tried lots of views, and it took a while to find the right one. A line chart like the one below was an early attempt, but not interesting enough.

An early draft view

I also tried grouping tweets by “lots of letters” and “not lots of letters” but that ended up with a really dull bar chart:

Yawn

I moved on to a box plot.

Interesting. Not amazing.

The boxplot itself wasn’t special, but it led directly to the end view.

A voice spoke to me. “Jitter, Andy, Jitter,” it said.

It was the distant calling of Tableau Zen Masters. I’d read about jittering on Steve Wexler’s blog, but never used the technique. Now I had the perfect data. The boxplot I had made stirred a memory of a similar image in one of his posts. Whatever. Jittering was the answer to making a visually exciting display of the data.
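For anyone who hasn’t met jittering: each mark keeps its real value on one axis and gets a small random offset on the other, so overlapping points spread out into a cloud. Here is a minimal Python sketch of that idea. It is independent of Tableau and is not the calculation used in the workbook; the function name, the width parameter and the word lengths are made up for illustration.

```python
import random


def jitter(n, width=1.0, seed=None):
    """Return n random horizontal offsets in [-width/2, width/2]."""
    rng = random.Random(seed)
    return [rng.uniform(-width / 2, width / 2) for _ in range(n)]


# Hypothetical word lengths, one per tweet (made-up values, not real data).
lengths = [4, 4, 5, 7, 4, 12, 4, 6, 4, 30]
xs = jitter(len(lengths), width=0.8, seed=42)

# Each tweet becomes one (x, y) mark: y is the real measure, x is pure noise.
points = list(zip(xs, lengths))
for x, y in points:
    print(f"{x:+.3f}\t{y}")
```

The x position carries no meaning at all; it exists only so that the many tweets with the same word length don’t sit on top of each other.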

The mental trigger that inspired my jitter viz

The final trick was to add a hook

A viz by itself isn’t enough. I knew I wanted a good image that would work inline in Twitter. For this I needed the right size – approx 880×440 – and a good headline. Charts with a good title get more engagement than those without, so I chose my title carefully.

The end result?

Simple: my most retweeted piece of content ever:

The Big Data Debate

I was delighted to be asked to do a talk at import.io’s Big Data Debate, part of London Tech Week. I’m a big fan of import.io and enjoy using their platform to get hold of vitally important data (such as The Eurovision).

Here are the slides I used for the session.

What did you think? Let me know if you enjoyed the content and if you agree or not.

Vizzing the Eurovision

Click for an interactive version

If you’re reading this you’re either:

  • an attendee at the Tableau Conference in the Hague
  • a fan of the Eurovision Song Contest

Every one of you should be at least one of those!

In the keynote, Tableau EMEA VP James Eiloart showed a dashboard investigating an important issue of the day: how Austria beat everyone else to win the 2014 Eurovision Song Contest. The graphic above shows the story in a more focussed way. What’s going on?

The red dots show how the TV audience across Europe voted for each entry. The orange ones show how the judges scored them. They are sorted left-to-right by the score given by each country (i.e. the ones who gave them 12 points are at the left and the null points countries are at the right).

What’s super-clear to me is that while all the countries’ judges gave both countries high scores, the TV audience scores for the Netherlands really drop off. That’s the reason Austria won. I’ve highlighted that below.

tv scores

There are some other interesting things that can be found in the scoring data. For example, which countries saw the TV audience and the judges disagreeing? Have a look at the chart below. Poland and Azerbaijan saw the biggest differences but there are a few others.

judge v tv

In fact, if you look at how individual countries voted, you can see this in even more stark detail. Check out the UK – the TV Audience loved Poland, but the judges hated them:

tv judge UK

Getting the data

How did I get the data for this? It’s all available on the Eurovision site. It’s not in an easy-to-gather form (data never is!), so I used Import.io. I set up a crawler and you can use it too. (Note – there are a few gotchas in the tables being scraped, as they are inconsistently formatted on the Eurovision site. Import.io made its best guess, but I did have to do some manual fixing once I’d downloaded the data.)

Getting the workbook

Feel free to download the workbook if you want. It’s a bit flaky – it was designed primarily for one slide in the Keynote.

Data storytelling: one size does not fit all

If you haven’t listened to the latest episode of Datastories, you should. Enrico and Moritz discuss data storytelling with Robert Kosara and Alberto Cairo. It’s a great discussion and worthy of 80 minutes of your time. Of the group, Moritz seems the most unsure/cynical about what storytelling is. He made this clear in the podcast, and also in his “Look ma, no story” post. There are two reasons he is missing the point. In this post I’ll explain why.

“Yes, but what if…?”

For each example of how data stories could work, Moritz suggested a use case where a data story isn’t necessary. We go round in these circles all the time in data visualisation and it appears we may end up in the same vortex with data storytelling. Here are two examples of the circles we experience in other areas of data viz:

“Law” 1: Thou must not use pie charts

This Law of Data Viz says, “Pie charts are bad and should therefore never be used”. There are perfectly legitimate times to use them (FlowingData has some good examples), yet there’s lots of anger against them (even I did a session entitled Pie Charts Are Evil in my days before chilling out about it).

“Law” 2: Bars must point upwards

Bars MUST NOT point down. Ever. Ok?

The Law of Data Viz says, “Bars should point upwards.” That law has been invoked recently with the gun deaths in Florida debate. Of course – the author took inspiration from an amazing example where the law was broken because it was appropriate to evoke a particular emotion.

Bars CAN point down

And the roundabout goes on and on. Yes pie. No pie. Bars up. Bars down. The answer is: IT DEPENDS.

It depends on your audience, on your data, on your objective.

The same applies to storytelling, and yet Moritz seems to base his criticism on the fact that for every example of where a story works, he can come up with a superficially similar use case where it won’t. In the podcast they touch upon the MarketWatch treemap.

No need for a story?

Moritz explains that a financial expert doesn’t need a story based around that chart. And he uses that example in his blog post as his criticism of data storytelling. But – hello? – of course the data expert doesn’t need a story about it.

It doesn’t negate the fact that other people might need a story about it. Fortunately, Alberto did make this point in the podcast. An executive wanting a financial market report might benefit from some annotation. And a separate audience, for example newspaper readers, would love a story – possibly even an anthropomorphized one – about, say, the battle between Google, Samsung and Apple.

Arguing over validity based on wildly different use cases doesn’t get us anywhere.

Semantics

tomato

Alberto’s trinity of annotation, narration and story is perfect and addresses another problem I’ve struggled with around data storytelling: worrying about the definition of story.

Moritz falls into this trap in the podcast and Robert has done so in the past, too (“Stories don’t tell themselves”). If we have an arrangement of data/charts/annotations in front of us, is it a vignette, anecdote, story, annotation, or narrative?

Frankly I don’t care.

We shouldn’t fret about taking words that evolved to describe verbal and written methods of communication and making sure we get the term perfectly aligned to something in data visualisation. We can argue all day about whether something is a story or an annotation. In fact, I am sure we will debate it for years. Whatever word you choose, I hope we all agree that there’s a difference between a basic unannotated bar chart and an annotated, sequentially arranged chart or set of charts.

If you say annotation and I say story, well, hey, that’s just fine.

Moritz told us to “make worlds, not stories”, which is even more of a semantic challenge; I don’t know what that even means.

I do suspect storytelling will be overused by marketing departments and the media to cover all manner of things. This will be a shame and dangerous in the long run. See the backlash against big data as an example.

It depends…

The longer I spend in this field, the more I realise the only answer to a question about what’s the right term, or the right thing to do, is “It depends….” What I heard in the Datastories podcast was a little too much rigidity from Moritz. There are, at best, guidelines about what’s right and wrong, and any work you produce should be measured by its effectiveness, not its conformance to a specific term.

Via http://cheezburger.com/330730240

SXSW: the most influential visualisations of all time

thevizzes

For the slides – head to the end of this post

For Tableau-versions of the vizzes, head to this post

What do you think are the most influential visualisations of all time? I asked that at SXSW today: what a great time it was delivering the session to everyone. If you want to watch the session again, you can watch an older version I recorded with Data Science Central.

I’ve also uploaded the Tableau workbook I used to recreate the visualisations (see this post). This in itself was an interesting exercise. Do they look better in a modern data visualisation tool? I don’t think so (as you can see in my exploration of Nightingale’s chart earlier this week). What the originals have is the depth of the personal touch. Being hand-drawn they contain the soul of the author of the viz. Each dot of ink will have been considered carefully:

  • What layout will best communicate my message?
  • Should I use colour?
  • What thickness should the axis be?

Today, not enough people stop to take the time to make those considerations.

My slides are also available on slideshare. And if you want to go do some further reading, I have a bundle of links on Bitly.

I’m interested to know what you think of my choices and if you would choose different ones.

The influential vizzes. In Tableau

So: can you recreate my list of influential vizzes in Tableau? Sure. Mostly. Here’s an interactive gallery. Whether you SHOULD recreate these vizzes or not is a discussion in itself. I leave it to you to debate this in the comments below. The supporting slides and more info can be found in this post: “SXSW: the most influential visualisations of all time”.

William Playfair

Playfair nailed it. It’s a great viz that also looks great in modern tools. Well labelled and nicely designed. What can I add? Um – interactive tooltips?

Florence Nightingale

Tableau can’t do radar charts or rose diagrams. That’s probably a good thing. I tried many different views but the one that had the most impact, I believe, is a stacked area chart.