Joseph Priestley’s 1765 Big Data Case Study

Go check this out

Reading articles on the process of data visualization is enlightening. At the London Tableau User Group this week, for example, Andy Kirk explained the process behind his recent work showcasing Liverpool FC’s roller-coaster season. It was entertaining and educational.

You might think this kind of thing is new, only happening in the world of meetups and blogs. Wrong!

One of the earliest examples I know of is Joseph Priestley’s commentaries on his Charts of Biography and History.

In those books, he describes the same challenges and opportunities you would recognise today. It amazes me to read this and realise that the challenges we face today are the same ones Priestley faced 250 years ago.

Ground zero of visualizing time

Here he is talking through his ideas of representing time in a chart:

We have no distinct idea of length of time until we have conceived it in the form of some sensible thing that has length as of a line.

What’s most fascinating is that this pamphlet IS THE START of people thinking about how to visualize time. Priestley came up with the idea of a Gantt bar, with length, to represent time. I cannot stress the importance of this enough. Today we don’t think twice about using lines and bars. But in 1765, they were inventing these ideas.

Collecting and cleaning data: a pain back then too

I shall not mention the pains it has cost me to reconcile and adjust the different values I have met concerning great numbers of them.

He describes the challenges of finding and cleaning data, and the wondrous opportunities of discovering new insights as he went along.

Laborious and tedious as the compilation of this work has been… a variety of views were continually opening upon me during the execution of it

That one is my favourite.

Priestley quote

Aggressive dataviz police were a problem then too

He also tries to address potential criticism head on. Dealing with the dogmatic dataviz police is still a problem in 2016, which is a shame:

No human work of such a nature as this can be expected to be faultless. I hope no candid person will think … that they are either so numerous or so great as considerably to lessen the use of the whole

It’s hard to know when to stop

He also describes the difficulty of knowing when to stop. I’ve written about this when talking about the process of data visualization. It can be so much fun tweaking and pruning a data visualization that sometimes you just have to stop.

The many times I have altered my lists convinces me that I should never revise them without seeing some reason to make farther alterations. The many times that I have replaced the same names after having rejected them convinces me that farther alterations would have been of very little consequence

He provides his data

No data project is truly authentic if you can’t access the underlying data. Fortunately, Priestley provided the entire dataset for the Chart of Biography. If you want to remake the chart or challenge his assumptions, you can!

the data

Joseph Priestley – a legend

Joseph Priestley was an amazing polymath. Not only was he great with data, he also discovered oxygen, caused riots, catalogued electricity and was friends with US presidents (I recommend this book about him). He paved the way for data visualization.

MakeoverMonday: Horizontal History

Chart of Biography
Interactive, downloadable version here

Today’s makeover sees me completing an ambition of 5 years: remake Joseph Priestley’s Chart of Biography in Tableau. Finally, all of my 5 Most Influential Vizzes have been remade in Tableau.

Here’s the source chart for this week’s Makeover:

See the others at Why Ask Why

It’s a horizontal history from the excellent site Why Ask Why. It’s a cool data experiment and exploration. The article inspired Yura Bagdanov to do a horizontal version. Of course, when I read the article, I saw only the Chart of Biography, what I think is the most influential chart of all time:

Go read more on Wikipedia.

Priestley’s chart was the first to condense time onto an x-axis which fit on a single page (Dubourg did something similar earlier, but his chart was 54ft long!). It’s also the first Gantt chart. And it directly influenced William Playfair as he created his statistical line charts. Boom! Check the blog tomorrow as my next post is all about Priestley’s own analysis of his chart.

This dataset doesn’t have the same names as Priestley’s, but it’s the same type of data: thousands of famous people with details of their life and death.

Making the chart: the 1765 version

Priestley created a Gantt chart. Could I do the same in Tableau? Well yes, but none of my early efforts really worked out:

Chart of Biography true to original
Too much detail in the Gantt

The problem was that there were just too many names! I used jittering to randomly distribute the names in each pane but I wasn’t happy with the output. Time to rethink the view with a 2016 perspective.
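For anyone curious what jittering means here, this is a minimal sketch in Python rather than Tableau. It assumes each person is a (name, born, died) tuple and simply picks a random row for each bar. The function name `jitter_rows`, the row count and the three sample lifespans are all illustrative assumptions, not the calculation used in the workbook.

```python
import random

def jitter_rows(people, n_rows, seed=0):
    """Give each lifespan bar a random row (vertical slot) inside a pane."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    placed = []
    for name, born, died in people:
        placed.append({
            "name": name,
            "start": born,             # where the Gantt bar begins on the time axis
            "length": died - born,     # bar length = lifespan in years
            "row": rng.randrange(n_rows),  # the jitter: a random vertical position
        })
    return placed

# Illustrative sample only (not Priestley's table or the Pantheon dataset)
sample = [
    ("Isaac Newton", 1643, 1727),
    ("Joseph Priestley", 1733, 1804),
    ("William Playfair", 1759, 1823),
]
bars = jitter_rows(sample, n_rows=20)
```

With thousands of names and a limited number of rows, random placement inevitably stacks bars on top of each other, which is roughly why the output looked so cluttered.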

Making the chart: A New Chart of History

I did have a go at recreating A New Chart of History too, but it only highlighted the inaccuracy of the data. 50% of famous people born in 1950-2000 came from North America? Not sure about that.

New Chart of History

Making the chart: the 2016 version

Interactivity gives us lots of options!

Tooltips and highlighting

Priestley labelled every single bar. Can you imagine how hard and tedious that must have been? All I did was make a nice tooltip!

tooltip

You can also see in the above image that the country is highlighted. Another advantage of modern interactive tools!

filters

Just the women in this view

I also get the advantage of filtering. Above is the view with only the females in the dataset shown.

The view above also highlights the problem with the Pantheon project: it’s incomplete. Only 6 famous female explorers? 5 business women?

I am very grateful to the London Viz Club for getting the data from the Pantheon project – that was a cool little Alteryx task. I’ve waited 5 years for a dataset like this: today is a happy day!

MakeoverMonday: Militarization of the Middle East

Oh gang, I apologise but this is a super-brief Makeover this week. Work and life have multiple demands this week. Given the time squeeze, I gave myself 15 minutes to make a viz, with the rest of the time spent on this blog post.

15 minutes? What can you possibly focus on?

With only 15 minutes, I’m clearly not going to go deep into the data (even though it looks like it has some amazing detail).

0-5 minutes

Instead, I looked for one point being made in the article.

This caught my eye

The flag section caught my eye. Firstly because it was the hardest part to interpret. So many flags and arrows and words and icons. What does it all mean? Once I deciphered the meaning, it seemed pretty interesting: 3 Middle East nations are importing way more than previously.

That was interesting: is it called out in the article? It sure is: they claim that these three countries are arming themselves in response to unrest in Syria. I find this horrible and fascinating: military hardware companies will be rubbing their hands with glee at the conflicts around the world.

Could I remake this story using the data?

5-11 minutes: building the viz

I’d spent a few minutes digesting the story, so needed to see what the dataset revealed. If you only have a matter of minutes, and want to look at how a measure changes over time, go for a line chart. There’s no time for mucking around with different views.

I filtered out the countries, and there you go: Qatar, Saudi Arabia and UAE all going up.

11-15 minutes: formatting

How do you make a simple makeover look really fancy? Choose an unusual font and background colour. Instant respect from us all! Well, there’s no time for that today. All I could manage was a quick switch to Smooth theme and writing a nice title. For simple charts like this, I’ll always try to ask a question, giving the viewer the information they need to query the chart itself.

15 minutes and I’m done.

How does a 15 minute makeover feel?

The main feeling is of fraud. I didn’t do much detailed checking of other countries. I don’t feel like I have really tested the hypothesis that “Middle Eastern Countries are spending more because of Syria.” I feel like I’ve just accepted the point made by the journalist and made a chart which, kind of, supports that opinion.

I feel like I’ve used the data to support the story, rather than use the data to find the story.

 

 

MakeoverMonday: Global Warming is Spiralling Out of Control

[Note 1: Yesterday I had a classic MakeoverMonday experience. I wrote this post, and was ready to hit publish. I then realised I just needed to tweak one of my images. I went back to Tableau, had a brainwave, and ended up with a completely new idea. I NEVER would have come up with that idea had I not been able to drag drop and experiment so readily. I chose to keep this post for Tuesday]

[Note 2: The CORRECT baseline is 1961-1990. The charts in this post have incorrect titles. Download the workbook to see correct versions]

The world is getting hotter indeed. The data comes from the UK Met Office’s HadCRUT4 data: a global, gridded dataset of surface temperature anomalies.

The science behind the dataset is complex, but the data’s straightforward: the measure is going up over time. How should you best show an upward trend?

This week, three ideas came to mind before I explored the data. I implemented each one.

1. Straight line (with politics)

politics
Which presidential candidate do you trust on climate change?

You can’t beat a trend line. It’s visually the most straightforward and effective for displaying an upward trend. I chose to emphasise the moving average (red) with the actual anomalies in grey in the background.
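If you want to reproduce the smoothing outside Tableau, here is a rough sketch of a trailing moving average in Python. The window length and the sample numbers are assumptions for illustration; the post doesn’t record which window the red line used, or whether it was trailing or centred.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 points have no value yet."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out

# Made-up anomalies, purely to show the shape of the output
anomalies = [0.1, 0.3, 0.2, 0.6, 0.4]
smoothed = moving_average(anomalies, window=3)
```

Plot `smoothed` in a strong colour over the raw `anomalies` in grey and you get the same emphasis as the chart above.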

The rising line chart conveniently leaves white space into which you can insert objects to further make your point. In this case I found two representative tweets from the likely US presidential candidates.

2. Bloomberg-inspired animation

global temps
This is a GIF – click the image to see the animation if it doesn’t begin

Bloomberg did an amazing visualisation with this data last year. Here was my excuse to recreate it. I think this is an especially good way to show the data because the animation brings drama to numbers. As the hottest year creeps ever upwards you have a sense of dread. “Wow, 1995 was hot. It can’t get hotter, can it? Oh. It did, 1998. Ouch. And again. And again. Yikes.”

This week’s chart was essentially exactly the same idea, spiralised. Personally, I think the radial display makes it much harder to see the extremes creeping ever higher.

3. The highlight table

Click image to see a bigger, hi-res version

I love a highlight table. This one lets you look up each month, should you wish to, but shows, right at the top, just how often records are being broken. It was fact-checking the rank calculation which led me to the idea of histograms for my actual MakeoverMonday, published yesterday.
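A rank calculation like this is easy to sanity-check by hand. The sketch below is a plain Python stand-in, not the Tableau table calculation from the workbook: it ranks values so the hottest gets rank 1 and flags each point that beats everything before it. The function names and sample numbers are made up for illustration.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1; ties share the same rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def record_breakers(values):
    """Return the positions where a value exceeds every earlier value."""
    best = float("-inf")
    hits = []
    for i, v in enumerate(values):
        if v > best:
            hits.append(i)
            best = v
    return hits

# Made-up monthly anomalies in time order
monthly = [0.2, 0.5, 0.4, 0.7, 0.7, 0.9]
ranks = rank_descending(monthly)     # [6, 4, 5, 2, 2, 1]
records = record_breakers(monthly)   # [0, 1, 3, 5]
```

If most of the rank 1 to 10 values sit at the end of the series, you have the same story the top of the highlight table tells.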

I also quite like tall and thin, but in this case, I think there’s just too much detail. We’re really making the point that the most recent months are super-hot. The highlight table takes a lot of vertical space to make that point.

top 10 labels
Detail from the top of the chart: the most important stuff.

Click here to see the final Makeover post.

MakeoverMonday: Global temperature is spiralling out of control

Click the image to see a larger version. 10.0 only this week – click here to download a copy (when it asks you to locate the extract, point it to this one)

[Today I had a classic MakeoverMonday experience. I wrote my original post, and was ready to hit publish. I then realised I just needed to check my calculations were correct. I went back to Tableau. While checking the data, I came upon a completely new idea. I NEVER would have come up with that idea had I not been able to drag, drop and experiment so readily. I will publish the original post tomorrow.]

The world is getting hotter. This week’s data comes from the UK Met Office’s HadCRUT4 data: a global, gridded dataset of surface temperature anomalies. Note: the baseline for HadCRUT4 is 1961-1990, not 1850-1900 as stated in the original article. See the Met Office page for more details.
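Why does the baseline matter? An anomaly is just a temperature minus the average over a reference period, so changing the period shifts every number. Here is a small sketch of that arithmetic in Python. The temperatures are invented for the example, and the dataset itself already ships as anomalies, so this only illustrates the definition.

```python
def to_anomalies(series, base_start, base_end):
    """Subtract the baseline-period mean from every value in a {year: temp} dict."""
    baseline = [t for year, t in series.items() if base_start <= year <= base_end]
    mean = sum(baseline) / len(baseline)
    return {year: t - mean for year, t in series.items()}

# Invented absolute temperatures, for illustration only
temps = {1960: 13.5, 1961: 14.0, 1990: 14.5, 2015: 15.0}
anoms = to_anomalies(temps, 1961, 1990)
```

Pick an earlier, cooler baseline and every anomaly gets bigger, which is why a chart title with the wrong baseline is misleading.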

The science behind the dataset is complex, but the data’s straightforward: the measure is going up over time. How should you best show an upward trend? I had three ideas, which I implemented, and will publish tomorrow.

A final check of the data, though, led me to the idea of a histogram.

Are histograms good charts?

I really like my chart this week. It shows just how much the 21st century has been above average in an unusual way. The challenge with histograms though is that they aren’t as immediately understandable as a line chart. You’ll see in tomorrow’s posts that I was initially riffing on line charts. If you’re sharing your findings with people who don’t usually see many charts, or have much time, you might want to show a simpler chart. Or you could trust that your audience is in fact intelligent and go with this design.

Iterations

I built a histogram initially just to check whether one of my calculations was correct. I immediately realised it was an interesting way of showing the data. But which chart shape and at what level of granularity?

histogram bar with month detail orange

My first version showed every month as a separate mark (because that’s what I was trying to validate). However, it’s just too much detail and nobody really wants to know the specific value for a particular month in the 1990s. It’s the trend that’s important.
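The underlying idea is just binning. As a rough sketch (Python, with made-up anomalies and an assumed bin width, not the bins from the workbook), you can count how many months fall into each band of anomaly values:

```python
import math
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = Counter()
    for v in values:
        # floor (not int) so negative anomalies land in the correct bin
        counts[math.floor(v / bin_width)] += 1
    return {round(idx * bin_width, 10): counts[idx] for idx in sorted(counts)}

# Made-up monthly anomalies; values sitting exactly on a bin edge can be
# nudged into a neighbouring bin by floating-point rounding
monthly_anomalies = [-0.3, -0.1, 0.1, 0.2, 0.6, 0.7, 0.8]
bins = histogram(monthly_anomalies, bin_width=0.5)
```

Wider bins hide the month-level noise and show the trend, which is the trade-off between the versions in this section.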

histogram area orange

I tried an area chart too. I like this as it shows the waves of the different time periods. However, it’s just one level of complexity too far. A histogram’s challenging enough without colouring it by groups and using area instead of bars.

histogram bars1 orange

Bars it was! My final step was to tell the story. I turned to colour here. My story is about the years since 2000, so I changed the palette to emphasise those years. Red for the recent years, greys for everything else:

colours

Unstacked area?

Finally, does an unstacked area work best of all? I think it might…

Helping the user understand a histogram

Here’s my biggest challenge with histograms: how do you help a reader understand it in as short a time as possible?

Custom labelling to aid the user
  • I created custom axis labelling as shown above
  • I annotated one of the marks
  • I used colour in the title to further explain what each mark showed

Did that work? How easy was it for you to interpret the chart?

The original chart

With 12k retweets at time of writing, people clearly love spirally climate data!

What I like
  • If you watch the animation, it clearly expands outwards
  • The colours pop out (although they seem arbitrary)
  • Spirals fit into a small space, like a tweet
What I would improve

Spirals?

This is a straightforward timeline and the radial nature simply does not show the growth over time. Bloomberg did a much more exciting animated version. A simple trendline shows growth better, too, in my opinion. Growth in a spiral is only visible by a vague awareness of an expanding circumference. Spikes in months or years are lost in the noise and confusion of the spiral.

But… Twelve Thousand Retweets? For all the problems of spirals, people engage with them. Is it better to get people thinking about the data, or be a chart purist? Bloomberg, when it tweeted about its story with a map, got only 192 retweets. From an account with THREE MILLION FOLLOWERS.

Conclusion? Spirals aren’t the “best” way to show the data, but they make people look at it.

MakeoverMonday: American women work way more than their European counterparts. [Really?]

TL;DR – just look at this chart. More details below.

[This week you have multiple ways to see my Makeover. It’s available here as a Tableau Story. Or read below for the story rendered as a post. Notes on the original chart are at the end, too]

Go see this as a Story in Tableau

The Makeover

Business Insider make a bold claim in their headline

Click here to see the article

Actually, of the 21 EU countries in the dataset, women work more than the Americans in 9 of them.


The Netherlands is an interesting outlier


What is it about the Dutch?

Click here to see the article

Check out these great articles from The Economist and Slate on why Dutch women don’t work as much as women in other nations.

Finally, should we trust Labour Force Statistics which involve gender?

More or Less

Check out this week’s fantastic episode of More or Less for more information.

The Analysis

You can download my workbook here. It uses v10 of Tableau (in beta at time of writing).

As you’ve gathered, this week my makeover was inspired by questioning the original chart. The chart itself is ok as far as stacked bar charts go. I question the boldness of the claim, though.

First of all, lots of the EU countries have higher levels of work than the US.

Secondly, as More or Less discussed this week, there are many reasons why gender data in employment statistics might be incorrect. Or, if not incorrect, the surveys are biased against female employees. For example, surveys often ask about “primary” employment. This ignores second jobs, which more women have than men. Uganda changed its surveys and female employment numbers went up by hundreds of thousands!

What did I like about the original?

  • A stacked bar is pretty clear.
  • I can easily find the categories on the legend and compare countries

What didn’t I like?

  • Everything’s got the same intensity. I’d have softened the borders, axis lines and labels, so that the data is clearer
  • It’s a good job I know my country abbreviations. GER, ITA? Some people might not know what they mean. They may think OECD is a country of its own.
  • There are too many tick marks on the axis. I don’t need all that information.
  • The title, “Female hours worked relatively low” doesn’t make much sense. I don’t mind using abbreviated language in titles, but this one seems to have gone too far.

 

MakeoverMonday: US Tuition Fees

Hi gang. Unfortunately, this week has seen time get away from me. This week in the UK is a Bank Holiday. You don’t need to cry me a river, but after three days away, a long drive home, and fatigue setting in, I’ve not had time to do a finished Makeover today. I’ve also spent a good long time catching up on this week’s #MakeoverMonday tweets – what an amazing effort everyone’s put in this week!

I did try out some ideas.

I tried a table

I spent ages trying to create a table using panel chart techniques, but I couldn’t find a way to make it right. I submit the above for this week’s Makeover, even though it simply did not work. There are some other ideas below. Or go check out my workbook.

This week’s original (click here)

This week’s original had a mix of good and bad.

The good:

  • I like the idea of mixing map and bar chart. You get the geography and a precise way to compare states.
  • The bar chart is sorted, so you can easily find the biggest and the smallest tuition fees.
  • The map has Hawaii and Alaska included, inside a compact space.

What I’d improve:

  • It’s crying out for interactivity! I want to hover over this and see some details, and link the map to the bar.
  • The colour scheme is really saturated. It’s a bit too overwhelming, as if the whole thing HAS BEEN DESIGNED IN CAPS LOCK. I don’t know where to look first.
  • I do not like vertically oriented labels on the bar chart. If you want to label the bars, orient the bar chart vertically, too, and put it on the right-hand side of the map, not below it.

My other approaches included parameter driven highlighting, a butt-ugly small multiple state map and a small multiple area chart. They all had potential, but I’m out of time this week.

Traditional bar and percent change

 

A table with a map background