Monday 10 February 2014

How a Rubber Duck can improve your Tableau Viz


Now I know what you are thinking. He's gone mad!! What on earth does a bright yellow aquatic bird made of rubber possibly have to do with Tableau, and how is that supposed to improve my viz? Well, just bear with me and it will all become clear.

I've been rubber ducking at work for some time now, in an open plan office, so it's not as dodgy as it might sound. It's a process that has long been common in software engineering, and since I started using it with my Tableau work it's been a great success. So how does it work with coding?

OK, tell me about the Rubber Duck

Well, imagine you are writing a piece of code to extract some data from a database, perform some action on it and display the results on screen. You've written the code the way you think it should work, but when you come to run it you get errors, or not the result you expected. You've been working on this code for the last week, so you know it inside out, but because of that very familiarity you cannot see what the bug is. I have a buddy in the office that I turn to in situations like this. I get him to sit down with me and I talk through the code, explaining what it's supposed to be doing and why I used this approach rather than that. And without fail, just by explaining it out loud I discover the bug, correct it, and the script works perfectly.

And the Rubber Duck?

Simple: sometimes there's no one in the office, or you are working from home, or maybe you always code by yourself. The story goes that a software engineer started explaining how his code worked, line by line, to a rubber duck that happened to be on his desk, and the Rubber Duck debugging method was born.

It's the simple process of talking through the code that does the trick. I know the code inside and out and know what it's supposed to do. Having to explain what it's doing forces me to actually read the code I have written and observe what it really does. My buddy didn't do anything other than listen; he doesn't have to know how to code or what I am trying to do, he just has to listen and let my brain do its thing.



How does this apply to Tableau?

Designing and building a good viz takes some time. You spend a long time looking at the data and working out how best to display it. By the time it comes to publishing the viz you know it inside and out, and you know exactly the story that you are trying to convey. You know why you have used each of the filters and how you expect the user to interact with it. And therein lies the problem. You know all this, and assuming that the person looking at your viz for the first time will also know it just isn't always going to be the case. You get to the point where you can no longer look at the viz objectively; you cannot unlearn how it's supposed to work.

This is where the rubber duck technique can really help you out. Sitting down with someone and explaining the logic behind your viz will show whether you have conveyed the story that you want to. Showing them how each filter works will highlight those that are really not needed, or those that make no sense. Talking through how the viz is supposed to work, and seeing what it actually does, makes it apparent where the issues are.

If you work with other people in an office, ask someone to sit with you while you go through your dashboard. State what it's supposed to show, how the filters let you change the view and how that helps find the story in the data. If you work alone, there are many people in the Tableau community that you can ask to look over your viz for you. A fresh pair of eyes will soon spot glaring problems, and if you give them a brief outline of what the viz is supposed to convey, they can tell you if you are on track.


And if not, get yourself a little rubber duck, sit it on your desk and explain to it how your viz is supposed to work. If you struggle to explain your design, then the duck has done its job.

And as ever, finish with a song, take it away Ernie...




Tuesday 4 February 2014

Boost your Viz performance with Extracts

Why Extracts are great, and why you should use them

So I am going to lay my stall out early: I love extracts. I use them as much as possible. At work, all of the workbooks I create and publish to our Tableau Server use extracts. Well, that's not entirely true; out of 127 views, I think 6 might look at a live data table. So, you might be asking, why does he do that? Well first, let's look at what data extracts are.

What is a data extract?



A Tableau data extract, let's call it a TDE, is a cache of data stored locally to your Tableau instance (Desktop or Server) that is optimised for fast queries by Tableau. It tends to be a sub-set of the data, and it also allows you to query the data offline. In fact, if you have published anything to Tableau Public, you have already made a TDE.

OK, I get that, but why use them?

Speed. Whenever Tableau updates a view or dashboard it has to go and query some data somewhere, possibly apply some filters, and then bring back those results and display them for you in whatever style you have chosen. If your data source is a nicely structured data warehouse running on some fancy platform like Teradata or Vertica, then you should get sub-second responses and the views update pretty quickly. However, sometimes you don't have that; maybe your data connection is slow and it feels a little sluggish to respond. Or, as is the case with much of what I do, you are connecting to a transactional database that was never designed for easy reporting. You connect to this using some custom SQL that may take minutes to bring back the data. Clearly you don't want to wait a minute every time you change a filter.
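To make that concrete, here is a rough sketch of the kind of custom SQL I mean. The table and column names are invented for illustration, but the shape is typical: several joins across large transactional tables plus an aggregation.

-- Hypothetical reporting query against a transactional schema
-- (table and column names are made up; date functions vary by database)
SELECT c.region,
       p.category,
       DATE_TRUNC('month', o.order_date) AS order_month,
       SUM(oi.quantity * oi.unit_price)  AS sales
FROM orders o
JOIN order_items oi ON oi.order_id   = o.order_id
JOIN products p     ON p.product_id  = oi.product_id
JOIN customers c    ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2013-01-01'
GROUP BY c.region, p.category, DATE_TRUNC('month', o.order_date)

Connected live, Tableau has to wait for something like this on every interaction; with an extract, that cost is paid once per refresh and the user only ever queries the fast local TDE.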
 
Nothing looks worse to the end user than a sluggish interactive experience. You could have the best-looking dashboard, clever Jedi calculated fields, carefully chosen colours and charts that all follow best practice. But none of that matters to the viewer if, when they try to interact, nothing seems to happen for seconds. That will be the only thing they notice and remember.

Using a TDE ensures that the data source Tableau is using will provide the fastest user experience possible, which, after all, is what we really want to happen.

But you are looking at stale data, isn't that an issue?

Not in my experience, no. The number of times someone really, really needs to look at totally live data is in fact very small. Mostly people are looking for trends over the last month or year, so the fact that the latest data might be a few hours old isn't going to change the results. If the underlying data isn't changing that often, say twice a day, then you only need to refresh your extract twice a day to keep up with it.

An extract can be set up to refresh according to a schedule that you define, e.g. daily, hourly, weekly or every 15 minutes (a command-line way of triggering refreshes is sketched after this list). The period you decide on depends on a couple of factors:
  • How often does new data arrive in the data source that the extract is built on? If it's hourly, then the extract should be refreshed every 2 hours to ensure it picks up the updated data. There is no point refreshing an extract when there is no new data. If your warehouse gets built overnight, then refresh once a day, at a time when you know the warehouse build is complete.
  • Some extracts can take a long time to build, so you need to make sure that the interval between refreshes is not so short that a refresh cannot complete before the next one is due.
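The schedules themselves are set on Tableau Server (we pick one in the publish dialog below), but as a rough sketch, you can also trigger a refresh yourself with Tableau Server's tabcmd command-line tool, for example from the job that builds your warehouse. The server address, credentials and data source name here are placeholders:

# Sign in to Tableau Server (placeholder server, user and password)
tabcmd login -s https://tableau.example.com -u admin -p "secret"
# Kick off a refresh of a published extract by name
tabcmd refreshextracts --datasource "Sales Extract"
tabcmd logout

Either way, the principle from the list above holds: only refresh as often as new data actually arrives.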
So how do we make an extract?

  • First connect to your datasource 
  • Right-click on the name of the datasource and select Extract Data

  • From the next dialog we can extract just a subset of the data based on a filter, aggregate values, or bring back only a certain number of rows. For our purposes we want everything, so click Extract.

  • We now give the extract a name and save it in our Datasource folder.

  • Tableau now connects to the datasource, grabs all the data, packages it up into a TDE for us and saves it to disk. 
  • Now if we look at the data connection we see that the icon has changed to denote we are using an extract. 

  • Now, when we publish the workbook, Tableau will also upload the extract alongside the workbook, and that will be used for the queries. 
  • Before we upload, we need to tell Tableau Server how often to refresh the extract, so click on Scheduling & Authentication. 

  • Change the authentication to Embedded Password if you are connecting to a database.

  • Select your refresh rate, click OK and then publish.

  • You now have a workbook linked to the TDE on Tableau Server.
Here are two Jedi tips to make it even better.

Show what database you are connected to and how old the data is

As we are now using an extract, it's good to tell people how old the data in the dashboard is. To do this, set the title of the worksheet (not the dashboard title) to show the data source name and when it was last refreshed, for example something like "Data source: Sales Extract, last updated: <Data Update Time>" (Data Update Time is one of the fields the title editor lets you insert).

Now when we view the worksheet, that information appears in the title.

We can now see which data source we are connected to and when it was last updated. This also works for live connections, and I would encourage you to use it everywhere. 

Publish your extract to minimise server overheads. 

If you have two workbooks, let's call them A and B, and they both connect to the same TDE, then when we publish them both we end up with two copies of the TDE on the server, say TDE1 and TDE2, which both need to be refreshed, i.e.

A-->TDE1             B-->TDE2

This is a waste of resources and also makes updating dashboards hard, because if we want to modify the TDE we have to do it to both copies.

Instead we can publish the TDE to the server and then connect to it as if it were any other data source, so that we have just one TDE to refresh and maintain, i.e.

A --> TDE <-- B

  • To do this, right-click on the extract you want to publish and select the option to publish it to the server; this brings up the publish dialog.

  • Select the refresh period you want to use, set the authentication to Embedded Password and click Publish.
  • Now when we connect to a new data source and select Tableau Server, we see our published extract in the list of available sources. 
  • Note that the icon changes to a little Tableau logo to tell us we are using an extract hosted on Tableau Server. 


Now anyone accessing the server can use the same extract, which will refresh itself, be the fastest connection to the data and is easy to maintain. 

Here's a little reminder of when you should use a TDE as opposed to connecting live to the data:

Scenario                                      | Connect Live | Use Extract
Need up-to-the-second reporting               |      X       |
Data is updated hourly, daily, etc.           |              |      X
Data source is slow, takes seconds to update  |              |      X
Using a fast data warehouse                   |      X       |
Don't need instant updates                    |              |      X

Hopefully this blog post has shown you why you should use extracts as much as possible with your own Tableau dashboards. Extracts are a great way to optimise the end-user experience, which is a vital part of our work. A sluggish dashboard isn't going to make you look good; a snappy, fast one will. Extracts allow you to get around the problem of a slow data source or a less than optimised database. Publishing extracts to Tableau Server makes sharing a curated dataset amongst your users really easy, and because everyone is looking at the same few data sources, you know the data is consistent across the site's dashboards.

 