Basic Business Reporting with Tableau
Case Study on Credit Risk Data
If you work in Analytics, you have probably heard about Tableau – one of the best data visualisation and reporting tools. It repeatedly ranks as a leader in Gartner’s Magic quadrant in Business Intelligence and Analytics. Tableau is in a huge demand now, but still relatively easy to learn (compared to other trendy data skills, such as machine learning, R, statistics, Ruby, Python, Hadoop or even SQL), so you can quickly increase your job market value by learning it now!
In this article, I will suggest how you can explore and learn Tableau as a tool for business reporting on a credit risk data example. I assume that you have some basic experience with Excel pivots. Some basic knowledge of databases is also helpful, but not required. It will not be a step-by-step training, but rather an invitation to spend one afternoon exploring the new tool by yourself. This way you will not only learn faster and in more interesting way, but you can also include the final work in your portfolio as it will be your own creative work.
If you want to try Tableau for free (and your university or employer does not provide it yet), you have two possibilities: either 14 days trial of Tableau Desktop or a free version called Tableau Public.
Public version offers almost the same what the regular version does with one big catch: Everything you save you save publicly (hence the name) on Tableau server, where it is accessible for everyone, including the source data. It also limits the data source connections (you cannot connect to database directly), however this can be simply solved by exporting the data to CSV. I feel that providing the public version was extremely smart business move from Tableau as it brings more users in while still keeping the paying customers in.
The choice is yours: If I wanted to test on a real data and in real enterprise environment (=connection to databases, automation), I would go for trial. In other cases, Public seems to be a better fit: you can learn anytime you want, share the visualisations you created easily and you will have an outstanding tool for analysing the public(able) data anytime in the future. Be really careful with what you share though, even hiding visualisation from your profile does not mean it is private! (We will provide dummy data for this exercise, don’t worry!)
The main strength of Tableau lies in its visualisation abilities. 24 types of charts are available (see picture below), but on the internet you can find tips how to build completely differently looking visualisations (mostly with scatter charts + calculated fields).
Map visualisations are very easy using built-in tools by Tableau, however the most advanced functions (such as finding a latitude for a city or state automatically) are quite limited outside the USA.
Tableau can also connect to multiple data sources (databases in case of non-public versions) and do joins between them. This way it can be used as a very simple data integrating platform if there is no better choice. Personally I still prefer preparing data outside of tableau (pl-sql procedures or virtually any ETL tool) to a denormalised grouped table for better control over the process and less data transferred between systems.
Most of the online examples and tutorials are built on some kind of public data, far from business reality. Sure, you can learn Tableau on data about pet-owning households (https://public.tableau.com/en-us/s/gallery/cats-vs-dogs), but if you want to use it for a business dashboard creation, it will be better to use a realistic business data. If you use Tableau Public, do NOT use real data from your company! We have generated for you sample data from a credit risk domain, which should be enough to create a simple antifraud dashboard.
Quick introduction to Anti-fraud business and indicators
If you already work in a bank or a retail loan company, you can skip this part. For others, there is a quick basic explanation of antifraud department in a financial company:
Purpose of the Antifraud department is to monitor, investigate and prevent an organized fraud (taking loans with a purpose of not repaying it) on a company. Their main tool is monitoring of risk concentration.
For that, Antifraud department selects a risk indicator (ideally some which is detectable quickly – see further) and tries to find a concentration of it on a particular category of clients. Category can be almost anything: same time the loan was provided, locality, sales agent (or maybe his/her supervisor or even supervisor of his/her supervisor), company where the clients work, in one case even the customers' surname worked. Uneven concentration of risk indicator creates suspicion, which is then investigated further. (for example if one sales agent has significantly worse risk than others, investigators will go through all of his contracts in detail and try to check if there is any fraudulent behavior involved.
I have mentioned quickly detectable risk indicator: for most common frauds it is enough to see whether the customer have paid the first installment. Most of the companies use FPD rates (First Payment Default), which are calculated as percentage of contracts which have reached several days past due on first installment, divided by those who could have reached it (observable). Usually FPD is monitored on several DPDs (in the example 5,10,30) as shorter allow for faster action, while longer are more precise. Similarly to FPD, many companies monitor even further installments (SPD, TPD, ...) as fraudsters may decide to pay the first one to prevent detection.
As a company gets bigger, more advanced frauds are aimed at it, creating the need for more advanced analytic tools such as machine learning. But some simple concentration monitoring should be present in every company from the beginning.
Variables used in the dataset are described in the table below:
Go and Explore!
Now you can download the sample data and start building a dashboard on it! I recommend starting your own work and don’t get biased by the way we have created the dashboard, there are so many possibilities to approach it.
Have you created something interesting? Found the fraudster?
If you get stuck, you can check the dashboard we have created or read few tips below it.
For inspiration we have created a simple dashboard. You can see it below, however it gets a bit distorted on this page, so I recommend browsing the original antifraud report on our Tableau Public.
Here are some tips which you may find helpful when creating the reports:
Connecting the data
Tableau automatically classifies the variable type (text strings, numbers, dates), including the geographic roles (Latitude/Longitude). I recommend to go through the columns on “Data Source” tab and correct the data types if needed. It is also possible to rename the variables here.
Calculated fields are useful for defining metrics based on another columns. You can create a calculated field by right-clicking on a measure and selecting Create → Calculated Field.
They work similarly to excel, but there is one exception: In tableau, calculated fields are by default first calculated on a row level and then aggregated. In excel it is the other way. This seems to be a detail, but with a grouped table such as ours the results will not be accurate (for example for weekly FPD ratios you will have an average of daily ones, whereas the ideal measure is sum of weekly FPD#/sum of weekly observable # or weighted average). To make Tableau calculate the ratios correctly on aggregated values, calculate the values as sum(metric1)/sum(metric2) instead of just metric1/metric2.
Bar and line charts aligned
When you are combining line and bar chart in the same chart, you may notice a "strange" behavior, when bars seem to be shifted by one column. To fix this, change the time dimension (e.g. month) to discrete. Complete explanation: http://drawingwithnumbers.artisart.org/bars-and-lines/
Limiting only most recent data
In some cases, we may want to see only the most recent data (see daily part of our example – if there would be all the 6 months data we have, it would look congested and daily trends would be hard to spot). For limiting the data for last 60 days only, you can either filter by relative date or create a filter like this:
You can play with it a bit to include only whole calendar months etc.
Tableau uses three levels of visualizations:
Worksheet: usually for a single chart or table (or multiple charts with the same axis)
Dashboard: Few charts on a page. Tableau by default use quite small page size, enough for about 3 charts maximum. This is quite different concept of dashboard than what we know from other reporting tools. I recommend keeping the default size, it is better for mobile/tablet and also more natural for people to concentrate on. For presenting more complex information use stories, ideally with links.
Story: a presentation consisting of different dashboards and worksheets. Similar to Workbook in Excel.
I have three tips to work with the dashboards:
Note that if you change the formatting of the chart in a dashboard or story, it changes even in the underlying worksheet and all the related dashboards and stories. This may completely destroy your report design.
Keep some space for the comments on a dashboard (Text Object).
You can save some space on a worksheet by making legends and filters floating above the chart instead of tiled (try pressing Shift while dragging the object).
During the exercise, you have learned some basics of data visualization with Tableau. I hope you have enjoyed working with this tool. To further improve your skills, use Tableau to visualize some other kind of data (tip: try making some interesting visualization over Eurostat open data), get inspired by others (start with Tableau Public's Viz of the day) and learn related tools and techniques (good complement is SQL for data preparation and cleansing). Stay tuned for more related articles!