One of the biggest concerns, even for me, with giving everyone access to data is shadow IT. Everyone keeps their own copies of data, teams use different terminology for the same dimension or measure, and it becomes an overall data governance nightmare. With Tableau Data Server, that nightmare goes away and you can sleep better at night. At least, I do.

Let's talk about how Tableau Data Server provides data governance, including data quality.

Data governance

Back to our story from the beginning. How does Tableau Data Server change the conversation now that we can have curated and certified data sets?

Analyst: 'I worked with our team to get that data available on Tableau Server. We don't have to worry about looking at stale data anymore because it is refreshed every day! Even better, accounting stopped using those manual spreadsheets so we're sourcing their information from the same place.'

Manager: 'So you mean we won't have different numbers anymore? That's great.'

Publishing a data source on Tableau Server provides consistency for everyone accessing that data set. The dimensions and measures are curated, defined, and described for all to see, and Tableau Server even lets us mark such data sources as certified. The pitfalls with published data sources come from skipping these steps. If you publish a data source without descriptions, give measures and dimensions random or nonsensical names, or don't have a data workflow process in place, you risk confusion, mistrust in the data, and an overall data management nightmare. I've experienced that nightmare: we set up published data sources without any descriptions, and it confused our end users.

How did we get around that? We created a workflow process for creating data sources on Tableau Server. It can be as lean or as heavyweight as you want, but in my experience it should cover at least the following:

  1. Dimensions and measures are well defined in your company's business language.
  2. Calculations are named appropriately and include necessary comments. (You don't want to have 'total' as one calculation and another as 'total total.')
  3. Dimensions include descriptions if the name alone is not sufficient. You can also note where the data originates upstream, such as a website form or order form.
  4. Mark your data sources as certified once you've completed these steps. Certification signals to users that the data can be trusted. (A sketch of setting this flag programmatically follows this list.)
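If you prefer to script the certification step rather than click through the UI, here is a minimal sketch using the tableauserverclient Python library. The server URL, site, credentials, data source name 'Sales', and certification note are all assumptions for illustration; adjust them for your environment.

```python
# Minimal sketch: mark a published data source as certified with
# tableauserverclient (TSC). Server URL, site, credentials, and the
# data source name 'Sales' are placeholders for illustration.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth('analyst_user', 'password', site_id='analytics')
server = TSC.Server('https://tableau.example.com', use_server_version=True)

with server.auth.sign_in(tableau_auth):
    # Look up the published data source by name.
    req = TSC.RequestOptions()
    req.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name,
                              TSC.RequestOptions.Operator.Equals,
                              'Sales'))
    datasources, _ = server.datasources.get(req)
    ds = datasources[0]

    # Flag it as certified and explain why, so users see the badge and note.
    ds.certified = True
    ds.certification_note = 'Curated by the analytics team; refreshed daily.'
    server.datasources.update(ds)
```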

Work with your team and your CoE to define what this workflow means for your group. It takes some time to put in place, but it pays off in a shared understanding of the data on Tableau Server.

Data quality

Part of data governance is data quality: how do you ensure that the data you have is correct? In the story, the analyst mentioned that once the data was on Tableau Server, it would be refreshed daily. Tableau Data Server lets you define schedules for your extracts at various cadences, including hourly.
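As an illustration, here is a hedged sketch of attaching a published data source to an existing refresh schedule with the tableauserverclient library. The schedule name 'Hourly Refresh', the data source name 'Sales', and the credentials are assumptions; use whatever schedules your administrator has already defined.

```python
# Sketch: attach a published data source to an existing extract refresh
# schedule via tableauserverclient. 'Hourly Refresh' and 'Sales' are
# placeholder names for illustration.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth('analyst_user', 'password', site_id='analytics')
server = TSC.Server('https://tableau.example.com', use_server_version=True)

with server.auth.sign_in(tableau_auth):
    # Find the hourly schedule an administrator has already created.
    schedules, _ = server.schedules.get()
    hourly = next(s for s in schedules if s.name == 'Hourly Refresh')

    # Find the published data source to refresh on that cadence.
    datasources, _ = server.datasources.get()
    sales = next(d for d in datasources if d.name == 'Sales')

    # Add the data source to the schedule's refresh task list.
    server.schedules.add_to_schedule(hourly.id, datasource=sales)
```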

A potential pitfall with scheduled refreshes is missing data: if the source data is not available at the time the extract runs, the extract captures an incomplete snapshot. We can take data quality validation one step further by creating a dashboard that queries both your published data source and your original data source and compares the total number of records. With Data-Driven Alerts, you can get notified when the two are out of sync. This is something I use daily for some of the more critical data sources.
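If you would rather script the same check than build it as a dashboard, a minimal sketch might look like the following. It assumes the pyodbc and tableauserverclient libraries, a warehouse DSN, a source table, and a single-measure summary view named 'Record Count Check' built on the published data source; all of those names and credentials are placeholders.

```python
# Sketch: compare the record count in the original database with the count
# exposed by a summary view built on the published data source, and flag a
# mismatch. DSN, table, view name, and credentials are placeholders.
import csv
import io

import pyodbc
import tableauserverclient as TSC

# Count rows in the original source table.
conn = pyodbc.connect('DSN=warehouse;UID=report_user;PWD=password')
source_count = conn.cursor().execute(
    'SELECT COUNT(*) FROM sales.orders').fetchone()[0]

# Pull the single number from a view built on the published data source
# (e.g. a sheet showing only SUM(Number of Records)).
tableau_auth = TSC.TableauAuth('analyst_user', 'password', site_id='analytics')
server = TSC.Server('https://tableau.example.com', use_server_version=True)

with server.auth.sign_in(tableau_auth):
    views, _ = server.views.get()
    check_view = next(v for v in views if v.name == 'Record Count Check')
    server.views.populate_csv(check_view)
    rows = list(csv.reader(io.StringIO(b''.join(check_view.csv).decode('utf-8'))))
    # First data row, single measure column; strip any thousands separators.
    published_count = int(rows[1][0].replace(',', ''))

if source_count != published_count:
    print(f'Out of sync: source={source_count}, published={published_count}')
```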

Alternatively, the Tableau Server REST API and the Tableau Data Extract Command-Line Utility allow developers to create a 'push job' that refreshes data on Tableau Server as soon as data is available in the original data source. Instead of Tableau Server pulling the data from the original database on its own schedule, the push job (which runs outside of the Tableau Server schedule) waits for data to be populated in the source database before running the extract refresh for Tableau Server. This approach works only if you have access to a scheduling program. Work with your development team or the data team responsible for loading data into the database to see how you can add this job.
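A minimal sketch of the refresh step with the tableauserverclient library is below, assuming a server version that supports running an extract refresh on demand. Your load process or scheduler would call this right after the source tables finish loading; the service account, site, and data source name 'Sales' are placeholders.

```python
# Sketch: trigger an extract refresh for a published data source as soon as
# the upstream load finishes, instead of waiting for a server-side schedule.
# Credentials, site, and the data source name 'Sales' are placeholders.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth('etl_service', 'password', site_id='analytics')
server = TSC.Server('https://tableau.example.com', use_server_version=True)

with server.auth.sign_in(tableau_auth):
    req = TSC.RequestOptions()
    req.filter.add(TSC.Filter(TSC.RequestOptions.Field.Name,
                              TSC.RequestOptions.Operator.Equals,
                              'Sales'))
    datasources, _ = server.datasources.get(req)
    ds = datasources[0]

    # Kick off the refresh; Tableau Server runs it as an asynchronous job.
    job = server.datasources.refresh(ds)
    print(f'Refresh job {job.id} queued')
```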

