Building a data dictionary with Tableau's Metadata API
TL;DR: In this post you will learn how to build a data dictionary of all the calculated fields for all the workbooks published on your Tableau Server or Tableau Online using Tableau’s Metadata API.
In one of my previous posts I wrote about Tableau’s Metadata API and some simple queries to get started with it. Ad-hoc queries are useful to get quick answers about data sources and workbooks on your Tableau Server. As a next step, I wanted to explore how to use the Metadata API to build tools that help teams across an organisation understand their data assets better.
While working as a consultant on various data projects, I noticed a common challenge: teams within the company lack clarity around how some key metrics are defined and calculated in their workbooks, or different teams may have slightly different definitions of the same metric. This can significantly skew any analysis done based on such metrics. To avoid such situations, companies should design an internal data dictionary.
There are different interpretations of what a ‘data dictionary’ is. In general it can be described as a list of definitions for key metrics and data fields that are used across the company, including the ways to calculate them, where applicable. The main idea behind such a dictionary is to eliminate confusion and ambiguity, and make sure that all metrics are understood and calculated in the same consistent way. The dictionary should be made available to everyone in the company, and become the first point of reference for any analysis. I recommend reading this post by Carl Anderson to learn more about some best practices for creating a data dictionary.
In this post I’m going to talk about how to create a data dictionary using Tableau’s Metadata API. What follows below is a detailed walkthrough of this topic that I presented at Phoenix Tableau User Group in October 2020 (slides, recording of the webinar).
Accessing the Metadata API
The Metadata API is available to users of both Tableau Server and Tableau Online. It is enabled by default on Tableau Online, but if you are using Tableau Server your server admin should enable it first by using the tsm maintenance metadata-services enable
command through the Tableau Services Manager (TSM).
Once the Metadata API is enabled, there are two ways to access it:
-
Direct URL. Append
/metadata/graphiql
to your Tableau Server or Tableau Online URL as follows:https://<your-tableau-server>/metadata/graphiql
-
Using the Data Management Add-on. If your Tableau Server or Tableau Online has the Data Management Add-on enabled, you can access the Metadata API by going to the External Assets section in the server’s menu on the left, and clicking the Query metadata (GraphiQL) link.
Whichever way you choose, you will land on GraphiQL, Tableau’s GUI for interacting directly with the Metadata API.
For the purposes of my project, I was connecting to the Metadata API of my personal Tableau Online instance, part of the Tableau Developer programme. If you haven’t already, you can sign-up for Tableau’s Developer programme here.
Quering the Metadata API using GraphQL (query language) in GraphiQL (Tableau’s Metadata GUI)
To query the Metadata API you will use GraphQL, an open-source data query and manipulation language for APIs (not to be confused with Tableau’s GraphiQL in-browser GUI described above). If you haven’t worked with this type of APIs, please refer to my previous blog for more details on how to structure your GraphQL queries. The Metadata API’s documentation has a detailed model of objects and their attributes, and you can also search the documentation embedded in the GraphiQL GUI for a particular object.
In one of my previous projects I worked with Tableau workbooks built by different people, and there were many calculated fields that were either duplicated or not used at all. I spent a lot of time trying to understand what each calculated field did, which fields from the original data set it referenced, and which dashboard it was used in, if any.
So, when I discovered the Metadata API, I wanted to see if I could speed up such investigation, and create a data dictionary of all the calculated fields for all the workbooks on my Tableau Online. In the GraphQL query below I focused on all calculated fields, which fields from which data sources are used in these calculations, as well as any descriptions or comments for such calculated fields.
Here is the final query for this project:
query calculations {
calculatedFields {
name
description
dataType
formula
referencedByCalculations {
name
formula
}
upstreamDatabases {
name
}
downstreamSheets {
name
}
downstreamDashboards{
name
}
downstreamWorkbooks {
name
projectName
tags {
name
}
upstreamDataQualityWarnings {
warningType
author {
username
}
}
}
}
}
My query is structured as follows:
- The main object of the query is
calculatedFields
, which will return every calculated field from the workbooks on my Tableau Online - Each calculated field has several attributes that I’m interested in:
- name
- description
- data type
- formula
- I also checked whether a calculated field is used in another calculated field by calling the
referencedByCalculations
object, the name of such calculation, and its formula - The query should also return the names of data sources that contain fields which are used in calculated fields by using
upstreamDatabases
object - Next, I found out which worksheets (
downstreamSheets
), dashboards (downstreamDashboards
), and workbooks (downstreamWorkbooks
) each calculated field is used in, including a project name and a relevant tag for every workbook - The last thing was to get any data quality warnings attached to an upstream data source that a calculated field is related to (
upstreamDataQualityWarnings
)
Visualising the response in Tableau Desktop
The response from the API comes in JSON format that can be then visualised in Tableau Desktop to make it more user-friendly. For this project, I simply copied the query response as it was displayed in GraphiQL onto a text editor and saved it as a JSON file.
As an aside, you can also automate metadata extraction using Alteryx, Tableau Prep Builder, or programming languages such as Python or R to make sure that your data dictionary stays up to date. In that case you’d need to use Tableau’s REST API for authentication first. You can learn more about how to access your Tableau Server using a Python script in one of my previous posts.
Once I had the JSON file, I selected it as my data source in Tableau Desktop, and designed my data dictionary using a simple tabular format. You can see the final dashboard on my Tableau Public.
I implemented the following features in this dashboard to make it easier to explore the metadata:
- There is an indicator of when the data was last updated in the top left corner
- The main table lists calculated fields grouped by the workbook they belong to
- A user can get more information about a calculated field by hovering over the information icon on the right. If this is a red diamond icon instead, the data source that the field is coming from has data warnings. The tooltip also provides data on other calculated fields that reference the selected calculation
- There are filters on the right side of the dashboard to look up fields by project, workbook, tag, datasource, or even by the calculated field’s name. Using the ‘Calculation Name’ wildcard filter also allows end users to check whether field naming best practices are met (for example, no fields are named ‘Calculation1’ or ‘Sales(copy)’)
I hope this project gives you a better idea of the value that you and your orgnisation can derive from Tableau’s Metadata API. Of course, this is just one example of a data dictionary. You can modify the query to extract the metadata that is most relevant for your use case. Let me know if you have any questions.