Introduction to Tableau's Metadata API
During my first client placement as a part of The Data School programme, I frequently worked with Tableau dashboards that were built by other consultants. In these situations I needed to quickly understand which fields from the data source are used and how calculated fields are structured. It was important to understand which charts and calculations might be affected when I needed to change a certain calculation. I knew there was a better way than just manually checking all the calculated fields in a workbook. So, during our recent Back to School week at The Data School, I chose Tableau’s Metadata API as my learning topic, and explored what it is and how to query it.
What is Tableau’s Metadata API?
Tableau’s Metadata API enables a user to explore data assets and content on Tableau Server or Tableau Online, and query the corresponding metadata that have been indexed by Tableau Catalogue (part of the Data Management Add-on). Tableau Server has always stored metadata about its workbooks, data sources, and other assets, but starting from Tableau 2019.3 end users are able to query the stored metadata directly to:
- Discover data associated with the content published on your Tableau Online site or Tableau Server (tables, databases, and data sources)
- Track lineage, i.e. the relationships between assets, like data sources and workbooks
- Perform impact analysis. Using upstream and downstream lineage information, you can evaluate the impact of changes to content. For example, you can see which calculations or sheets in a workbook might be affected if a column is removed from a data source.
There are several potential use cases where the Metadata API can be really helpful, for example:
- Creating an internal data dictionary or catalogue of data sources
- Ensuring that calculations are named properly and no duplicated calculations are created
- Preparing handover documentation for a consulting project, or analysing existing content when starting a project.
Using GraphQL language to query the Metadata API
To query the Metadata API you need to use GraphQL, an open-source data query and manipulation language for APIs. The language was developed internally by Facebook in 2012 before being publicly released in 2015. Instead of sending multiple calls to different endpoints of an API, like you would do using a REST API, you need to send one focused query in GraphQL. That allows faster execution, cleaner code, and smaller response payloads. I would recommend starting with this short video overview of GraphQL by The Pragmatic Studio to learn more about the language and how it’s different from REST APIs.
Accessing Metadata API
By default, the Metadata API is available for users of both Tableau Server and Tableau Online. However, if you are using Tableau Server it should be enabled first by using the tsm maintenance metadata-services enable
command through the Tableau Services Manager(TSM).
Once the Metadata API is enabled, you can access it by modifying the link to your Tableau Server or Tableau Online as follows:
https://<your-tableau-server>/metadata/graphiql
If your Tableau Server or Tableau Online has the Data Management Add-on enabled, you can access the Metadata API by going to the External Assets section in the server’s menu on the left, and clicking the Query metadata (*GraphiQL*) link.
It is also possible to query the Metadata API using Alteryx or a Python script, but in this case you’d need to use Tableau’s REST API for authentication first. You can learn more about how to access your Tableau Server using a Python script in one of my previous posts.
Now that we understand what the Metadata API is and how we can access it, let’s look at several practical examples to see how it works.
For the purposes of my project, I was connecting to the Metadata API of my personal Tableau Online instance via the Tableau Developer programme. If you haven’t already, you can sign-up for Tableau’s Developer programme here. Otherwise, you can get some hands-on experience with the Metadata API by interacting with the demo server set up by Tableau.
Accessing the Metadata API using GraphiQL
If you don’t have any prior experience with GraphQL, I’d recommend starting with GraphiQL, a graphical user interface (GUI) available in Tableau Server or Tableau Online. It’s easy to use, has embedded documentation for the Metadata API and autocomplete functionality. GraphiQL is accessible through https://<your-tableau-server>/metadata/graphiql
. It does require you to be authenticated, as you are accessing the contents of your Tableau environment.
The image below highlights the main areas of the interface:
GraphQL query syntax
Before we start querying the API, we need to understand which elements should be included in a query.
As you can see from the sample code below, we need to name a query, specify which object we would like to access (for example, ‘tableauUsers’), and the relevant attributes of this object (for example, ‘username’). Some objects can at the same time be attributes of other objects (for example, workbooks owned by users). In this case we need to nest the ‘ownedWorkbooks’ object attributes in a new pair of curly brackets.
query "query-name"{
<object> (<arguments>){
<attribute>
<attribute>{
<attribute>
}
}
}
You can also add arguments to objects, such as ‘filter’ or ‘orderBy’ to narrow down or order the response to your query. We will look at how to use arguments in one of the examples below.
The query returns only the data you specify in the same shape as your query, in JSON format.
The Metadata API’s documentation has a detailed model of objects and their attributes. You can also search the documentation embedded in GraphiQL for a particular object.
Let’s look at three practical examples of GraphQL queries.
Get information about users on Tableau Server and their workbooks
This query will return information about all users on our server (in my case, I’m the only user on my Tableau Online), their id, username, as well as information about workbooks they own, such as their ids, names, project they belong to, and creation date.
As you can see, GraphiQL makes it easier for you to write queries by suggesting attributes that can be included.
But what if we want to see only the workbooks that belong to a certain project? In this case we need to add an argument ‘filter’ to the ‘ownedWorkbooks’ object to narrow our query down to just this project. In my case, it’s the project called B2S_Metadata_API.
Explore calculated fields in a workbook
Imagine you just took over the B2S_Metadata_API project on the server, and you need to understand which data sources the workbooks are connected to and which fields are used in the calculated fields. You can do it manually, or you can query the metadata and see all this information at once.
For example, let’s learn more about a workbook in this project called Superstore map.
And here you have it: the list of all original fields from the data source and calculated fields with their corresponding formulas that are present in this workbook! This is a great starting point to understand how the original fields from the data source are used in this particular workbook, if there are any duplicated calculations, and if there are calculated fields that don’t follow correct naming conventions (I’m looking at you, Calculation1 field!).
Get a custom SQL query used in a workbook
Several times during my recent client project I needed to get the custom SQL query used in a Tableau workbook, and this process turned out to be not as straightforward as I expected. So when I started looking into the Metadata API, one of the things I wanted to understand is whether it’s possible to get the custom SQL query that is used in a workbook. Turned out it can be done with just a short query as below.
You will have noticed that for these examples I wrote 3 independent queries in the same GraphiQL session. GraphiQL’s interface allows you to run each query separately by using a menu that appears when you press the Play button at the top.
I hope that this brief introduction to the Metadata API will inspire you to learn how you can use it to understand your server’s users, content, and assets. In my next posts I will explore how to structure more complex GraphQL queries as well as how to connect to the Metadata API programmatically using Alteryx or Python. Stay tuned and let me know if you have any questions in the meantime!