If you're a long-time Tableau user, you know this problem! You want to temporarily pull the plug on a workbook or a datasource and its extract refresh tasks. Maybe an ETL pipeline failed. Maybe you don't want to see last week's 40% drop in the price of Bitcoin. Whatever the reason, there are extract refresh tasks that need pausing.
While you can pause and unpause entire schedules very easily, there is no out-of-the-box functionality for individual workbooks and datasources. Additionally, there's the tricky issue of upstream datasources for workbooks. Are you a Tableau API whisperer? You're going to need to be one if you aim to weave the REST API and Metadata API together to identify all of the relevant extract refresh tasks and process them accordingly.
What it does
This project provides Python code that makes pausing a workbook or datasource as easy as pointing to it by name. You can point to it by its locally unique identifier (luid) as well, but if a name is more convenient for you, then go for it.
Our process uses a combination of the REST API and the Metadata API to identify the extract refresh tasks associated with your workbook or datasource and pause or unpause them. As an aside, this project also supports pausing and unpausing (suspending and activating) schedules by name or locally unique identifier (luid). Schedules can only be paused and unpaused in a Tableau Server environment (that functionality is unavailable in Tableau Online).
Because there is no REST API endpoint to pause or unpause an extract refresh task, our process becomes a bit of a seek and destroy mission. We sniff out all of the extract tasks related to the target workbook (plus its upstream datasources) or datasource, and then we delete those tasks. But before we delete the tasks, we log some key details about the tasks so that we can rebuild them later on when we want to "unpause" the tasks.
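That pause flow can be sketched in a few lines. The snippet below is illustrative, not the project's actual code: the `delete_task` callable is a hypothetical stand-in for the REST API's delete-task request, and the CSV file is just one possible storage medium, as noted above.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

PAUSED_TASKS_FILE = Path("paused_tasks.csv")  # any storage medium works here
FIELDS = ["task_id", "content_type", "content_id", "schedule_id"]

def pause_tasks(tasks: List[Dict[str, str]], delete_task: Callable[[str], None]) -> int:
    """Log key details for each extract refresh task, then delete it.

    `tasks` are dicts describing the tasks found for the target workbook or
    datasource; `delete_task` issues the delete request for one task luid.
    """
    write_header = not PAUSED_TASKS_FILE.exists()
    with PAUSED_TASKS_FILE.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for task in tasks:
            writer.writerow(task)         # remember enough to rebuild the task later
            delete_task(task["task_id"])  # "pausing" = deleting the task
    return len(tasks)
```

The logged rows are the safety net: without them, deleting a task would be a one-way trip.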
How I built it
This project is built in Python. I make use of the tableau-api-lib library to interact with Tableau's REST API and Metadata API.
This project centers on a Python class named "ExtractRefreshTaskManager", which houses the core logic for pausing and unpausing extract refresh tasks. When you create an instance of this class, you provide it with an active connection to Tableau Server (or Tableau Online), which in this case is an instance of tableau-api-lib's TableauServerConnection class.
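The shape of that class looks roughly like the skeleton below. The method names here are illustrative rather than copied from the project, and `conn` can be any signed-in tableau-api-lib TableauServerConnection (or anything exposing the same request methods).

```python
from typing import Any

class ExtractRefreshTaskManager:
    """Pauses and unpauses extract refresh tasks via an active Tableau connection."""

    def __init__(self, conn: Any) -> None:
        # `conn` is expected to be a signed-in tableau-api-lib
        # TableauServerConnection; the manager issues REST API and
        # Metadata API requests through it.
        self.conn = conn

    # Illustrative method names; bodies omitted in this sketch.
    def pause_workbook(self, workbook_name: str) -> None: ...
    def unpause_workbook(self, workbook_name: str) -> None: ...
    def pause_datasource(self, datasource_name: str) -> None: ...
    def unpause_datasource(self, datasource_name: str) -> None: ...
```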
The extract manager then uses the Tableau connection to do its magic by issuing the right mix of REST API and Metadata API requests to pause and unpause tasks.
If a workbook is being paused, then the extract manager identifies the extract refresh tasks directly related to that workbook. These are typically embedded extracts. However, we also need to know about the upstream datasources feeding into the workbook. To get that information, we lean on the Metadata API, which makes it easy to see a workbook's upstream datasources. We also learn which of those upstream datasources have extracts, and whether they run on a schedule.
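The Metadata API lookup boils down to a single GraphQL query. The fields below (`upstreamDatasources`, `hasExtracts`, `luid`) come from the Metadata API schema; the query is built as a plain string here so you can pass it to whichever client you use.

```python
def build_upstream_query(workbook_name: str) -> str:
    """Builds a Metadata API (GraphQL) query that lists a workbook's upstream
    datasources and whether each of them has an extract."""
    return f'''
    {{
      workbooks(filter: {{name: "{workbook_name}"}}) {{
        name
        upstreamDatasources {{
          name
          luid
          hasExtracts
        }}
      }}
    }}
    '''
```

Getting the same upstream picture from the REST API alone would take several round trips; with the Metadata API it is one request.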
Equipped with all of the individual tasks that we need to pause, we now use the REST API to delete those tasks after logging the task details to a CSV file (you can use any storage medium you prefer).
To unpause our tasks, we point to the workbook or datasource that we want to unpause and then our process looks up the "paused" tasks for that item. We then cycle through all of the paused (deleted) tasks and we unpause (re-create) them.
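A matching sketch of the unpause side, again illustrative rather than the project's actual code: the hypothetical `create_task` callable stands in for the REST API request that re-creates a task from its logged details.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

def unpause_tasks(
    content_id: str,
    log_file: Path,
    create_task: Callable[[Dict[str, str]], None],
) -> List[Dict[str, str]]:
    """Look up the logged ("paused") tasks for a workbook or datasource
    and re-create each one."""
    with log_file.open(newline="") as f:
        paused = [row for row in csv.DictReader(f) if row["content_id"] == content_id]
    for task in paused:
        create_task(task)  # "unpausing" = re-creating the deleted task
    return paused
```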
Challenges I ran into
The biggest challenge in this project was making the process functional for pausing and unpausing workbooks that have upstream datasources. The high-level logic is one thing, but once you begin writing it you reach a fork in the road where organizing your code is as important as writing more of it.
For me, the one unavoidable challenge in every project is writing code in a way that lets you quickly understand what has already been written. You can shoot from the hip and write all of your functions in a single file, but when you come back to that code a week later you're in for a wild ride, trying to reconstruct all the clever hoops you were jumping through while you sprinted ahead without leaving any breadcrumbs for your future self.
To overcome this challenge, I took the approach of placing all of my logic into bite-size functions. Each function is documented and uses type hints to clearly illustrate what that function expects as input, and what it will provide as output. This process was instrumental in building the logic in a way that was easy to build upon day after day.
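Each of those bite-size functions looks roughly like this (the function below is illustrative, not copied from the project): a docstring says what it does, and the type hints spell out the input and output.

```python
from typing import Dict, List

def get_task_ids_for_workbook(tasks: List[Dict[str, str]], workbook_id: str) -> List[str]:
    """Return the luids of the extract refresh tasks tied to the given workbook.

    Args:
        tasks: Extract refresh task dicts, one per task.
        workbook_id: The luid of the target workbook.
    """
    return [task["id"] for task in tasks if task.get("workbook_id") == workbook_id]
```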
Accomplishments that I'm proud of
I'm glad a few people are getting some value out of my blog posts on Medium (Devyx) and my YouTube channel (Devyx), where follow-up questions naturally led into this hackathon project.
My goal is to fuel people with ideas for automating their own workflows in the Tableau ecosystem, and this project will help people get a foot in the door in terms of understanding how they can use Tableau's APIs to enrich their team's usage of Tableau.
What I learned
In building this project I learned more about the structure of data returned by Tableau's REST API and Metadata API, and how to reshape that data into other formats that best serve my needs.
What's next for Pausing and Unpausing Extract Refresh Tasks
An improvement to this project would be to add optional functionality that emails users when content is paused or unpaused. This would be useful for alerting an entire user base (users with a subscription downstream of a paused workbook or datasource) or a small group of developers (users in a specific group), depending on the team's needs.