This project was inspired once our team realized the potential dangers of using the split and pivot method in Tableau Prep, especially on messy data or commonly refreshed data. We noticed that there is a hard coded max number of values in the split function in Tableau Prep, meaning that whatever number of splits was required for the original dataset will be used when you refresh your data. This causes for the values after that hard coded max to stay combined into one cell, causing data loss when we go to join our data at different granularities, as showed in the example video included.
What it does
This project utilizes Tableau's Analytics Extension API's Rserve and TabPy. There is a Python script and an R script that are equivalent, so it is up to the user to choose whichever interface they feel most comfortable with. By adding either one of these scripts into your flow, you can quickly and reliably break all multiple value cells into single, vertical cells. We put emphasis on reliably because this method solves the silent data loss problem that occurred when replicating the split and pivot method as outlined by the Flerlage Twins in one of their blog articles, here.
How we built it
There is a Python and an R version of this script. The Python script is deployed via TabPy and utilizes Pandas' explode() function. The R script is deployed via Rserve and it takes advantage of the cSplit() function in the splitstackshape package. Both scripts work the same way, so it is up to the users' comfortability with each language when it comes to deciding between the two.
Challenges we ran into
The most challenging part of this project was figuring out a workaround for the fact that Tableau Prep does not allow argument inputs. Finally, we came to the conclusion that the easiest method would be to use regular expressions and ask the user to include key words in their column names rather than accepting an argument, which would have been the preferred method
Accomplishments that we're proud of
We are proud that there is now a reliable method to split multiple delimiter separated values into rows in Tableau Prep. This is a huge win for analyzing each value and finding out key answers in our data. It's also a big plus that we can do this automate this flow without babysitting it for errors, giving us more time to focus on answering the important questions within the data.
What we learned
We learned how powerful connecting TabPy and Rserve can be in Tableau Prep. We can now successfully wrangle data in many new methods.
What's next for Cell Splitter
We hope to eventually see an option in Tableau Prep to allow users to pass arguments, ie column names, eliminating the extra steps of naming and renaming the columns to be "values" and include the word "key." We have been trying to draw attention to this idea so it can gain some traction once people see how powerful it could be.