- The amount of data is surging (~90% of the data in the world has been created in the last 2 years)
- To ensure that data in organizations is useful, it needs to be found easily
- A great “Enterprise-y” solution has been metadata tagging!
- However, users generally hate doing that manually
- Automatic solutions are either cumbersome to maintain, expensive to develop, or both
- Too many required metadata fields will drive users to shadow IT solutions (like Dropbox)
The fix: use Azure Functions and the Cognitive Services Text API to enrich a Flow that fills in metadata for new items in a Modern SharePoint Team Site. PERFECT!
- Using Modern Team Sites in SharePoint for document storage enables collaboration
- SharePoint Search is decent, but thorough metadata tagging makes it a lot more useful
- Using Azure Cognitive Services we can fill the metadata fields automatically – without any user interaction at all!
- Content will be found and users will be happy :)
What it does
Automates metadata tagging for SharePoint Online.
- A Flow attached to a Document Library calls the Azure Function that does the heavy lifting
- The Azure Function extracts the text, analyzes it using Azure Cognitive Services, and then writes the results back to SharePoint Online
- Finally, the admin and the creator of the file are notified of the execution
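The analysis step in the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: it assumes the function uses key phrase extraction, that the service accepts a `documents` payload with `id`, `language` and `text` fields and returns `keyPhrases` per document, and that long text has to be split into chunks. The names `build_key_phrase_request`, `collect_key_phrases` and the `MAX_CHARS` limit are hypothetical. The HTTP call itself and the write-back to SharePoint are left out.

```python
from typing import Dict, List

# Assumed per-document size limit; check the current service documentation.
MAX_CHARS = 5000


def build_key_phrase_request(text: str, language: str = "en") -> Dict:
    """Split the extracted text into chunks and wrap them in a documents payload."""
    chunks = [text[i:i + MAX_CHARS] for i in range(0, len(text), MAX_CHARS)]
    return {
        "documents": [
            {"id": str(n), "language": language, "text": chunk}
            for n, chunk in enumerate(chunks, start=1)
        ]
    }


def collect_key_phrases(response: Dict, limit: int = 10) -> List[str]:
    """Merge key phrases from all chunks, dropping case-insensitive duplicates."""
    seen = set()
    phrases: List[str] = []
    for doc in response.get("documents", []):
        for phrase in doc.get("keyPhrases", []):
            key = phrase.casefold()
            if key not in seen:
                seen.add(key)
                phrases.append(phrase)
    # Keep only the first `limit` phrases as metadata tag candidates.
    return phrases[:limit]
```

The list returned by `collect_key_phrases` is what would be written to the metadata field of the new item; capping it keeps the tags short enough to be useful in search.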
How I built it
I used Visual Studio 2017, but you could do everything in VS Code, too.
Challenges I ran into
Extracting text from a PDF sometimes returned a lot of header and metadata noise, and the text itself came out scrambled, so I had to work around that.
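To give an idea of what such a workaround can look like, here is a small Python sketch of a line-based clean-up. It is only an illustration under assumptions, not the approach used in the project: the noise patterns, the `min_letter_ratio` threshold and the function name `sanitize_pdf_text` are all made up for this example and would need tuning against real documents.

```python
import re

# Hypothetical patterns for lines that are PDF structure rather than content.
_PDF_NOISE = re.compile(
    r"^(%PDF-.*|%%EOF|\d+ \d+ obj|endobj|stream|endstream|"
    r"xref|trailer|startxref|<<.*|.*>>)$"
)
# Control characters that sometimes leak into extracted text.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_pdf_text(raw: str, min_letter_ratio: float = 0.6) -> str:
    """Drop structural and mostly non-alphabetic lines, then collapse whitespace."""
    kept = []
    for line in raw.splitlines():
        line = _CONTROL_CHARS.sub("", line).strip()
        if not line or _PDF_NOISE.match(line):
            continue
        # Lines that are mostly digits or symbols are treated as noise.
        readable = sum(ch.isalpha() or ch.isspace() for ch in line)
        if readable / len(line) < min_letter_ratio:
            continue
        kept.append(line)
    return re.sub(r"\s+", " ", " ".join(kept))
```

A heuristic like this will occasionally throw away real content (a line of figures in a table, for instance), which is acceptable when the goal is only to feed representative text to the analysis step.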
Accomplishments that I'm proud of
Sanitizing the text from a PDF is not straightforward, but I think it works decently. Also, this was based on a topics extraction tool I made for a PoC a while back, and Microsoft has completely changed the API since then, so learning the new API was rewarding.
Also, it's fun to see how well the Azure Function + Flow + SPO integration works!
What I learned
- Integrating a Flow to a Modern SharePoint Site
- New version of Cognitive Services
What's next for Resolving Managed Metadata Madness in SharePoint
Refactoring it so that it can either be deployed easily to any environment using PowerShell, or run as a multi-tenant solution.