As an IT Service Desk or IT Admin, how many times a day do you receive queries about a user’s status, entitlements, applications and other simple details? These queries can come from the user, but often come from the user’s manager or a peer. Depending on the context of the query, any number of different tools or consoles may be required to find the answer, or the query may even need to be escalated to the owner of a specific application, service or system. What if you could use a Voice Assistant to query a repository that holds a superset of information about a user, including their status, entitlements and access? A repository such as Microsoft Identity Manager, which through its connectors has knowledge of a user across disparate, heterogeneous environments. All without having to train IT Support staff on yet another console or service and, even more importantly, without exposing unnecessary or restricted information. Queries such as:
- Does Dwayne Johnson have an Office365 mailbox and if so what is their email address?
- Does Jerry Seinfeld have an Active Directory account and if so what is their LoginID?
- Does the employee Tiger Woods have an expiry date and if so what is their End Date?
- Is the Active Directory account for Tom Cruise enabled?
My Voice Assistant for Microsoft Identity Manager is an Azure IoT Business Solution that I’ve designed and built using Azure IoT, Azure Cognitive Services, Azure Serverless Services and other Azure Services, along with Microsoft Identity Manager. It empowers IT Support staff to query Microsoft Identity Manager for common IT Service Desk and IT Admin requests. It works as follows:
- The IT Support Staff member speaks to the Voice Assistant using a wake word and gets a "Yes" response from the Voice Assistant. They then speak their request (e.g. Does the user Dwayne Johnson have a mailbox)
- The Voice Assistant takes the spoken request and submits it to Azure Cognitive Services Speech to Text to convert the request to text.
- The Voice Assistant then submits the request text to an Azure Function. The Function (3a) sends the request to Cognitive Services Language Understanding Intelligent Service, which identifies the Entity (User) and Entitlement (Mailbox) and returns them to the Function; (3b) queries Microsoft Identity Manager (via Azure API Management and the Lithnet Rest API for MIM) for the User, which returns the user record to the Function; and finally identifies the value of the entitlement being queried (mailbox) and generates the response text, which is returned to the Voice Assistant
- The Voice Assistant takes the response text and submits it to Azure Cognitive Services Text to Speech to turn the response into audio
- The Voice Assistant speaks the response to the IT Support Staff Member
- The Voice Assistant sends a summary of the interaction to IoT Hub which sends it to Stream Analytics and logs it to Azure Table Storage as well as sending it to Power BI which displays Analytics of the use of the Voice Assistant.
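The steps above can be sketched as one small orchestration function. This is only an illustration of the flow, not the actual device script: the function name and the four callables passed in are hypothetical stand-ins for the Speech to Text call, the Azure Function call, the Text to Speech call and the IoT Hub message.

```python
def handle_query(audio, speech_to_text, query_function, text_to_speech, log_interaction):
    """Run one voice interaction through the stages described above.

    Every stage is passed in as a callable, so this sketch makes no
    assumption about how each Azure service is actually called.
    """
    request_text = speech_to_text(audio)            # spoken request -> text
    response_text = query_function(request_text)    # Azure Function: LUIS + MIM lookup
    response_audio = text_to_speech(response_text)  # response text -> audio
    # Summary of the interaction, destined for IoT Hub (field names are illustrative)
    log_interaction({"request": request_text, "response": response_text})
    return response_audio
```

Passing the stages in as arguments keeps the flow readable and lets each stage be swapped for a stub while the others are still being built.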
- OpenWRT on Seeed Studio Respeaker Core 1.0
- Python 2.7
- IoT Device & IoT Hub
- IoT Event Hub
- Azure Functions
- Managed Service Identity
- Azure Key Vault
- Azure Cognitive Services - Speech to Text
- Azure Cognitive Services - Language Understanding Intelligent Service
- Azure Cognitive Services - Text to Speech
- Azure API Management
- Azure Table Storage
- Power BI
Microsoft Identity Manager
- Microsoft Identity Manager 2016 SP1
- Lithnet MIM Service REST API
The Hackathon to use Azure IoT and Serverless functionality was the catalyst for me to build this project. Last year I built a WebApp version that allowed a Service Desk or IT Admin to search for a person and get a report on that person using metadata from Microsoft Identity Manager. The natural extension was therefore to use the spoken word to query and have the result spoken back. It also provided me with the impetus to learn how to use and integrate a number of Azure Services I hadn’t previously used.
What it does
The IoT Device listens for the Wake Word (Lithnet) and takes a spoken query and converts it to text using Azure Cognitive Services, then uses Azure Language Understanding Intelligent Service (LUIS) to identify an Entity (Person) and an Entitlement (e.g. Mailbox, Active Directory). It then queries Microsoft Identity Manager (via Azure API Management and the Lithnet Microsoft Identity Manager REST API Service) to find the queried Entity before finally providing an audio response for the queried entitlement and logging the event into Table Storage and Power BI via Azure IoT Hub.
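As a rough illustration of the LUIS step, the sketch below pulls the person and the entitlement out of a prediction result. It assumes the LUIS v2 response shape, where `entities` is a list of items that each carry an `entity` (the matched text) and a `type`. The function name and the entity type names `Person` and `Entitlement` are assumptions; the real names depend on how the LUIS model was defined.

```python
def parse_luis_result(luis_result, person_type="Person", entitlement_type="Entitlement"):
    """Return (person, entitlement) found in a LUIS v2-style prediction dict.

    Either value is None when no entity of that type was returned.
    The default type names are assumptions; use the ones defined in your model.
    """
    found = {}
    for item in luis_result.get("entities", []):
        # Keep the first entity seen for each type
        found.setdefault(item.get("type"), item.get("entity"))
    return found.get(person_type), found.get(entitlement_type)
```

For example, a result whose `entities` list holds `{"entity": "dwayne johnson", "type": "Person"}` and `{"entity": "mailbox", "type": "Entitlement"}` yields the pair `("dwayne johnson", "mailbox")`.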
How I built it
After architecting a high-level conceptual solution, I started investigating the elements of the solution I didn’t have previous exposure to. I started with Cognitive Services and used PowerShell and Invoke-RestMethod to formulate my web requests to integrate with the services and perform the necessary functions. I then designed my custom LUIS Model to take phrases and identify the entity and the entitlement. When I had working solutions for Speech to Text, Text to Speech, and LUIS, I translated the necessary initiation piece to Python (Speech to Text) and got it working from my IoT Device. I then configured Azure API Management to front-end the Lithnet Microsoft Identity Manager REST API Service. I translated the LUIS and Microsoft Identity Manager (via API Mgmt) queries into a Webhook Azure PowerShell Function. The Azure Function utilises Managed Service Identity and Azure Key Vault for the necessary credentials. Next I converted the Text to Speech PowerShell script to Python to return an audio stream of the query response that is then spoken back to the requester. The final piece was to log each request and response to Azure Table Storage and Power BI via IoT Hub for Auditing and Reporting functions.
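The last step inside the Function, turning a user record and an entitlement into response text, might look something like the sketch below. The actual Function is written in PowerShell; this is Python purely for illustration. The mapping table, the attribute names and the flat dictionary standing in for the user record are all assumptions, not the real schema or the real output of the Lithnet REST API.

```python
# Hypothetical mapping from a spoken entitlement to the attribute that answers it
# and the label used in the reply. Attribute names are illustrative only.
ENTITLEMENT_ATTRIBUTES = {
    "mailbox": ("Email", "email address"),
    "active directory": ("AccountName", "login ID"),
    "expiry date": ("EmployeeEndDate", "end date"),
}


def build_response(display_name, entitlement, user_record):
    """Generate the response text for one entitlement query."""
    key = entitlement.strip().lower()
    if key not in ENTITLEMENT_ATTRIBUTES:
        return "Sorry, I don't know how to check %s." % entitlement
    attribute, label = ENTITLEMENT_ATTRIBUTES[key]
    value = user_record.get(attribute)
    if not value:
        # Attribute missing or empty: the user does not hold this entitlement
        return "No, there is no %s recorded for %s." % (label, display_name)
    return "Yes, the %s for %s is %s." % (label, display_name, value)
```

Because only the attribute named in the mapping is ever read, the response never includes anything else from the user record, which fits the goal of not exposing unnecessary information.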
Challenges I ran into
There were many challenges in this project. Some of the more challenging ones that took way longer than I anticipated were:
- Developing a LUIS Model for the types of queries and intents that I wanted to be recognised
- Getting Device to Cloud IoT Messages from the IoT Hub through to Table Storage and Power BI. Lots of reading and lots of failures until a configuration on the Event Hub saw everything fire up.
- Working within the available memory on the Respeaker Core embedded Linux device, and with the version of Python it offered, meant many failed attempts with different Python libraries before I could send web requests to Azure with the differing payloads and authentication methods
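On a memory-constrained device, one way to avoid extra libraries altogether is to build web requests with the standard library only. The sketch below is an assumption about how that could be done, not the approach the device script ended up using. The helper name is hypothetical, and the subscription key header shown is the one used by Azure Cognitive Services and Azure API Management; other endpoints may expect a different header.

```python
import json

try:  # Python 3
    from urllib.request import Request
except ImportError:  # Python 2.7, as on the device
    from urllib2 import Request


def build_json_post(url, payload, api_key):
    """Build (but do not send) a JSON POST request using only the standard library."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Subscription key header used by Cognitive Services and API Management
        "Ocp-Apim-Subscription-Key": api_key,
    }
    # Supplying a body makes this a POST request
    return Request(url, data=body, headers=headers)
```

The returned object can be passed to `urlopen` from the same module to send it. Keeping the request construction separate from the sending makes the payload and headers easy to inspect when a call fails.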
Accomplishments I’m proud of
I’m very proud of the entire end-to-end solution. As an experienced Identity Architect, I’ve built many Identity Management solutions for customers over the last 20 years. The concept of being able to talk to an Identity Management backend and have it talk back to me still amazes me, even though I’ve been working on this for a couple of months. When I first approached this crazy concept, I thought my chances of actually building it were low. After working through a high-level solution, breaking it down into functional components and then working through them individually, my confidence grew and I built a working solution.
What I learned
Building this project exposed me to numerous Azure Services that I hadn’t previously used, as well as to using some services in different ways. The new services I leveraged and got my first exposure to are:
- Azure Cognitive Services Speech to Text - Used to convert the voice query to text before being submitted to LUIS
- Azure Cognitive Services Text to Speech - Used to take the search query result and convert it to Speech to be spoken back to the requestor
- Azure Cognitive Services LUIS (Language Understanding Intelligent Service) - Used to identify the entity to query and the entitlement to evaluate
- Azure Event Hub - Sending IoT Device to Cloud messages through the IoT Hub to Event Hub to Stream Analytics.
- Azure Stream Analytics - Taking the Device to Cloud Messages and outputting to Table Storage and Power BI
- Azure Table Storage - Storing events for auditing/reporting
- Azure API Management - Integration of Lithnet MIM Service REST API
- Python - This was my first ever use of Python. The Python script on the IoT Device listens for the Wake-Up Word and interfaces with Azure Cognitive Services for Speech to Text and Text to Speech along with the Azure Function
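For the Text to Speech item above, the service expects the text wrapped in SSML. A minimal sketch of building that document in Python is shown below. The function name is hypothetical and the voice name is left as a parameter, since the voice used in this project isn't stated; escaping matters because a response may contain characters such as `&` that would otherwise break the XML.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, lang="en-US"):
    """Wrap response text in a minimal SSML document for a Text to Speech request."""
    return (
        '<speak version="1.0" xml:lang={lang}>'
        "<voice name={voice}>{body}</voice>"
        "</speak>"
    ).format(
        lang=quoteattr(lang),         # quoteattr adds the surrounding quotes
        voice=quoteattr(voice_name),
        body=escape(text),            # escape &, < and > in the spoken text
    )
```

The resulting string would be sent as the request body to the Text to Speech endpoint, with the audio returned in the response.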
What’s next for the project
The project is currently essentially read-only. The information that can be retrieved and the pre-requisites to make it available mean that it doesn’t consider context or authorisation. I’d love to implement voice-based authentication into the solution so that the speaker/requestor is identified by their voice pattern (using Azure Cognitive Services Speaker Identification API) and authenticated to Azure Active Directory. Based on the authorisation level of the user, higher-privileged requests could then be initiated (e.g. Disable User David Bowie, or Create a new user account for Colin Furze). I’d also like to enable the solution using Azure Bot Services. The way the solution has been designed allows for this. This would replace the need for a device to speak to, and provide an easier implementation path for the Authentication and Authorisation functions. The Bot could also be enabled via Skype for Business/Microsoft Teams.