Inspiration

The way we work has changed dramatically. The place we work from is no longer static, but "hybrid". New demands and challenges arise with this new model of workforce mobility, and the technology that supports hybrid work is evolving rapidly to keep everyone connected. Knowing this, we are thinking not only about connections, but about the quality of those connections.

This new work model is generating new frontiers, one of them being the "physical office", and new challenges for C-levels. Just as we had to adapt to the scenario imposed by the COVID-19 pandemic, network infrastructure and physical spaces had to do the same. Cisco's solutions, combined with customizations built through their APIs, made this adaptability possible, enabling a balance between security, collaboration, sustainability, and productivity. The physical space should also have a meaningful impact on the experience of its users.

To create this impact on the user experience, we implemented the concept of "Zero Friction". The goal is a fluid experience for users as they cross the boundary of a physical office, giving them access to various office features with as little interaction (clicks on a screen) as possible. At the same time, the system captures information, identifies users, and generates the reports and dashboards that support C-level decision-making.

What it does

The first point of contact in a physical office is the receptionist. Our work focused on creating a virtual receptionist that connects several Cisco solutions to give users quick and easy access to a number of office features. For this Hackathon we implemented the following features:

User Identification: Through Meraki smart cameras we capture the image of the user who enters the office. Using computer vision algorithms, we extract the user's identity via facial biometrics and categorize them as known (through the catalog of faces registered with the solution) or unknown. As soon as the user is identified, an API notification is sent and the identification information is made available for consumption by the voice assistant.
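The known/unknown decision above boils down to a nearest-neighbor search over face encodings. The sketch below is a minimal illustration, assuming the encodings (e.g. fixed-length vectors produced by a face-recognition library) are already available; the function names and the 0.6 distance threshold are our own illustrative choices, not the project's actual code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two face encodings (equal-length vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_face(encoding, known_faces, threshold=0.6):
    """Return the name of the closest registered face, or 'unknown'.

    known_faces maps a user name to a reference encoding; a match only
    counts if its distance is below the threshold.
    """
    best_name, best_dist = "unknown", threshold
    for name, known_encoding in known_faces.items():
        dist = euclidean(encoding, known_encoding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

In a real deployment the threshold has to be tuned against the chosen encoding model; too loose and strangers match, too tight and known users fall through to "unknown".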

Voice Assistant: For voice interaction we use Alexa. When it receives an API call stating that a person has been identified, the assistant starts the interaction by welcoming the user: "Hello, welcome to NTT! Ahh, you again. You look good today. How can I help you?"

The following features have been implemented for this Hackathon:

Release of Physical Access: The user may request by voice that the company's doors be opened. For this functionality we integrated the Cisco Duo two-factor authentication solution. Using its APIs, we send a confirmation request to the user; once confirmed, the company's doors open.
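The confirm-then-open flow can be sketched as below. This is an illustration only: `send_push` stands in for the Duo Auth API call that sends a push confirmation to the user's phone and reports back "allow" or "deny"; the function names and return strings are hypothetical.

```python
def request_door_release(username, send_push):
    """Gate the door release on a second-factor confirmation.

    send_push(username) must return a dict shaped like a Duo Auth API
    response, i.e. containing a 'result' key of 'allow' or 'deny'.
    """
    response = send_push(username)
    if response.get("result") == "allow":
        # In the real system this is where the door-controller API is called.
        return "door opened"
    return "access denied"
```

Injecting `send_push` keeps the door logic testable without hitting the Duo service, and makes the fail-closed behavior explicit: anything other than an explicit "allow" keeps the doors shut.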

Meeting Assistant: If the user says they have a meeting, the assistant asks who they want to meet. After identifying that employee by name through a search in the company's AD, we use Cisco Webex to send them a chat notification that their guest is waiting at reception.
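Sending that chat notification comes down to a single POST to the Webex Messages API (`https://webexapis.com/v1/messages`) with a bot bearer token. The sketch below is our hedged reconstruction, not the project's actual code; the message wording and function names are illustrative.

```python
import json
import urllib.request

WEBEX_MESSAGES_URL = "https://webexapis.com/v1/messages"


def build_guest_notification(host_email, guest_name):
    """Payload for the Webex Messages API: notify a host by email address."""
    return {
        "toPersonEmail": host_email,
        "text": f"Your guest {guest_name} is waiting for you at the reception.",
    }


def notify_host(token, host_email, guest_name):
    """POST the notification to Webex. Requires a valid bot access token."""
    payload = json.dumps(build_guest_notification(host_email, guest_name)).encode()
    req = urllib.request.Request(
        WEBEX_MESSAGES_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from the network call lets the message logic be unit-tested offline.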

Wireless Visitor Access: Using the company's network infrastructure (Catalyst switches and wireless), we created an API integration with Cisco ISE to create guest network credentials for a visitor and deliver them by email and QR code. Thus, when a visitor arrives at the company and is identified by the system, they can request network access in a totally "no touch" way.
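Guest accounts in Cisco ISE are typically created through its ERS REST API. The sketch below shows only the credential-generation step, which is testable offline; the `build_ise_payload` wrapper is an assumption, since the exact payload shape depends on the ISE version and the configured guest types and sponsor portals.

```python
import secrets
import string


def generate_guest_credentials(visitor_email):
    """Derive a guest username from the visitor's email and generate a
    random one-time password for wireless access."""
    username = visitor_email.split("@")[0]
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(10))
    return {
        "userName": username,
        "password": password,
        "emailAddress": visitor_email,
    }


def build_ise_payload(credentials):
    """Hypothetical wrapper: ERS guest-user requests nest the account data
    under a 'GuestUser' object; the full schema varies by deployment."""
    return {"GuestUser": {"guestInfo": credentials}}
```

In the real flow, the returned credentials would also be rendered as a QR code and emailed to the visitor.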

Help: If the user has difficulty interacting with the system, they can ask for the help of an operator. For this scenario we created an API integration with the Cisco telephony system, using an MX-200 as a video-calling device. When help is requested, the system automatically places a video call to an operator who can assist and welcome the visitor to the NTT facilities.

How we built it

The user recognition and identification system was developed in Python. We created a routine that captures images from the camera's RTSP stream. Using the OpenCV library, we detect faces and crop them for processing. Each face is encoded and searched against a database of encoded faces: if a match is found, the user is treated as known; if not, as unknown. When a user is recognized, two API calls are sent: one to the Alexa system notifying it that a user has been identified, and another to code built on AWS using API Gateway, Lambda, and DynamoDB. This second call stores the identified user's data, which the Alexa system then requests to begin the voice interaction.
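The capture, detect, encode, look up, notify routine described above can be summarized as a small pipeline. The sketch below uses injected stage functions (hypothetical names) so the overall flow can be shown, and tested, without a camera or the OpenCV/AWS dependencies; in the real system, `notify` fans out to the Alexa trigger and the API Gateway/Lambda/DynamoDB store.

```python
def identification_pipeline(frames, detect_faces, encode, lookup, notify):
    """One pass over a stream of camera frames.

    detect_faces(frame) -> list of cropped face images
    encode(face)        -> face encoding vector
    lookup(encoding)    -> user name, or 'unknown'
    notify(user)        -> push the event to downstream consumers
    """
    identified = []
    for frame in frames:
        for face in detect_faces(frame):
            user = lookup(encode(face))
            notify(user)
            identified.append(user)
    return identified
```

Keeping each stage as a plain function also makes it easy to swap the detector or encoder without touching the rest of the flow.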

To initiate voice interaction, we developed code on top of the Bot Framework library, which, through an API call to the AWS user-storage system, checks whether the identified person is someone known or not. We also created a custom skill connector in Alexa Skills to bridge communication between the bot (published on Azure) and our end user.
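On the Alexa side, a custom skill ultimately returns JSON in the Alexa Skills Kit response format: an `outputSpeech` object with `PlainText` text. The sketch below shows such a response being built from the identification lookup; `greet`, its wording, and the known/unknown branching are our illustration, not the project's code.

```python
def build_alexa_response(speech_text, end_session=False):
    """Minimal Alexa Skills Kit response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def greet(last_identified_user):
    """Turn the stored identification result into a spoken welcome."""
    if last_identified_user:
        text = f"Hello, welcome to NTT, {last_identified_user}! How can I help you?"
    else:
        text = "Hello, welcome to NTT! I don't think we have met. How can I help you?"
    return build_alexa_response(text)
```

In the deployed flow, `last_identified_user` would come from the DynamoDB record written by the recognition system.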

Challenges we ran into

For user identification, the main challenge was dealing with the large volume of images generated at a rate of 20 fps (frames per second) while performing all the analysis, face detection, and database searches. We also had to understand capture angles, camera-to-user distance, framing, and focus in order to get a "processable" image, and run all the necessary functions within a "real-time" window. Delivering the data through APIs so the voice interaction could start in a way that felt natural and imperceptible to the user, and defining how to make that data available, was also a challenge.
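One common way to keep a 20 fps stream "processable" in real time is to analyze only every Nth frame instead of all of them. The minimal sketch below shows that downsampling arithmetic; it is our illustration of the general technique, not necessarily the exact approach the project used.

```python
def frames_to_process(total_frames, capture_fps=20, target_fps=4):
    """Indices of the frames actually analyzed when downsampling a stream.

    With a 20 fps camera and a 4 fps analysis budget, every 5th frame is
    processed and the rest are dropped.
    """
    step = max(1, capture_fps // target_fps)
    return list(range(0, total_frames, step))
```

The trade-off is latency: a lower analysis rate lightens the CPU load but delays the moment a newly arrived face is first seen by the detector.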

Accomplishments that we're proud of

We were able to build a system that sees, speaks to, and listens to the user, understands basic requests for useful day-to-day functionality, and executes them for the end user. We were extremely proud and thrilled to experience the full programmed flow for the first time. The consistency of the user identification and voice requests hit the main goal we set at the start of the project: "to create a touchless system, without friction for the user."

What we learned

We learned how to use and create APIs, process incoming data, send data, and share data between different services and programming platforms. We got to know different libraries and code packages available on the market for identifying and handling images. We learned how to optimize image processing for real-time analysis and identification.

What's next for Jarvis2

The first main goal for the next steps is to create an operational, stable, and functional version to go into production as a demonstration in an NTT physical office. We will also create dashboards with detailed information on interactions, identifications, and usage reports for a physical office, to be made available to C-levels. Using the APIs available on the Cisco video-conferencing devices in the company's meeting rooms, we want to generate usage and occupancy-count data for those rooms. We also plan to add more features, such as: printing a label with the wireless visitor-access credentials; monitoring the flow of people within the physical spaces of the offices and generating KPIs on how they are used; and detecting crowding and overload points in those spaces.

These next steps should transform Jarvis2 into a complete assistant and repository of information on the use of physical environments, with the main goal of always providing a unique, easy, friendly, healthy, and sustainable experience for users, helping them make a great impact on the world around them. We also want to generate and make relevant data and information available to C-levels to support decision-making about the future of physical offices.
