Current augmented and virtual reality systems rely on handheld controllers or room-invasive sensors for navigating virtual environments. These controllers can be inaccessible to people with disabilities. To address this problem, we developed a system that uses facial and gesture-driven navigation to make games and programs more accessible.
What it does
Using the user's webcam, we detect features such as their face, eyes, and hands. The user moves around virtual environments by moving their head. Tilting the head to the left pans the character left, and tilting it to the right pans the character right. Moving closer to the camera moves the character forward, while moving away from the camera moves it backward. The user can also hover an open hand over menu items to select them.
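The write-up does not say how hovering becomes a selection. One common approach, assumed here, is a dwell timer: an item is selected once the center of the detected hand box stays over it for a fixed number of consecutive frames. The Python sketch below is not the team's C# code. `Rect`, `HoverSelector`, and `dwell_frames` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in webcam pixel coordinates (x, y = top-left corner)."""
    x: int
    y: int
    w: int
    h: int

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


class HoverSelector:
    """Hypothetical dwell rule: select a menu item after the hand's center has
    stayed over it for `dwell_frames` consecutive frames."""

    def __init__(self, items: dict[str, Rect], dwell_frames: int = 30):
        self.items = items
        self.dwell_frames = dwell_frames
        self._current: Optional[str] = None
        self._count = 0

    def update(self, hand: Optional[Rect]) -> Optional[str]:
        """Feed one frame's hand detection (or None); return an item name on selection."""
        hovered = None
        if hand is not None:
            cx, cy = hand.center()
            for name, box in self.items.items():
                if box.contains(cx, cy):
                    hovered = name
                    break
        if hovered != self._current:
            # Hover target changed or the hand was lost: restart the dwell timer.
            self._current = hovered
            self._count = 0
        if hovered is None:
            return None
        self._count += 1
        # Fire exactly once per continuous hover; later frames return None.
        return hovered if self._count == self.dwell_frames else None
```

With `dwell_frames=3`, holding the hand over a "play" item for four frames yields `None, None, "play", None`. The item fires once and fires again only after the hand leaves and comes back.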
How we built it
We built our system with Unity 2019.4 and Microsoft's Mixed Reality Toolkit. It targets the Universal Windows Platform and can deploy natively to the Microsoft HoloLens 2. For computer vision, we used the OpenCV plus Unity asset with trained Haar cascades, all scripted in C#. We built our scenes and virtual environments with ProBuilder and CScape.
The Face Detector class converts the webcam input into the format our processing needs and then loads the trained Haar cascades used for recognition. On each frame update, we refresh our processing frame with the webcam texture and search it for the face, eyes, and hands. We use three methods for this, and each one runs the frame through a different cascade and returns an array of detected objects. We then separate these detections and extract their locations in the webcam image. Next, we check the location data for false detections. If the detections look valid, we calculate the height difference between the two eyes and use that value to rotate the player's view. We also draw rectangles around the detected objects for visualization and set this render as the texture of a canvas in our scene. Finally, we calculate the area of the user's head relative to the webcam's field of view and use it to determine whether the user intends to move forward or backward.
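The head-pose mapping described above can be sketched as follows. This Python illustration is not the team's C# implementation, and it assumes detections arrive as pixel bounding boxes. Several details are hypothetical choices and do not come from the project: the false-positive check (exactly one face with both eye centers inside it), the tilt dead band, and the forward/backward area thresholds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Detection bounding box in webcam pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2

    @property
    def area(self) -> int:
        return self.w * self.h


@dataclass(frozen=True)
class Motion:
    turn: float  # rotation command; its left/right sign depends on camera mirroring
    move: int    # +1 forward, -1 backward, 0 hold


def plausible(faces: Sequence[Box], eyes: Sequence[Box]) -> bool:
    """Hypothetical false-positive check: one face, two eyes centered inside it."""
    if len(faces) != 1 or len(eyes) != 2:
        return False
    f = faces[0]
    return all(f.x <= e.cx <= f.x + f.w and f.y <= e.cy <= f.y + f.h for e in eyes)


def head_to_motion(faces: Sequence[Box], eyes: Sequence[Box],
                   frame_w: int, frame_h: int,
                   turn_gain: float = 0.5, tilt_deadband: float = 4.0,
                   near_frac: float = 0.12, far_frac: float = 0.05) -> Optional[Motion]:
    """Map one frame's detections to a motion command, or None if rejected."""
    if not plausible(faces, eyes):
        return None
    left, right = sorted(eyes, key=lambda e: e.cx)  # image-left eye, image-right eye
    dy = right.cy - left.cy                         # eye height difference in pixels
    turn = 0.0 if abs(dy) < tilt_deadband else turn_gain * dy
    frac = faces[0].area / (frame_w * frame_h)      # head size relative to the frame
    if frac > near_frac:
        move = 1      # head is close to the camera: move forward
    elif frac < far_frac:
        move = -1     # head is far from the camera: move backward
    else:
        move = 0
    return Motion(turn, move)
```

For example, on a 640×480 frame a 200×200 face box covers about 13% of the image. That exceeds the hypothetical `near_frac=0.12`, so `move` is `1` (forward). Whether a positive `turn` means left or right depends on whether the webcam image is mirrored.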
Challenges we ran into
Because we ran facial and gesture recognition together, the trained Haar cascade often mistook the user's head for their hand. We addressed this by adding a dead zone around the center of the face, inside which hand detections are ignored. Another complication was large dead zones in eye detection, which made tracking low-resolution. However, this only occurred on wide-field-of-view webcams. Standard webcams showed no noticeable problems.
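A minimal sketch of the face dead zone is shown below. It assumes the zone is a circle around each detected face's center, with a radius that scales with the face width. The Python code, the `filter_hands` name, and the `radius_scale=0.75` default are hypothetical and do not come from the project. Any hand detection whose center falls inside the circle is discarded, which removes the case where the hand cascade fires on the head itself.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Detection bounding box in webcam pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def in_dead_zone(hand: Box, face: Box, radius_scale: float) -> bool:
    """True if the hand's center lies within radius_scale * face width of the face center."""
    hx, hy = hand.center()
    fx, fy = face.center()
    return math.hypot(hx - fx, hy - fy) < radius_scale * face.w


def filter_hands(hands: Sequence[Box], faces: Sequence[Box],
                 radius_scale: float = 0.75) -> list[Box]:
    """Keep only hand detections that fall outside every face's dead zone."""
    return [h for h in hands
            if not any(in_dead_zone(h, f, radius_scale) for f in faces)]
```

Scaling the radius by face width makes the zone grow as the user leans toward the camera, so it keeps covering the head even when forward motion enlarges it in the frame.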
Why we're unique
GDARN explores the feasibility and accessibility of facial navigation in virtual and augmented reality. We believe this approach can make VR and AR accessible to many more people because it is easy to understand and to commercialize. GDARN is also novel, since research in this area has only recently begun. It combines face landmarking with VR/AR navigation.
What's next for (GDARN) Gesture Driven Augmented Reality Navigation
Next, we plan to improve the resolution of face tracking, specifically eye detection and tracking. We also plan to optimize the program and reduce its processor load so that it runs on a wider range of hardware. Beyond optimization, we want to support more gestures, such as opening the hand to stop, closing the fist to select in-game objects, and finger detection. Finally, we would like to make this technology portable across games to bring accessibility to many more virtual and augmented reality programs.
Team: 418 I'm a teapot (#20). Members: Alex Wang, Kamran Hussain