AO - TechCognize

Inspiration

The inspiration behind this hack was that many physically challenged people are still unable to enjoy access to the internet, even though technology has taken us far beyond what people might have imagined a few years ago. This application helps people who need special accessibility features in order to use these services. They make up around 0.15% of the human population, which comes out to roughly 10.5 million people, a sizeable portion of human society.

What it does

"TechCognize" lets a user perform the fundamental functions of a mouse with facial movements and spoken words: left-clicking, right-clicking, scrolling and moving the mouse pointer. The Graphical User Interface is a plain and simple introductory screen that gives the user the option to go through the controls, launch the app or exit it. The app is activated when the user opens their mouth wide while positioned at the correct location for calibration. The app then uses facial movements, or more specifically nose movements, either to scroll or to move the mouse pointer. The user switches between these two modes by closing their eyes for longer than the threshold blink time. A left-click is triggered either by winking the left eye or by saying "left", and similarly a right-click is triggered by winking the right eye or by saying "right".
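The mode switch described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the app's actual code: the class name `ModeSwitcher`, the mode labels and the one-second threshold are all hypothetical, since the write-up does not give the real threshold value.

```python
from typing import Optional

# Assumed value: the write-up mentions a "threshold blink time" but not its length.
BLINK_THRESHOLD_S = 1.0


class ModeSwitcher:
    """Toggles between 'move' and 'scroll' when the eyes stay closed too long to be a blink."""

    def __init__(self) -> None:
        self.mode = "move"
        self._closed_since: Optional[float] = None  # time at which the eyes closed
        self._toggled = False  # prevents repeated toggles during one long closure

    def update(self, eyes_closed: bool, now: float) -> str:
        """Feed one frame's observation (eye state and timestamp in seconds)."""
        if not eyes_closed:
            # Eyes reopened: forget the closure so the next one starts fresh.
            self._closed_since = None
            self._toggled = False
        elif self._closed_since is None:
            self._closed_since = now
        elif not self._toggled and now - self._closed_since > BLINK_THRESHOLD_S:
            self.mode = "scroll" if self.mode == "move" else "move"
            self._toggled = True
        return self.mode
```

An ordinary blink never reaches the threshold, so it leaves the mode unchanged, and a single long closure toggles the mode only once no matter how many frames it spans.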

How we built it

We used Python to build the app. It uses hardware-integrated machine learning to locate various points on the face, called nodes or pointers. From these points we compute relative distances and directions, which in turn drive the movement of the mouse pointer and the scrolling. The same nodes and pointers are also crucial for detecting winks, closed eyes and an open mouth, all of which are fundamental to the app's functionality. Frame-by-frame image recognition was an important part of the process too. To make the app more accessible, we used the Google Speech-to-Text API to enable voice commands for left and right clicks.
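One common way to turn facial points into a "closed eye" or "open mouth" signal is an aspect ratio: the vertical opening divided by the horizontal width. The sketch below shows that idea on six points around one eye. It is an assumption that the app works this way; the function names, the six-point layout and the 0.2 cut-off are illustrative and do not come from the project.

```python
from math import dist

# Assumed cut-off for illustration only; a real value would be tuned per camera and user.
EYE_CLOSED_BELOW = 0.2


def aspect_ratio(p1, p2, p3, p4, p5, p6):
    """Ratio of vertical opening to horizontal width for six (x, y) points.

    p1 and p4 are the left and right corners, p2 and p3 lie on the upper
    edge, and p6 and p5 lie directly below them on the lower edge.
    """
    horizontal = dist(p1, p4)
    if horizontal == 0:
        return 0.0  # degenerate detection: treat as fully closed
    vertical = dist(p2, p6) + dist(p3, p5)
    return vertical / (2.0 * horizontal)


def is_eye_closed(points):
    """True when the six eye points are squeezed below the assumed cut-off."""
    return aspect_ratio(*points) < EYE_CLOSED_BELOW
```

Comparing the result for the left and right eye would separate a wink (one eye closed) from closing both eyes, and the same ratio applied to points around the mouth, with a different cut-off, could detect a wide-open mouth.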

Challenges we ran into

We faced issues with CMake while installing the dlib libraries on a Windows laptop, so we worked with only one laptop for most of the duration. The Tkinter library was not working on macOS due to a new bug, so we had to rely entirely on Linux for the rest of the hackathon, as it was the only platform that overcame these two crucial hurdles. We had to spend a lot of time determining the position of the nose pointer with respect to the calibrated centre, and optimising the Google Speech API and mouse drag, both of which gave a lot of errors initially. Setting up the GUI using Tkinter was also a major hurdle, as mentioned above.

Accomplishments that we're proud of

We are proud that, despite all the hurdles, errors in the code and hours of mind-bending problems, we were finally able to implement not just all of the features we initially thought of, but even more. Scrolling, speech recognition and the GUI were all additions that we decided on while coding the hack. The mouse pointer responds to pitch and yaw at the same time, so it can move in any direction through a full 360 degrees. The app also offers control through a background speech recogniser that is always running, which gives extremely quick and accurate results relative to other such applications. It gives audio acknowledgements, which may be beneficial to some users. It is also able to recalibrate when the face disappears from the frame, giving users easy access and control.
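Combining pitch and yaw amounts to treating the nose's offset from the calibrated centre as a 2D vector and moving the pointer along it. The following is a minimal sketch of that idea, assuming a dead zone around the centre and a fixed step size; the names `pointer_step`, `DEAD_ZONE_PX` and `SPEED_PX` and their values are hypothetical and are not taken from the app.

```python
import math

# Both constants are assumptions chosen for illustration.
DEAD_ZONE_PX = 10.0  # offsets this small are treated as "head at rest"
SPEED_PX = 15.0      # pointer displacement per frame once outside the dead zone


def pointer_step(nose, centre):
    """Return the (dx, dy) pointer displacement for one frame.

    nose and centre are (x, y) pixel positions of the nose and of the
    calibrated centre. The step keeps the direction of the offset, so the
    pointer can travel at any angle rather than only along the axes.
    """
    dx = nose[0] - centre[0]
    dy = nose[1] - centre[1]
    distance = math.hypot(dx, dy)
    if distance <= DEAD_ZONE_PX:
        return (0, 0)
    scale = SPEED_PX / distance  # normalise the offset to a fixed step length
    return (round(dx * scale), round(dy * scale))
```

The returned pair could then be handed to whichever mouse-automation library is in use as a relative move. Recalibration after the face leaves the frame would, under the same assumptions, simply replace `centre` with the nose position from the first frame in which the face is detected again.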

What we learnt

We learnt that group assignments can also be extremely productive, an uncommon notion in modern society. We also learnt that being in a group helps in solving many problems that one might encounter while working alone, as the different sets of strengths multiply to give a much better end result. Apart from these lessons, we also learnt a lot about Python, its libraries and its functionalities, tools such as Tkinter and OpenCV, bits of machine learning, and coding real-world projects, some of which we didn't have the vaguest idea about prior to making this hack. Best of all, we now know ourselves better: we have a clearer idea of where our strengths lie and of how we should collaborate with others, another part of our skill set that has improved.

What's next for AO - TechCognize

As mentioned earlier, this hackathon has been a very important part of our lives, as it has taught us various hard and soft skills that will be useful to us in the long run. In the short run, though, there are still some problems and bugs that stand between TechCognize and the success it could have as an accessibility app, along with the benefit to society that would come with it. One example is the info tab, which does not work reliably because of platform issues and the lack of cross-platform compatibility. The first step in getting this out of a developer's laptop and into the gadgets of the masses is to solve these issues and bring the app to a proper working condition, at which point it could, if we are lucky enough, become a boon to society. The least we can guarantee is that the application will keep improving in the days to come.