Inspiration

Recently, I was on a very long flight for my Winter Break vacation. The plane ride was 14 straight hours. During it, I didn't have enough room on my tiny airplane table to use my mouse. And then I thought, "What if I just take the mouse out of the question?" My trackpad was broken at the time too, so all I could do was dream.

A few days ago, I was cleaning up my room and found a book I read a couple of years ago, "Insignificant Events in the Life of a Cactus". The book is about a girl with no arms and the struggles she has to go through in life. One of the things she does is write a blog, typing on the keyboard with her feet. Together with my idea from the plane of "What if I just take the mouse out of the question?", the idea for IControl was born.

What it does

IControl accesses your computer's camera to scan your face and moves your mouse according to how you move your head. Most importantly, it scans your eyes, which are used to track whether you blink; blinking makes you click.

How I built it

IControl uses the OpenCV and DLib libraries for image recognition. The image recognition detects your face and finds the center of each of your eyes, which IControl uses to figure out how your head moves from frame to frame. Then, after a little bit of calibration magic and a little bit more geometric math, IControl moves your mouse using the PyAutoGUI library.

The detection works through landmarks, which are important points on your face located by DLib's shape predictor (OpenCV handles reading the camera frames). Six of these points surround each eye. Apart from using these points to find the center of each eye, the distance between the top and bottom of your eye is also calculated. If the lids get close enough together, it counts as a blink and you click. A timer is in place for this blink clicking too, so you won't accidentally click through your natural blinking.
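Here is a minimal sketch of that blink-click logic, assuming DLib's standard 68-point model (shape_predictor_68_face_landmarks.dat), where landmarks 36-41 surround the left eye. The threshold and timer values below are illustrative, not IControl's tuned numbers:

```python
# Sketch: eye center for head tracking, lid gap + hold timer for blink clicks.
# BLINK_THRESHOLD and HOLD_TIME are illustrative, not IControl's actual values.
import time

import cv2
import dlib
import numpy as np
import pyautogui

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = range(36, 42)  # the 6 landmarks around the left eye

BLINK_THRESHOLD = 4.0  # lid gap in pixels that counts as "closed" (illustrative)
HOLD_TIME = 0.3        # seconds the eye must stay closed, so natural blinks don't click

cap = cv2.VideoCapture(0)
closed_since = None
clicked = False

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        eye = np.array([(shape.part(i).x, shape.part(i).y) for i in LEFT_EYE])

        center = eye.mean(axis=0)  # feeds the head-tracking / mouse-movement math

        # Lid gap: average distance between the two upper and two lower landmarks.
        gap = (np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])) / 2

        now = time.time()
        if gap < BLINK_THRESHOLD:
            if closed_since is None:
                closed_since = now
            elif not clicked and now - closed_since >= HOLD_TIME:
                pyautogui.click()
                clicked = True  # one click per blink; reopen the eye to click again
        else:
            closed_since = None
            clicked = False
```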

NumPy is used for all of the math. Without NumPy, none of this would be possible.

Challenges I ran into

Downloading libraries is a pain. DLib specifically required me to install CMake and a C++ compiler onto my laptop, which took an embarrassingly long time compared to the other libraries.

The math used in this was very difficult to figure out as well. Originally, I attempted to use the landmarks to create a 3D model of your face and the screen you were looking at, but without a second camera to perceive depth, that was not possible. Even after abandoning that, the formula needed to calibrate the mouse to the size of the screen and the math needed to find the distance between the top and bottom of your eye both gave me trouble.
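To give an idea of what that calibration formula looks like, here is a sketch that linearly maps the range of eye-center positions recorded during calibration onto screen coordinates. The calibration bounds are hypothetical numbers, not values from IControl:

```python
# Sketch of the screen-calibration idea: linearly interpolate the eye-center
# position (in camera pixels) onto screen coordinates. The cal_x/cal_y bounds
# are hypothetical values captured while looking at the screen edges.
import numpy as np
import pyautogui

screen_w, screen_h = pyautogui.size()

cal_x = (220.0, 420.0)  # hypothetical eye-center x at left/right screen edge
cal_y = (160.0, 320.0)  # hypothetical eye-center y at top/bottom screen edge

def to_screen(eye_center):
    x = np.interp(eye_center[0], cal_x, (0, screen_w))
    y = np.interp(eye_center[1], cal_y, (0, screen_h))
    return x, y

# Example: an eye center at (300, 240) lands near the middle of the screen.
pyautogui.moveTo(*to_screen((300.0, 240.0)))
```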

Image recognition is not a very stable process at all. As you will probably see in the demo, the mouse is jittery, but that can't get much better without sacrificing efficiency and response time. I feel like in its current state it is good enough, though.
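For what it's worth, the usual way to trade response time for stability is something like an exponential moving average. This is just a sketch of that tradeoff, not what IControl currently does:

```python
# Sketch of the jitter-vs-lag tradeoff (not IControl's current behavior):
# an exponential moving average. Lower ALPHA = smoother cursor, more lag.
ALPHA = 0.3  # illustrative; 1.0 would be the raw, jittery signal

def make_smoother(alpha=ALPHA):
    state = {"prev": None}

    def smooth(x, y):
        if state["prev"] is None:
            state["prev"] = (x, y)
        else:
            px, py = state["prev"]
            state["prev"] = (alpha * x + (1 - alpha) * px,
                             alpha * y + (1 - alpha) * py)
        return state["prev"]

    return smooth
```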

Python is not that quick of a language either. Compared to the other languages I am used to, Python is considerably slow, and with the image recognition added on top, the frame rate sometimes dropped down to maybe 5 fps. But I am proud to say I tripled that number.
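For reference, this is roughly how a frame-rate number like that can be measured by timing the main loop. It is a measurement sketch only, using a hypothetical FPSMeter helper:

```python
import time

class FPSMeter:
    """Hypothetical helper: average FPS over a sliding window of frames."""

    def __init__(self, window=30):
        self.durations = []
        self.window = window
        self.last = time.time()

    def tick(self):
        # Call once per loop iteration; returns FPS over the last `window` frames.
        now = time.time()
        self.durations.append(now - self.last)
        self.last = now
        recent = self.durations[-self.window:]
        return len(recent) / sum(recent)

# Inside the capture loop: fps = meter.tick()
```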

Accomplishments that I'm proud of

This is an application that, once perfected outside of the 10 hours I had today, could go on to do great things. I took a huge stride with the work I did today compared to my other hackathons. Especially being solo, the fact that this is my most complex hackathon submission by far, even compared to the teams I've been on, makes me very proud of the work I did.

What I learned

Before this, I had never really worked with image recognition or anything similar. The learning curve for OpenCV and DLib is steep, but once you get the hang of them, you can fly.

I've also gone deeper into the backend than I ever have before; this is probably the most backend-heavy work I've ever done in a competition.

I also feel like I doubled my knowledge of doing math through programming today. I had never used NumPy before today, and it feels like I've been missing out.

What's next for IControl

Training an AI model specifically for detecting faces (and hopefully pupils next time) is the biggest thing on my bucket list, to make the movement smoother. With pupil detection, the need to move your head at all would be eliminated.

If functionality for two cameras is implemented, the usefulness of IControl would skyrocket. With depth perception, multiple screens could be used. That would be insanely useful in many situations and would open up IControl's already incredible accessibility to a whole new field.

Built With

dlib, numpy, opencv, pyautogui, python
