I have a desire to make technology that facilitates creative expression for people who may not have the use of their hands. There are plenty of painting and drawing applications that require a mouse, keyboard, or touch. I wanted to create tools for creativity and art that are more accessible.
On Friday, I remembered that last year Ed Slattery had mentioned it would be great if there were a hands-free painting/drawing application. That inspired me to combine my interests in making things and hacking on new modes of user expression, and I decided to prototype this application.
What it does
Facepaint is a web interface that lets the user draw and paint on an HTML canvas. After camera and microphone permissions are granted, the user controls the application with speech and head motion. It has been tested in Chrome.
Speech recognition starts running as soon as the page loads. After the user says "start," the application powers up and begins detecting head motion via the webcam.
Once head motion is detected, the user can draw on the canvas by moving their head left/right and up/down. The brush color can be changed by voice: once the app starts, saying "green" makes the brush trail green, and saying "blue" afterwards switches the color to blue.
Beyond color changing, other features are voice controlled: saying "clear" resets the drawing canvas, and a save command downloads the current canvas.
Users can create and save unique drawings, as each save generates a unique filename that downloads automatically.
How I built it
I used this as the main structure for the application, and layered the additional functionality on top of it.
The head tracking is done via the headtrackr.js library by Audun Mathias Øygard. Once the user's head is detected via the webcam, a function captures the incoming event data from the camera. The constant feed of x, y, and z coordinates is stored and mapped in a function called trackingHead. This function runs at the same rate as draw(), so the continual stream of head-movement data is constantly being mapped to the canvas at the same rate the canvas refreshes.
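The capture step described above can be sketched roughly as follows. This is a minimal, hedged illustration, assuming headtrackr's `headtrackingEvent` (which carries x, y, z head coordinates); the handler name `onHeadEvent` and the `head` state object are my own illustrative names, not necessarily what the project uses.

```javascript
// Shared state holding the most recent head position; trackingHead
// (called once per frame from draw) would read from this object.
let head = { x: 0, y: 0, z: 0, fresh: false };

function onHeadEvent(event) {
  // Overwrite with the latest reading; stale coordinates are discarded.
  head.x = event.x;
  head.y = event.y;
  head.z = event.z;
  head.fresh = true;
}

// In the browser this would be wired up as:
// document.addEventListener('headtrackingEvent', onHeadEvent);
```

Because the event handler only stores the latest reading, the frame-rate of `draw()` naturally throttles how often the data is actually consumed.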
The head tracker is started by a spoken "start" command, so that other page elements can load before the head-tracking data starts to stream in.
This event data is then transformed into position data. A sprite is drawn (handily, with an imported function from the p5play library by Paolo Pedercini), and this sprite is repeatedly redrawn on screen as the user moves their head. A significant amount of mapping went into tuning the sensitivity of the head-motion data relative to the actual cursor movement on screen.
The last piece was speech recognition. The core functionality is provided by the p5speech library by R. Luke DuBois, which enables continuous recognition from the user's microphone. The input arrives as one long string, which is parsed so that only the last word is evaluated. To eliminate fluctuations in capitalization, I convert everything to lowercase.
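The last-word extraction can be sketched as a small helper. This is an illustrative function (the name `lastWord` is my own), assuming the recognizer hands back a growing, whitespace-separated transcript string:

```javascript
// Take the accumulated transcript from continuous recognition and
// return only the most recent word, lowercased to normalize the
// inconsistent capitalization the recognizer sometimes produces.
function lastWord(transcript) {
  const words = transcript.trim().split(/\s+/);
  return words[words.length - 1].toLowerCase();
}

// e.g. lastWord("draw something Green") → "green"
```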
Another function, parseResult, contains all the voice commands as a series of if-then statements. If the user's most recent word is a supported color name, the sprite's color is set to the corresponding RGB value. For example, if the user says "green," the color is set to (0, 255, 0) and the brush cursor updates immediately.
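A parseResult-style command lookup might look like the sketch below. The green and blue values mirror the examples in the text; the red entry and the function name `colorForCommand` are illustrative assumptions:

```javascript
// Map a recognized word to an RGB triple, or null if the word is not
// a supported color command. parseResult would apply this result to
// the sprite so the brush cursor updates immediately.
function colorForCommand(word) {
  if (word === 'green') return [0, 255, 0];
  if (word === 'blue')  return [0, 0, 255];
  if (word === 'red')   return [255, 0, 0]; // illustrative extra color
  return null; // not a color command
}
```

Returning `null` for unrecognized words lets the caller fall through to the non-color commands ("clear", "save") without special-casing.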
I then added additional commands to reset and save the canvas. The save feature captures the current HTML5 canvas and downloads it as an image file. I wrote a short routine to automatically generate unique filenames so the user can save a series of unique canvas captures.
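One way to generate those unique filenames is sketched below; the exact scheme in the project may differ. Combining a timestamp with a counter guarantees uniqueness even for rapid back-to-back saves:

```javascript
// Counter persists across saves within a session.
let saveCount = 0;

// Produce a filename that is unique per save, e.g. for use with
// p5's saveCanvas() or an <a download> link.
function nextFilename() {
  saveCount += 1;
  return 'facepaint-' + Date.now() + '-' + saveCount + '.png';
}
```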
Challenges I ran into
I ran into some issues during development. The main one was reducing jitter and noise from the head tracker. This was partly addressed by the movement mapping: I estimated the average extremes of left/right and up/down head motion, and mapped those ranges to the dimensions of the canvas. On top of that basic mapping, I wrote a short formula to tune the velocity of the cursor.
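The range-to-canvas step is a linear map, equivalent to p5's built-in `map()`. The sketch below re-implements it in plain JavaScript; the input range of -10..10 is an illustrative guess at the head-motion extremes, not the project's actual values:

```javascript
// Linearly map a value from one range onto another, mirroring the
// behavior of p5's map(value, inMin, inMax, outMin, outMax).
function mapRange(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// e.g. a head x-reading of 0, within estimated extremes of -10..10,
// lands at the center of a 600px-wide canvas:
// mapRange(0, -10, 10, 0, 600) === 300
```

Widening the input range relative to the canvas is one simple way to damp jitter, since each unit of head movement then moves the cursor fewer pixels.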
Since the cursor is generated as a sprite thanks to the p5play library, I was able to tweak its x and y velocity vectors.
Another challenge is that the voice recognition occasionally picks up a lot of noise, which can cause it to get stuck. Part of the solution was converting all incoming strings to lowercase, since the library would occasionally render variations in inflection as capital letters.
The other challenge is a frame-rate drop over time due to the strain on Chrome: p5 refreshes at 60 fps while the webcam streams data constantly, which puts a high demand on the browser. I believe this could be addressed by rewriting it as a mobile application.
Accomplishments that I'm proud of
I'm proud of bringing this to life! My goal was to create a prototype with the basic functionality and proof of concept. I now know that this type of application is possible, and with some tweaks I think that I could address the ongoing issues.
What I learned
I learned a lot about breaking down a larger project vision into its required components. I mapped out the larger project, split it into several key pieces, verified that each one worked, and then integrated them together. This was a huge step for me in terms of project management and development!
What's next for Facepaint v1.0
I want to keep tweaking the settings to refine it further. I also want to experiment with other frameworks, since p5's constant refreshing may ultimately be overkill.
On/off functionality for lifting the brush up and down!
One immediate feature I want to implement is calibrating the sensitivity to the user's own motion. Before loading the app, the user would move their head to the left, right, up, and down, and the mapping values would then be based on their actual range of motion instead of my estimated averages.
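That calibration idea could be sketched as below. Everything here is hypothetical (the `makeCalibrator` name and API are my own): a setup phase records the user's extremes, which then replace the hard-coded range in the mapping:

```javascript
// Record the min/max head positions seen during a calibration sweep,
// then map live readings onto the canvas using that personal range.
function makeCalibrator() {
  let min = Infinity, max = -Infinity;
  return {
    sample(x) {            // call repeatedly while the user sweeps their head
      if (x < min) min = x;
      if (x > max) max = x;
    },
    toCanvas(x, width) {   // map a live reading into canvas space
      if (!(max > min)) return width / 2; // not calibrated yet: center
      return ((x - min) / (max - min)) * width;
    }
  };
}
```

The same calibrator would be instantiated once per axis (one for left/right, one for up/down).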