Wouldn't it be fascinating to create digital paintings by waving one's hand in the air? We've all seen sci-fi movies where someone like Tony Stark flicks a hand and images come to life. This weekend, I tackled this problem by moving a "magic object" around in real time to create digital art.

What it does

MagicPalette tracks the movement of your magic object to create a digital version of your art piece. As you move the object, the resulting painting appears on a white canvas. MagicPalette also lets you interact with the app for some useful functionality. The top left corner of your video stream serves as a color chooser - choose between colors by holding up 1, 2, 3, 4, or 5 fingers! If you're happy with your painting, you can just smile and the app saves your artwork. If you're unhappy and make a sad face, the canvas is erased and you can start afresh!

How I built it

There were 3 major updates to the project.

The first was to hook up the laptop webcam, get the real-time video stream, run object detection on a chosen object, and map its movements onto the canvas (a blank image). This part was extremely tricky for several reasons. State-of-the-art object detectors need some processing time, and because the whole project runs on the CPU, I would face serious lag if I were to process every frame from my camera, which delivers more than 30 fps. To avoid processing every single frame, I threaded the camera class so that the buffer stays drained and I only process the latest available frame.

For the object detection model, I first used YOLO pre-trained on the COCO dataset, which is supposed to give state-of-the-art results for real-time video processing. While accurate, it lagged enough to make the whole experience feel jittery. However, since it was trained on COCO, it did detect objects that could serve as reasonable magic objects for drawing, such as a toothbrush. To get better real-time results, I then switched to a pre-trained MobileNet SSD, a lighter network designed to run on mobile devices. It is faster, but at the cost of accuracy. Another issue was that the only pre-trained MobileNet SSD I could find was trained on just ~20 classes, of which only 2 could be used to draw - a bottle or a potted plant.
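The "latest frame only" idea can be sketched as below. This is a minimal illustration, not the project's actual camera class: the name LatestFrameReader is made up for this example, and the source argument is assumed to be any object whose read() returns an (ok, frame) pair, which is the shape of cv2.VideoCapture.read().

```python
import threading


class LatestFrameReader:
    """Read frames in a background thread and keep only the newest one.

    `source` is assumed to expose read() -> (ok, frame), like
    cv2.VideoCapture. The class name is hypothetical.
    """

    def __init__(self, source):
        self._source = source
        self._lock = threading.Lock()
        self._frame = None
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Pull frames as fast as the source delivers them so that
        # nothing piles up in the capture buffer.
        while self._running:
            ok, frame = self._source.read()
            if not ok:
                break
            with self._lock:
                # Overwrite the previous frame: older frames are dropped.
                self._frame = frame

    def latest(self):
        """Return the most recent frame, or None if none has arrived yet."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)
```

With OpenCV installed, one would presumably pass cv2.VideoCapture(0) as the source and call latest() once per detection step, so a slow detector simply skips the frames it could not keep up with.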

Second, I added face detection followed by expression detection to make the interaction more fun. Using OpenCV's Haar cascade frontal face detector, followed by a pre-trained expression detector, I get the probability of every expression on the detected face. I take the max over a rolling window to decide the user's current expression. If the user smiles and is happy with the painting, the canvas is saved as a JPG file. If a sad or angry expression is detected, the canvas is reinitialized to a blank image.
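One way to read the rolling-window step is sketched here. It is an assumption on my part that the per-frame probabilities are summed over the window before taking the max; the class name, the window size, and the label strings "happy", "sad", and "angry" are all illustrative and not taken from the project.

```python
from collections import deque


class ExpressionSmoother:
    """Smooth per-frame expression probabilities over a rolling window."""

    def __init__(self, window=10):
        # deque with maxlen drops the oldest frame automatically.
        self._history = deque(maxlen=window)

    def update(self, probs):
        """Add one frame's {label: probability} dict and return the
        label with the highest summed probability in the window."""
        self._history.append(probs)
        totals = {}
        for frame_probs in self._history:
            for label, p in frame_probs.items():
                totals[label] = totals.get(label, 0.0) + p
        return max(totals, key=totals.get)


def action_for(expression):
    """Map a smoothed expression to a canvas action (labels are assumed)."""
    if expression == "happy":
        return "save"    # write the canvas to a JPG file
    if expression in ("sad", "angry"):
        return "clear"   # reset the canvas to a blank image
    return None          # any other expression: keep drawing
```

Smoothing over several frames means a single misclassified frame is unlikely to wipe the canvas, at the cost of a short delay before a real smile or frown is acted on.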

Finally, I wanted the user to be able to choose colors. I tweaked a gesture recognition module I found on the internet such that when the user holds up finger(s) in the top left region, the corresponding color is chosen. The hand image is thresholded, and a convex hull is obtained. The user can now hold up a finger to choose a color!
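A rough sketch of how a finger count could be turned into a color follows. The write-up only says that the hand is thresholded and a convex hull is obtained; counting fingers from the sharp gaps (convexity defects) between the hull and the hand contour is a common technique that I am assuming here, not something the project confirms. The palette, the function names, and the 90 degree threshold are hypothetical.

```python
import math

# Hypothetical palette in BGR order (the order OpenCV uses).
# The project's actual colors are not listed in the write-up.
PALETTE = {
    1: (0, 0, 255),    # red
    2: (0, 255, 0),    # green
    3: (255, 0, 0),    # blue
    4: (0, 255, 255),  # yellow
    5: (255, 0, 255),  # magenta
}


def angle_at(far, start, end):
    """Angle in radians at `far` in the triangle (start, far, end)."""
    a = math.dist(start, end)
    b = math.dist(far, start)
    c = math.dist(far, end)
    if b == 0 or c == 0:
        return math.pi
    cos_angle = (b * b + c * c - a * a) / (2 * b * c)
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def count_fingers(defects, max_angle_deg=90.0):
    """Estimate raised fingers from (start, end, far) defect points.

    A sharp angle at the far point is treated as the gap between two
    fingers, so fingers = gaps + 1. Zero gaps is reported as one finger,
    which means a closed fist cannot be told apart by this rule alone.
    """
    gaps = sum(
        1
        for start, end, far in defects
        if math.degrees(angle_at(far, start, end)) < max_angle_deg
    )
    return min(gaps + 1, 5)


def color_for(fingers, current):
    """Return the palette color for a finger count, else keep `current`."""
    return PALETTE.get(fingers, current)
```

With OpenCV, the defect points would presumably come from cv2.convexHull(contour, returnPoints=False) and cv2.convexityDefects on the thresholded hand contour in the top left region.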

Challenges I ran into

Real-time processing was too slow with YOLO, MobileNet SSD had a dearth of suitable classes, expression detection is hard to improve, gesture recognition is noisy, and the real-time constraint limits speech processing.

Accomplishments that I'm proud of

I'm really happy with the results for a relatively short hacking period. Although I spent a lot of time trying to get the object detectors working on real-time video, it was really fun learning about new models and incorporating new features into the product. It's been a good weekend!

What I learned


What's next for MagicPalette

We could look into fine-tuning pre-existing object detection models for very specific objects such as a wand or a paintbrush. This would hopefully make the detections more accurate. Although this is just my first attempt at it, a refined product could have multiple useful applications in sculpting, design, CADding, film, and television. It would democratize art, and alter the way humans interact with the digital world.

Built With

  • expression-detection
  • keras
  • object-detection
  • opencv
  • python
  • sklearn