Inspiration

The inspiration for Pictionary Plunge came directly from Google's Quick, Draw! challenge, released in 2016. It was a striking demonstration of how machine learning can interpret human sketches in real time. We were inspired to take this idea and push it further, aiming to improve accuracy and extend the application to more diverse datasets.

What it does

Pictionary Plunge is a machine-learning-based challenge where users draw doodles and, in real time, our model classifies and predicts what each drawing represents. As the user sketches, the system captures each stroke, processes the data, and offers instantaneous feedback on the guessed object or concept. It's an interactive way for users to challenge the system's recognition capabilities and, at the same time, see the inner workings of machine learning models in the domain of image recognition.

How we built it

Our approach was methodical and divided into well-structured steps:

  1. Data Processing: We began by obtaining the dataset from the Google Cloud platform and processed the raw stroke maps into binary image maps.
  2. Model Building: A convolutional neural network (CNN) was employed to handle the image data. We interspersed convolutional layers with 2D max and average pooling. After this, we flattened the data to feed it into LSTM and hidden layers. These layers used batch normalization and activation functions, and culminated in a softmax classification layer.
  3. Model Evaluation: We benchmarked our trained model against established architectures such as MobileNet, ResNet, and VGG.
  4. Optimization: We tuned hyperparameters, added dropout, and applied regularization techniques to improve the model's performance.
  5. Transfer Learning: We saved the weights of our trained model and adjusted the learning rate for future applications. This set the stage for the model to learn from new datasets while retaining its foundational knowledge.
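The stroke-to-bitmap conversion in step 1 can be sketched as follows. This is a minimal pure-Python version; the 28×28 output resolution and the [0, 255] stroke coordinate range are assumptions based on the Quick, Draw! raw data format, not necessarily the exact values we used.

```python
def rasterize(strokes, size=28):
    """Rasterize Quick, Draw!-style strokes into a binary bitmap.

    `strokes` is a list of (xs, ys) pairs, with coordinates assumed
    to lie in [0, 255] as in the Quick, Draw! raw format.
    """
    grid = [[0] * size for _ in range(size)]

    def scale(v):
        # Map a [0, 255] coordinate onto the [0, size-1] grid.
        return min(size - 1, v * size // 256)

    def draw_line(x0, y0, x1, y1):
        # Bresenham's line algorithm: connect consecutive stroke points
        # so fast pen movements do not leave gaps in the bitmap.
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            grid[y0][x0] = 1
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x0 += sx
            if e2 <= dx:
                err += dx
                y0 += sy

    for xs, ys in strokes:
        pts = [(scale(x), scale(y)) for x, y in zip(xs, ys)]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            draw_line(x0, y0, x1, y1)
        if len(pts) == 1:  # a single tap is still a pixel
            x, y = pts[0]
            grid[y][x] = 1
    return grid
```

Connecting consecutive points with line segments (rather than plotting points alone) is what keeps fast strokes from losing essential information during downsampling.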
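The architecture in step 2 can be sketched in Keras as below. Filter counts, layer sizes, the 28×28 input resolution, and the class count are illustrative assumptions, not the exact configuration we shipped; the structure (conv blocks with 2D pooling, a reshape feeding an LSTM, batch normalization, dropout, and a softmax head) mirrors the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_model(num_classes=10):  # num_classes is an assumption
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        # Convolutional blocks interspersed with 2D pooling.
        layers.Conv2D(32, 3, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.AveragePooling2D(2),
        # Flatten the 7x7x64 feature maps into a 7-step sequence
        # so the LSTM can consume them row by row.
        layers.Reshape((7, 7 * 64)),
        layers.LSTM(128),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For the transfer-learning step, the same model can be reused by saving its weights with `model.save_weights(...)`, loading them into a fresh instance, and recompiling with a smaller learning rate before fine-tuning on a new dataset.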

Challenges we ran into

Like all ambitious projects, we encountered our fair share of challenges. The foremost was converting multidimensional stroke maps into image maps without losing essential information. Ensuring that the model didn't overfit, given the complexities of doodles, was another hurdle. Comparing our model's performance with established architectures like MobileNet, ResNet, and VGG was daunting but enlightening.

What we learned

This journey gave us a hands-on understanding of convolutional neural networks, the intricacies of image data, and the nuances of hyperparameter tuning. More than the technical insights, we learned about the value of persistence, iterative development, and the importance of a well-structured approach in machine learning projects.
