Inspiration
Literature is one of the media through which we can encounter and experience the true potential of human ingenuity. Countless books exist, and choosing the one that best fits an individual reader is challenging. Written summaries can outline what a book is about, but they lack the "fun" of picking out a best-fit book. This notion inspired "12 Greatest Books," which uses Computer Vision and Augmented Reality to give people, especially students, an immersive and interactive experience on their literary journey.
What it does
A Snapcode is required for the Lens to work fully. When the user scans the Snapcode attached to the book's front cover and then zooms out so that the whole cover is in view, the Lens classifies the book and displays its details and plot through a Snapcode-tracked AR visual.
How we built it
The development process was divided into data acquisition, fundamental development, and model training. Because a single book can have many cover designs, the training data needed a broad spectrum of designs. Thus, a custom web scraper was developed to search eBay and download images of book covers. In the fundamental development stage, labeling took up the largest portion of the work: the AR experience needs not only each book's title and author but also a summary of its plot. The summaries were generated using GPT-3, a natural-language processing model developed and released by OpenAI. The Computer Vision model was trained using the SqueezeNet architecture, which is known for its small storage footprint. The acquired cover images were augmented with position shifts, rotations, blur, etc., and then fed to the model, which was built with the TensorFlow and Keras libraries. The trained model was imported into Lens Studio and integrated through the SnapML framework to create the Lens.
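The augmentation step can be illustrated with a minimal sketch. This is not the project's actual pipeline, which was built with TensorFlow and Keras. It is a dependency-free illustration of the three transformations named above, assuming a grayscale image stored as a list of rows. The function names (`shift`, `rotate`, `box_blur`, `augment`) and the parameter ranges are hypothetical choices made for this example.

```python
import math
import random


def shift(img, dx, dy, fill=0.0):
    """Translate the image by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel for this output pixel
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def rotate(img, degrees, fill=0.0):
    """Rotate about the image centre using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel of each output pixel.
            rx, ry = x - cx, y - cy
            sx = round(cos_a * rx + sin_a * ry + cx)
            sy = round(-sin_a * rx + cos_a * ry + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def box_blur(img):
    """3x3 mean filter; edge pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def augment(img, rng):
    """Random shift, random small rotation, and a blur half of the time.

    The ranges (+/-2 pixels, +/-15 degrees) are assumptions for this sketch.
    """
    out = shift(img, rng.randint(-2, 2), rng.randint(-2, 2))
    out = rotate(out, rng.uniform(-15, 15))
    if rng.random() < 0.5:
        out = box_blur(out)
    return out


if __name__ == "__main__":
    cover = [[float(x + y) for x in range(8)] for y in range(8)]
    variants = [augment(cover, random.Random(seed)) for seed in range(4)]
    print(len(variants), "augmented variants of one cover")
```

Each call to `augment` yields a slightly different version of the same cover, which is how a limited set of scraped images can be stretched into more varied training data.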
Challenges we ran into
Data acquisition and model training were the most challenging stages. Because the image data were acquired through web scraping, irrelevant images were often included, and different books sometimes shared similar cover designs, which makes classification harder for the model. After the initial scraping run, the dataset therefore had to be edited manually to keep each category distinct and balanced. The model initially used for training was not SqueezeNet but ResNet, in variants with different numbers of layers. Moreover, the Lens was initially planned to cover 199 book classes, but the limits of Lens Studio's resource storage capacity forced a pivot in both the model architecture and the number of books to classify.
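To give a rough sense of why SqueezeNet suits a tight storage budget, the sketch below counts the parameters of one SqueezeNet "fire" module (a 1x1 squeeze convolution followed by parallel 1x1 and 3x3 expand convolutions) and compares it with a plain 3x3 convolution that produces the same number of output channels. The channel sizes follow the first fire module described in the SqueezeNet paper and are an assumption here; the project's actual layer configuration is not stated. The helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square 2D convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters of a fire module: squeeze conv plus two expand convs."""
    return (conv_params(in_ch, squeeze, 1)          # 1x1 squeeze layer
            + conv_params(squeeze, expand1x1, 1)    # 1x1 expand branch
            + conv_params(squeeze, expand3x3, 3))   # 3x3 expand branch


# Assumed sizes: 96 input channels, 16 squeeze filters, 64 + 64 expand filters.
plain = conv_params(96, 128, 3)
fire = fire_params(96, 16, 64, 64)

print(plain, fire, round(plain / fire, 1))  # 110720 11920 9.3
print(fire * 4, "bytes at 32-bit floats")   # 47680 bytes at 32-bit floats
```

Under these assumptions the fire module needs roughly nine times fewer parameters than the plain convolution for the same 128 output channels, which is the kind of saving that matters when the model has to fit inside a Lens.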
Accomplishments that we're proud of
Most of the available time and resources went into training a model that could not be used because its resource demands were too high, and the decision to pivot came only a few days before the project deadline. Even so, the project was completed. The failure of the initial approach and model also provided the foundational lessons needed to get the project done. From getting to know different Computer Vision model architectures to using a cloud GPU, we learned many new concepts along the way.
What we learned
Most of the challenges arose during model training, and most of the available time was spent training a model that turned out to exceed the available resource capacity. Although the pivot was made under time pressure, it went through with fewer failures and mistakes thanks to the experience gained from the earlier attempts. Moreover, the project was a valuable opportunity to learn about diverse model architectures for image classification in Computer Vision.
What's next for 12 Greatest Books
Beyond classifying the 12 Greatest Books, we expect to broaden the scope by featuring more diverse categories of books in upcoming Lenses. Other Computer Vision model architectures can also be explored for future ML applications. Beyond books, the same approach can be applied to movies, games, etc. The range of potential applications is wide.
