Inspiration
Born and raised in a farming community, I've been blessed with an intimate understanding of the vital role food plays in our lives - not just as a source of sustenance, but as the foundation of our health, well-being, and cultural heritage. Having lived the first 18 years of my life close to the earth, consuming fresh, farm-grown produce, I've been acutely aware of the profound nutritional bounty that nature provides us.
After moving to an urban area, however, I was distressed to witness how disconnected many people are from their food. This disconnect, coupled with the widespread problems of malnutrition and food insecurity, especially in developing countries, fueled the inspiration for Image2Nutrients. In honor of my agricultural roots and in response to the global nutrition crisis, we set out to create a tool that empowers individuals to better understand the food they consume, thereby promoting a healthier, more informed relationship with nutrition.
What it does
Image2Nutrients is a pioneering tool that aims to transform our understanding of nutrition. Using the power of artificial intelligence, it scrutinizes images of meals, identifies the ingredients, and provides a comprehensive nutritional breakdown. Acting as a virtual nutritionist, it delivers insights and suggestions for nutritional enhancement, enabling users to make healthier, well-informed dietary choices.
The process is simple. Users upload an image of their food, and Image2Nutrients identifies the ingredients. It then removes irrelevant words and duplicates from that list and uses GPT-4, a powerful language model, to produce a comprehensive nutritional analysis with suggestions for improvement. The result is a list of recognized ingredients and an insightful nutritional analysis that users can leverage for healthier dietary choices.
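The cleanup and prompt-building steps can be sketched as follows. This is a minimal illustration, not our exact implementation: the `STOPWORDS` set and the prompt wording are placeholders.

```python
# Sketch of the post-processing step: filter out irrelevant words,
# deduplicate while preserving order, and build the GPT-4 request.
# STOPWORDS and the prompt text below are illustrative placeholders.

STOPWORDS = {"a", "an", "the", "of", "with", "and", "on", "in"}

def clean_ingredients(raw_tokens):
    """Drop irrelevant words and duplicates, preserving first-seen order."""
    seen = set()
    cleaned = []
    for token in raw_tokens:
        word = token.strip().lower()
        if not word or word in STOPWORDS or word in seen:
            continue
        seen.add(word)
        cleaned.append(word)
    return cleaned

def build_prompt(ingredients):
    """Compose the nutritional-analysis request sent to GPT-4."""
    return (
        "You are a nutritionist. Give a nutritional breakdown and "
        "suggestions for improvement for a meal containing: "
        + ", ".join(ingredients)
    )

print(clean_ingredients(["a", "bowl", "of", "rice", "rice", "beans"]))
# prints ['bowl', 'rice', 'beans']
```

The resulting prompt string is what gets sent to the GPT-4 chat API; the response comes back as free text and is shown to the user as-is.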
How we built it
Building Image2Nutrients was a multifaceted process that entailed the use of cutting-edge technologies and a carefully curated dataset.
To start, we employed a pre-trained VisionEncoderDecoderModel to interpret the food images. This model, fine-tuned on a food-specific dataset, was capable of identifying key ingredients from the images. Feature extraction and tokenization were handled by the Vision Transformer (ViT) image processor and an AutoTokenizer, respectively, both loaded from the same pre-trained checkpoint ("nlpconnect/vit-gpt2-image-captioning").
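Loading that checkpoint and captioning a single image looks roughly like this. The checkpoint name comes from the text above; the `generate` arguments (`max_length`, `num_beams`) are illustrative defaults, not our tuned values.

```python
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

CHECKPOINT = "nlpconnect/vit-gpt2-image-captioning"

def load_pipeline(checkpoint=CHECKPOINT):
    """Fetch the captioning model plus its image processor and tokenizer."""
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    feature_extractor = ViTImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model.eval()
    return model, feature_extractor, tokenizer

def caption(image_path, model, feature_extractor, tokenizer):
    """Generate an ingredient caption for one food image."""
    image = Image.open(image_path).convert("RGB")
    pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
    with torch.no_grad():
        output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

In our app, `caption` runs once per uploaded image, and its output feeds the cleanup and GPT-4 steps described earlier.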
Training the model was a complex process. We utilized Google Colab Pro for its computational capacity. However, the critical success factor was the careful preparation of our dataset. We meticulously collected, cleaned, and preprocessed a dataset of food images and their corresponding ingredients, stored in a CSV file. We ensured the images were in the correct format and discarded any irrelevant entries.
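The dataset-cleaning pass can be sketched with pandas and Pillow. The column names (`image_path`, `ingredients`) are illustrative; our CSV used its own schema.

```python
import os
import pandas as pd
from PIL import Image

def load_clean_dataset(csv_path):
    """Load the image/ingredient CSV and drop entries we can't train on.

    Assumes columns named 'image_path' and 'ingredients' (illustrative names).
    """
    df = pd.read_csv(csv_path)
    df = df.dropna(subset=["image_path", "ingredients"])  # drop incomplete rows
    df = df[df["ingredients"].str.strip() != ""]          # drop empty captions
    df = df[df["image_path"].map(os.path.exists)]         # keep only existing files

    def readable(path):
        """Cheap integrity check: can Pillow verify the image file?"""
        try:
            with Image.open(path) as im:
                im.verify()
            return True
        except Exception:
            return False

    df = df[df["image_path"].map(readable)]               # drop corrupt images
    return df.reset_index(drop=True)
```

Each filter discards a different failure mode we hit in practice: missing labels, empty captions, broken paths, and corrupt image files.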
The training process involved running batches of images and their corresponding captions (ingredient lists) from our dataset through the feature extractor and tokenizer. This transformed the images and text into a format that the model could interpret. The inputs were then fed into the model, and the loss between the model's predictions and the actual labels was calculated. This loss was used to adjust the model's parameters through backpropagation and gradient descent.
We repeated this process for 10 epochs, with parameter updates handled by the Adam optimizer. To keep track of training progress, we printed the loss every 10 batches.
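The training loop above can be sketched in PyTorch. A tiny stand-in module replaces the actual VisionEncoderDecoderModel here so the skeleton is runnable on its own; the batch size, learning rate, and stand-in architecture are illustrative assumptions, not our real settings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the fine-tuned VisionEncoderDecoderModel: any module that
# maps pixel inputs to per-position token logits fits this skeleton.
class TinyCaptioner(nn.Module):
    def __init__(self, vocab_size=50, seq_len=8):
        super().__init__()
        self.backbone = nn.Linear(3 * 16 * 16, seq_len * vocab_size)
        self.seq_len, self.vocab_size = seq_len, vocab_size

    def forward(self, pixel_values):
        logits = self.backbone(pixel_values.flatten(1))
        return logits.view(-1, self.seq_len, self.vocab_size)

def train(model, dataset, epochs=10, lr=1e-4, batch_size=4):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for step, (pixel_values, labels) in enumerate(loader):
            logits = model(pixel_values)                      # forward pass
            loss = criterion(logits.transpose(1, 2), labels)  # loss vs. token labels
            optimizer.zero_grad()
            loss.backward()                                   # backpropagation
            optimizer.step()                                  # gradient descent update
            if step % 10 == 0:                                # progress logging
                print(f"epoch {epoch} step {step} loss {loss.item():.4f}")
    return loss.item()

# Tiny synthetic dataset standing in for (image, ingredient-caption) pairs.
images = torch.randn(16, 3, 16, 16)
captions = torch.randint(0, 50, (16, 8))
final_loss = train(TinyCaptioner(), TensorDataset(images, captions))
```

With the real Hugging Face model, the forward and loss lines collapse into one call, `outputs = model(pixel_values=pixel_values, labels=labels)`, with the loss available as `outputs.loss`; the rest of the loop is the same.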
In addition to these tools, we used Streamlit to create a user-friendly web interface for Image2Nutrients. Streamlit allowed us to design a simple yet effective UI where users can upload images of their food and receive detailed nutritional information.
Furthermore, we employed GitHub for version control and code management. It helped us maintain a systematic record of different versions of our project, track changes, and collaborate effectively.
Through a combination of sophisticated machine learning techniques, meticulous data preparation, and user-centered design, we were able to build Image2Nutrients - a tool that provides valuable nutritional insights based solely on images.
Challenges we ran into
The journey of creating Image2Nutrients was not without its fair share of hurdles and complexities. One of the primary challenges was the fine-tuning and training of our model. Given the sophistication of the VisionEncoderDecoderModel and the intricate nature of our task, achieving a high level of accuracy in ingredient detection was a significant challenge. This was further amplified by the fact that we were working with a diverse array of food images, each with its unique set of ingredients.
The process was time-consuming and resource-intensive, requiring numerous hours of training on Google Colab Pro. It also imposed a significant financial cost, which added another layer of complexity to our task.
Additionally, hosting our application presented its own set of challenges. We chose Streamlit for its simplicity and user-friendly interface, but we quickly found ourselves grappling with its storage limitations. Streamlit only offers 1GB of storage, which was insufficient for our fine-tuned model. We attempted to host our model on various cloud platforms, including Google Drive and Box, but none of these solutions proved feasible due to compatibility and space constraints.
This storage issue demanded a substantial amount of time and effort as we sought a solution that would allow us to host our model without sacrificing its performance or the functionality of our application.
Despite these challenges, we persevered and continued to refine and improve Image2Nutrients. Through trial and error, we learned valuable lessons about model training, cloud hosting, and application development, and these experiences have only strengthened our resolve and sharpened our skills.
Accomplishments that we're proud of
We take immense pride in the successful creation of Image2Nutrients, a tool that has the potential to revolutionize the way people understand and interact with their food. Our application, built on the cutting-edge technology of machine learning and AI, not only identifies ingredients in food images but also provides a comprehensive nutritional breakdown. This feat alone represents a significant stride in the intersection of AI and nutrition.
Moreover, overcoming the challenges related to model training and hosting in a restricted environment such as Streamlit reaffirms our capabilities to navigate complex issues and find innovative solutions. The fact that we were able to develop a working, user-friendly application despite these hurdles is something we're truly proud of.
What we learned
This project has been a profound learning journey. On the technical front, we delved deep into the nuances of training machine learning models, understanding the intricacies of fine-tuning the VisionEncoderDecoderModel, and working with data extraction and tokenization. We also gained valuable insights into handling the challenges of hosting AI models on the cloud, specifically within Streamlit's environment.
On a broader level, we learned about the potential of AI in transforming human health and nutrition. We realized the power of AI to deliver tangible, beneficial impacts, such as aiding individuals in making more informed, health-centric dietary choices.
What's next for Image2Nutrients
Looking ahead, we envision a vibrant future for Image2Nutrients. Our immediate focus is on enhancing the precision of our ingredient identification and nutritional analysis even further. We plan to fine-tune our model with additional data and introduce more nuanced parameters for analysis.
Beyond that, we aim to include personalized dietary recommendations, considering factors like age, gender, lifestyle, and health conditions. We believe that this will transform Image2Nutrients into an essential tool for personalized nutrition, encouraging healthier eating habits, and ultimately, fostering a healthier society globally.
In the long run, we are committed to consistently updating and improving Image2Nutrients. We aspire to make it an even more robust and comprehensive tool, helping individuals worldwide understand their food better and make healthier dietary choices. We believe that Image2Nutrients is just the beginning of a revolution in nutritional understanding, and we are excited to be at the forefront of this change.
Built With
- AutoTokenizer
- custom dataset
- GitHub
- Google Colab Pro
- machine learning
- Matplotlib
- natural language processing
- OpenAI GPT-4
- OpenCV
- pandas
- Pillow
- Python
- PyTorch
- Streamlit
- VisionEncoderDecoderModel (Hugging Face)
- Visual Studio
- ViT (Vision Transformer)