Rose, Lotus, Lily, Sunflower, Daisy, Iris (looking at my fellow ML people :p)... Is that all?
Mother Nature has blessed us with an extremely wide variety of flowers and plants, some 40,000 species, some of which reside in our gardens while others live in forests. However, it is not always possible to identify a species simply by its appearance.
The inspiration behind the project was to provide a platform for nature and plant lovers, as well as those with a curious bent of mind, where they can get information about any plant or flower from its picture, even when they don't know its name!
What it does
The website provides an interface for the user to upload a picture of any plant or flower; with the click of a button, it identifies the species present in the image and provides information about it! The information is scraped from internet sources like Wikipedia.
We have also provided a catalog of 103 flowers, which the user can easily browse to learn about the many different species of flora present on our Earth!
How we built it
The complete development process can be summarized into 4 phases:
Phase 1: The first part involved using the Kaggle flower dataset to train a CNN model (with an EfficientNet B0 baseline) in TensorFlow. We were able to achieve ~96% accuracy on the test dataset. The trained model was then converted to TFLite for easier handling and faster inference.
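Once the TFLite model produces its raw scores, a little post-processing maps them to a species label. Here is a minimal sketch in plain Python; the three-class logits and label names are hypothetical (the real app reads 103 scores from the TFLite interpreter):

```python
import math

def predict_species(logits, labels):
    """Map raw model logits to a (label, confidence) pair via softmax + argmax."""
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical 3-class example; the real model outputs 103 class scores
label, confidence = predict_species([0.1, 3.2, -1.0], ["rose", "sunflower", "iris"])
```

In the deployed app, the same argmax-over-softmax step would simply index into the list of 103 species names.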
Phase 2: Now, we used the MediaWiki API to scrape information about all the plants and flower species from Wikipedia and used an SQLite database to store the info. The complete process was done with Python.
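The scraping pipeline in Phase 2 boils down to two pieces: building a MediaWiki API request for a page's intro extract, and storing the result in SQLite. A hedged sketch of both (the `flowers` table name and the demo summary are made up for illustration; the real app fills summaries from the JSON the API returns):

```python
import sqlite3
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title):
    """Build a MediaWiki API URL requesting the plain-text intro of a page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "1",
        "explaintext": "1",
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flowers (name TEXT PRIMARY KEY, summary TEXT)"
    )

def save_summary(conn, name, summary):
    conn.execute(
        "INSERT OR REPLACE INTO flowers (name, summary) VALUES (?, ?)",
        (name, summary),
    )
    conn.commit()

# In-memory demo; the real app persists to a database file on disk
conn = sqlite3.connect(":memory:")
init_db(conn)
save_summary(conn, "Rose", "A woody perennial of the genus Rosa.")
row = conn.execute(
    "SELECT summary FROM flowers WHERE name = ?", ("Rose",)
).fetchone()
```

Caching extracts in SQLite this way means the website never has to hit Wikipedia at request time.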
Phase 3: With our model and database at hand, it was time to create the complete web application with Streamlit, a great library that lets you build web apps in pure Python! The app logo, workflow, and written information were drafted and finalized.
Phase 4: Although the web app was complete in itself, I decided to deploy it on a website to make it more presentable and easy to access. For this purpose, the app was first deployed on Streamlit Share and then served from a GoDaddy domain with the help of GitHub Pages.
Challenges we ran into
Two of the biggest challenges that I faced while working on the project were:
The most time-consuming and difficult part was scraping information about the various species, because the MediaWiki API expects the name of a species as it appears in its Wikipedia page URL. This posed a problem, as I only had the common names of the flowers. So, I had to write a custom algorithm to search for each common name on Google, find its Wikipedia link, and extract the page title from there.
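The key step of that algorithm, picking the Wikipedia link out of the search results and pulling the page title from its URL, can be sketched with the standard library alone. The search itself is omitted here, and the example result URLs are made up:

```python
import urllib.parse

def wikipedia_title(links):
    """Return the page title of the first English-Wikipedia link in a list
    of candidate result URLs, or None if no such link is found."""
    for url in links:
        parsed = urllib.parse.urlparse(url)
        if parsed.netloc.endswith("en.wikipedia.org") and parsed.path.startswith("/wiki/"):
            # The title is the path segment after /wiki/, URL-decoded
            return urllib.parse.unquote(parsed.path[len("/wiki/"):])
    return None

# Hypothetical search results for the common name "Bird of paradise":
# a movie page ranks first, so naive "take the top result" would fail
results = [
    "https://www.imdb.com/title/tt0022698/",
    "https://en.wikipedia.org/wiki/Strelitzia_reginae",
]
```

Filtering by domain and path like this is what keeps movie and song pages that share a flower's name from slipping through.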
Another challenge was deploying the dynamic app on the custom domain using GitHub Pages, which only allows static websites. I solved this by embedding an `iframe` inside a blank page on the domain.
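The workaround is a single static page on GitHub Pages whose `iframe` fills the viewport with the Streamlit app. A minimal sketch (the app URL here is a placeholder, not the project's actual address):

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Who's The Little Guy?</title>
    <style>
      /* Remove default margins so the iframe fills the whole window */
      html, body { margin: 0; height: 100%; overflow: hidden; }
      iframe { width: 100%; height: 100%; border: none; }
    </style>
  </head>
  <body>
    <!-- Placeholder URL: point this at the deployed Streamlit Share app -->
    <iframe src="https://example.streamlit.app"></iframe>
  </body>
</html>
```

To the visitor, the custom domain appears to serve the app directly, even though GitHub Pages is only hosting this one static file.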
Accomplishments that we're proud of
- I was able to create the complete project just like I originally had it in mind, right from training the model to deploying it on my website.
- Created quite a robust algorithm for scraping Google search results to find the Wikipedia link for any flower (some flowers share names with movies, songs, etc.)
- I got and deployed my very first website!
What we learned
- How to use web scraping and the MediaWiki API efficiently
- How to create a complete interactive web app with Streamlit
- How to host and deploy a website on a custom domain with GitHub Pages
What's next for Who's The Little Guy?
As of now, the website identifies 103 species of plants and flowers. This number can be increased substantially by training the model on a bigger dataset.