Inspiration

I'm the founder of a sustainability tech startup that relies heavily on materials science knowledge; I myself am a software engineer turned materials scientist. Our startup is focused on creating raw-material solutions from end-of-life waste plastic. We began by producing a carbon nanomaterial, graphene, from high-density plastics, but soon realized that selling graphene itself might not be a financially sustainable long-term path for the startup, even though it still helped us create tangible environmental impact by reducing the emissions associated with accumulated waste.

Therefore we decided to make an end product out of this graphene, specifically to substitute PFAS-based products. PFAS are "forever chemicals" that never degrade in the natural environment, and the majority of items we use, from raincoats to kitchen utensils to vehicles, contain PFAS.

We put our scientist hats on and started tinkering with molecular combinations that match the performance of PFAS but are also biodegradable. This is when we realized that the theoretically possible combinations are limitless, and that it would be extremely tedious for our small startup team to zero in on candidates that can actually perform. Furthermore, synthesizing and testing each of these molecular combinations costs a lot of money.

That is when I, as the founder, started tinkering with AI to accelerate this process. But this too soon hit a roadblock. In materials science, readily available data is scarce, and practitioners often have to mine hundreds of good-quality peer-reviewed papers! Without the required amount of data, the model would just output gibberish molecular predictions, defeating the original purpose of reducing the time and money it takes to find the right candidate.

This is how MatInformatics was born. I used Gemini's APIs (including the Deep Research agent) to automate the process of gathering datasets from research papers and to run simulations via Python, making the model's output robust and somewhat deterministic for a crucial field like materials science.

What it does

MatInformatics accepts a user query, which could be an analysis statement, a hypothesis about a new molecular combination, or anything else a materials scientist wants to check during first-pass hypothesis validation. For example, a user can type something like "Analyze the biodegradability quotient of an rGO-stearic acid ester combination for hydrophobic coatings".

MatInformatics then spawns several agents, breaking the high-level task into digestible smaller chunks for each of them. Some agent examples: a Research Gatherer, a Research Summarizer, a Data Maker, a DevOps agent that sets up the simulation environment, and a Research Document Maker.

These agents work in parallel to complete the workflow and provide users with a validation in under five minutes.
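The fan-out described above can be sketched roughly as follows. This is an illustrative sketch only: the agent names mirror the examples given, but `run_agent` is a hypothetical stand-in for a real Gemini API call, not the actual implementation.

```python
# Hypothetical sketch of MatInformatics-style parallel agents.
# `run_agent` is a placeholder; a real agent would send `task` as a
# prompt to the Gemini API and return the model's response.
import asyncio

async def run_agent(name: str, task: str) -> str:
    await asyncio.sleep(0)  # simulate async I/O (API round-trip)
    return f"{name}: completed '{task}'"

async def validate(query: str) -> list[str]:
    # Break the high-level query into per-agent subtasks,
    # then run every agent concurrently.
    subtasks = {
        "Research Gatherer": f"find papers relevant to: {query}",
        "Research Summarizer": f"summarize findings for: {query}",
        "Data Maker": f"extract datasets for: {query}",
        "DevOps": "set up the in-browser Python simulation environment",
        "Research Document Maker": f"draft the validation report for: {query}",
    }
    return await asyncio.gather(
        *(run_agent(name, task) for name, task in subtasks.items())
    )

results = asyncio.run(validate("biodegradability of rGO-stearic acid ester"))
```

Running the agents with `asyncio.gather` rather than sequentially is what keeps the end-to-end validation within the few-minute budget.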

Users can increase the depth of validation by toggling on the Deep Research agent, or add diagrammatic visuals to their final report by toggling on the Nano Banana agent.

How we built it

We used Google AI Studio to vibe code everything. We provided the domain-specific guidance to the model and let it handle the rest of the programming.

Challenges we ran into

1) arXiv was the only open site that provided an API to fetch papers. Better sources exist, but they don't offer APIs as of today, so we couldn't make the system robust enough to fetch the most peer-reviewed or higher Scopus-ranked papers.
2) arXiv paper fetching ran into CORS issues, which we eventually fixed after 3-4 iterations.
3) The in-browser Python environment limits which packages can be installed, so we couldn't build all the features we wanted into the simulation.
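For reference, the arXiv fetch step uses arXiv's public Atom-based export API. The sketch below builds a query URL and parses entry titles from an inline sample response so it runs offline; the actual system makes the HTTP request in the browser, which is where the CORS issues above appeared. The search term and sample feed are illustrative.

```python
# Sketch of the arXiv fetch step: build a query URL for arXiv's public
# export API, then parse paper titles out of the Atom response.
import urllib.parse
import xml.etree.ElementTree as ET

def arxiv_query_url(search: str, max_results: int = 5) -> str:
    # arXiv's export endpoint takes search_query / start / max_results.
    params = urllib.parse.urlencode({
        "search_query": f"all:{search}",
        "start": 0,
        "max_results": max_results,
    })
    return f"http://export.arxiv.org/api/query?{params}"

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def parse_titles(atom_xml: str) -> list[str]:
    # Each paper is an <entry> in the Atom feed; pull out its <title>.
    root = ET.fromstring(atom_xml)
    return [entry.findtext("atom:title", namespaces=ATOM_NS).strip()
            for entry in root.findall("atom:entry", ATOM_NS)]

# Abbreviated sample response, just to demonstrate the parsing.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Graphene-based hydrophobic coatings</title></entry>
</feed>"""

url = arxiv_query_url("graphene hydrophobic coating")
titles = parse_titles(sample)
```

In the real pipeline, the Research Gatherer agent would fetch `url` over HTTP and feed the resulting Atom feed into `parse_titles`.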

Accomplishments that we're proud of

An internal AI system that we built ourselves, without having to rely on external engineers or an agency, and which is helping us accelerate our product development and shorten our time to market.

What we learned

Creating science-heavy domain applications with AI models is challenging due to either the limited availability of data or the limited variance in existing datasets. AI like Gemini can help offset many of these challenges via agentic capabilities. These agentic LLMs can also ease the workflow for the simple neural network models that suffice for empirical molecular hypothesis validation.

What's next for MatInformatics.AI

1) Expand the system to validate the quality of the papers it fetches.
2) Explore the Interactions API to get papers from sources that don't offer an API.
3) Open it to other materials science researchers and gather their feedback so we can refine it.
4) Perhaps include it as a digital product offering or materials consultancy from our startup.
