Inspiration

Methane, a potent greenhouse gas, significantly contributes to global warming, with over 80 times the warming power of carbon dioxide over a 20-year period. Tackling methane emissions is therefore a critical step in mitigating climate change, and the livestock sector accounts for a substantial share of them. While these emissions can be effectively reduced through green farming practices and sustainable cattle-rearing techniques, most farmers lack the knowledge and guidance to adopt them. This gap creates the opportunity for MethaneGPT: a one-stop, easy-to-access answer to their questions and a vehicle for spreading awareness. This project is inspired by the need to empower farmers with knowledge and tools, fostering awareness and sustainable solutions to address this pressing global issue.

What it does

The Retrieval-Augmented Generation (RAG) system leverages data from a diverse range of authoritative sources, including global institutions like the UNFCCC, the World Bank, and Cornell University, drawing on their extensive work on methane reduction (M3) projects, carbon credit initiatives, and government policies. By synthesizing this wealth of information, the system delivers clear, data-driven, and fact-based insights tailored to a wide variety of queries. It empowers users with accurate, actionable knowledge, enabling informed decision-making and effective implementation of sustainable practices.

How we built it

The project began with an extensive effort to source relevant and high-quality documents. Key platforms, including institutions focused on sustainability and methane (M3) reduction, government websites, and books, were explored to gather valuable PDF data. This data was systematically stored in an AWS S3 bucket for streamlined access.

A Snowflake stage was then created to interface with the files stored in the S3 bucket, enabling efficient data retrieval. The core functionality of the system relies on the snowflake.cortex.complete function, which processes the data and delivers outputs based on the most relevant document chunks.
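The stage-and-completion flow described above can be sketched in Snowflake SQL. Note that the bucket path, stage name, storage integration, and prompt wiring below are illustrative assumptions, not the project's actual identifiers:

```sql
-- Sketch only: bucket URL, stage name, and integration name are assumptions.
CREATE OR REPLACE STAGE methane_docs_stage
  URL = 's3://methanegpt-docs/pdfs/'
  STORAGE_INTEGRATION = s3_int;

-- Answer a user question with the retrieved chunks prepended as context.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'mistral-large2',
  CONCAT('Context: ', :retrieved_chunks,
         '\nQuestion: ', :user_question,
         '\nAnswer using only the context above.')
);
```

Passing the retrieved chunks inside the prompt is what grounds the model's answer in the sourced documents rather than in its general training data.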

The initial step involved extracting and storing all textual content from the PDFs. Various chunking strategies were explored to optimize the retrieval of contextually relevant data. Ultimately, a page-wise chunking approach was implemented, where each chunk contains all the text from a single page.

For context retrieval, Snowflake's vector similarity functions were evaluated, and VECTOR_L2_DISTANCE() was selected as the optimal method. This function ranks chunks by their Euclidean distance to the embedded query and selects those with the minimum distance, ensuring highly relevant and accurate contextual information is provided to the language model.

Challenges we ran into

One of the primary challenges was sourcing comprehensive PDF documents that addressed all aspects of methane reduction (M3), carbon footprints, and sustainability. This process was both time-consuming and resource-intensive, requiring extensive research across multiple platforms and institutions.

Another significant challenge was mastering the Snowflake platform. Understanding its functionalities, exploring its features, and integrating them effectively into the project required a steep learning curve.

The most critical hurdle, however, was optimizing the performance of the Retrieval-Augmented Generation (RAG) system. Achieving satisfactory results involved numerous trials, iterations, and refinements to improve its accuracy and relevance beyond the baseline capabilities of the underlying language model. These efforts ultimately led to a robust and high-performing system.

Accomplishments that we're proud of

MethaneGPT delivers highly accurate, data- and fact-driven results, providing clear and insightful answers to user queries. The system effectively reduces ambiguity, ensuring that users receive precise and actionable information. Another key achievement is the optimization of query retrieval time: despite the additional data processing and retrieval steps, the system matches the response speed of the base LLM, demonstrating efficiency without compromising accuracy.

What we learned

Learning to build and fine-tune a RAG system for practical applications was a major takeaway from this project. Skills such as advanced SQL were also honed while turning the idea into a working application.

What's next for MethaneGPT-Chat

The project aims to achieve practical applications by continuously updating the database to provide users with the latest and most relevant information. Future milestones include integrating additional agents and implementing web driver automation to further enhance the system's functionality and usability.

Built With

  • cortex.complete
  • cortex.embed-text
  • mistral-large2
  • python
  • sql
  • streamlit