Inspiration
As a final-year student, I often find myself overwhelmed by the sheer number of research papers I need to read for my thesis. The challenge isn't just the volume of papers but also understanding the complex language and concepts within them. This problem isn't unique to me. Many students, researchers, and professionals face similar difficulties when dealing with intricate documents such as medical reports, legal contracts, or technical papers. The frustration of not being able to easily comprehend or keep track of the documents I'd read inspired me to create a solution.
What it does
Themisia is a web application designed to simplify how we interact with complex documents. By combining the capabilities of Large Language Models (LLMs) with Vector Search powered by TiDB, Themisia enables users to "talk" to their documents. Whether it's a research paper, a legal document, or any other complex text, Themisia helps users understand and navigate through the content efficiently.
How we built it
Themisia was built using Next.js as a full-stack web application framework. For our primary database, we chose TiDB Serverless due to its vector search capabilities, MySQL compatibility, and cost-effectiveness. The process begins with the user uploading a PDF document, from which the text is extracted and divided into smaller chunks. These chunks are then passed through the Gemini embeddings model to generate vector embeddings.
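The chunking step above can be sketched roughly as follows. This is a minimal illustration, not Themisia's actual code: the chunk size, overlap, and function name are illustrative choices, and a real pipeline might split on sentence or paragraph boundaries instead of fixed character offsets.

```typescript
// Split extracted PDF text into overlapping chunks before embedding.
// chunkSize and overlap are hypothetical defaults for illustration.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    // Overlap the next chunk with the end of this one so that context
    // spanning a chunk boundary is not lost.
    start = end - overlap;
  }
  return chunks;
}
```

Each chunk would then be sent to the Gemini embeddings model, producing one vector per chunk.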
We store these embeddings along with their metadata in TiDB. Additionally, each page of the PDF is converted into an image and stored in Google Cloud Storage to provide visual context, which is particularly useful for advanced LLM vision capabilities. The user can then ask questions about the document, and Themisia retrieves relevant information using the vector embeddings, ensuring that the AI's responses are accurate and based on the document's content.
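The retrieval step could look something like the sketch below, which builds a TiDB vector-search query for a question's embedding. The table and column names (`document_chunks`, `embedding`, `content`, `page_number`) are assumptions for illustration; TiDB Serverless provides `VEC_COSINE_DISTANCE` for ordering rows by vector similarity.

```typescript
// Build a parameterized TiDB vector-search query that returns the topK
// chunks closest (by cosine distance) to the query embedding.
// Schema names here are hypothetical.
function buildRetrievalQuery(
  queryEmbedding: number[],
  topK = 5
): { sql: string; params: unknown[] } {
  // TiDB accepts vector literals as a JSON-style array string.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  const sql =
    `SELECT content, page_number, ` +
    `VEC_COSINE_DISTANCE(embedding, ?) AS distance ` +
    `FROM document_chunks ORDER BY distance LIMIT ?`;
  return { sql, params: [vectorLiteral, topK] };
}
```

The matched chunks (and, where useful, the corresponding page images from Google Cloud Storage) are then handed to the LLM as grounding context.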
Challenges we ran into
One of the biggest challenges we faced was building a user-friendly interface; it took many iterations and rounds of testing to make the design both intuitive and mobile-responsive. Extracting text from PDFs was another challenge, as the process occasionally broke due to quirks in the pdfjs-dist package. Chunking the extracted text in a way that preserved meaning before converting it into embeddings was also complex. Finally, we had to manage AI behavior, such as controlling unexpected response lengths and mitigating hallucinations.
Accomplishments that we're proud of
Successfully Integrated Vector Search and AI: We combined the power of Large Language Models (LLMs) with Vector Search using TiDB Serverless, creating a unique application that allows users to interact with their documents in an intelligent and meaningful way.
User-Centric Design: We developed a simple, intuitive, and mobile-responsive user interface that enhances the user experience, making it easy for anyone to navigate and use Themisia.
Efficient Document Processing: Despite the challenges, we managed to build a reliable system that extracts text from PDFs, processes it into vector embeddings, and uses these embeddings to accurately answer user queries.
Minimizing AI Hallucinations: We implemented techniques like Retrieval Augmented Generation (RAG) to reduce AI hallucinations, ensuring that the AI provides responses strictly based on the content of the uploaded documents.
Scalable and Cost-Effective Infrastructure: By leveraging TiDB Serverless, we created a scalable backend that automatically adjusts to workload changes, ensuring that the application can handle varying levels of demand without unnecessary costs.
Advanced Visual Context Integration: We incorporated LLM vision capabilities by converting PDF pages into images and storing them in Google Cloud Storage, providing additional context for the AI and enhancing the overall document interaction experience.
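The RAG approach mentioned above can be sketched as a prompt-assembly step: the retrieved chunks are inlined as context, and the model is instructed to answer only from them. The instruction wording and chunk formatting here are illustrative, not Themisia's exact prompt.

```typescript
// Assemble a grounded prompt from the user's question and the chunks
// returned by vector search. Keeping the instruction explicit about
// "only the context" is one common way to reduce hallucinations.
function buildRagPrompt(question: string, retrievedChunks: string[]): string {
  const context = retrievedChunks
    .map((chunk, i) => `[Chunk ${i + 1}]\n${chunk}`)
    .join("\n\n");
  return (
    `Answer the question using ONLY the context below. ` +
    `If the answer is not in the context, say you don't know.\n\n` +
    `Context:\n${context}\n\nQuestion: ${question}`
  );
}
```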
What we learned
Building Themisia was a learning journey that taught me the importance of focusing on details. From ensuring that each line of code is clear and functional to designing an intuitive user interface, every aspect required careful consideration. I also learned about the power of AI and vector search in enhancing the user experience and the complexities of integrating these technologies.
What's next for Themisia
Expanding AI Capabilities: We plan to enhance Themisia's AI by integrating more advanced language models and fine-tuning them for specific document types, such as legal contracts, medical reports, and technical papers.
Multilingual Support: To make Themisia accessible to a broader audience, we aim to add support for multiple languages, enabling users from different linguistic backgrounds to interact with their documents in their native language.
Enhanced Document Management: We plan to introduce more advanced document management features, such as version control, tagging, and more powerful search options, to help users organize and retrieve their documents more efficiently.
Continuous Optimization: We will continue to optimize Themisia's performance, focusing on improving processing speed, reducing latency in AI responses, and enhancing the overall user experience.
Built With
- geminiai
- google-cloud
- nextjs
- tidb
