ChronoQuest: Bringing Archives to Life with AI & Interactive Simulations

Inspiration

The past holds invaluable lessons, but accessing historical and scientific archives is often difficult, requiring extensive research and domain knowledge. We wanted to create a tool that makes archival data interactive, engaging, and accessible—especially for students, researchers, and lifelong learners.

ChronoQuest transforms historical plant discoveries and archival newspaper records into an immersive, AI-powered experience that brings history to life! By combining machine learning, generative AI, and interactive timelines, we empower users to explore archives in a way never before possible.

What It Does

ChronoQuest is a gamified, interactive platform that allows users to:

  • Explore archival data dynamically – Witness plant discoveries unfold over time with an interactive timeline.
  • Engage with historical newspapers – Search and interact with Boston University’s Daily Free Press archives.
  • Talk to Your Archives – Meet Amanda! – Amanda, our AI research assistant, acts as a virtual librarian: she helps users find relevant resources, suggests learning paths, and guides research through AI-powered Q&A.
  • Experience truly interactive archives – Through text-to-speech and video generation, Amanda makes archives feel alive, allowing users to converse with history in real time.
  • Leverage AI-powered research tools – Using Python pandas for data mining, ChronoQuest lets users analyze datasets via natural-language queries and backs its answers with citations to reduce hallucinations.
  • Test your knowledge – Play quizzes designed to reinforce learning and make history more engaging!

By integrating geospatial, historical, and biological data, ChronoQuest transforms passive reading into an interactive learning experience that bridges the past with the future.

How We Built It

We leveraged cutting-edge AI and big data technologies to process, store, and serve the vast amounts of archival data:

Data Sources

  • Scraped Harvard Herbaria plant discovery data from GBIF.org
  • Extracted Boston University Daily Free Press archives using OCR
  • Processed thousands of historical records to create a structured knowledge base
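As a rough illustration of how plant records like these could feed the timeline, here is a minimal pandas sketch. It is not the project's actual pipeline: the sample rows are made up, the `build_timeline` helper is hypothetical, and the column names (`scientificName`, `year`) are assumed to follow the Darwin Core terms used in GBIF occurrence downloads.

```python
import pandas as pd

# Made-up sample rows standing in for a GBIF occurrence export.
records = pd.DataFrame(
    {
        "scientificName": [
            "Quercus alba",
            "Quercus alba",
            "Acer rubrum",
            "Pinus strobus",
            "Acer rubrum",
        ],
        "year": [1890, 1890, 1902, 1902, None],  # last row has no date
    }
)


def build_timeline(df: pd.DataFrame) -> pd.DataFrame:
    """Count records and distinct species per year, skipping undated rows."""
    dated = df.dropna(subset=["year"]).copy()
    dated["year"] = dated["year"].astype(int)
    return (
        dated.groupby("year")
        .agg(
            n_records=("scientificName", "size"),
            n_species=("scientificName", "nunique"),
        )
        .reset_index()
        .sort_values("year")
    )


timeline = build_timeline(records)
print(timeline)
```

Each row of the result is one point on a timeline; a front end could animate over the rows in year order.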

Tech Stack

  • LlamaIndex on Databricks – To efficiently search, retrieve, and structure archival data.
  • LanceDB – Storing and querying vector embeddings for semantic search.
  • Delta Lake – Managing massive datasets efficiently with ACID transactions.
  • MongoDB – Storing and indexing archival text data for fast retrieval.
  • LangChain – Orchestrating interactions that involve control flow and tool calling, such as the research assistant.
  • Retrieval-augmented generation (RAG) – Using Databricks-backed LlamaIndex and LanceDB to store and retrieve embeddings.
  • Generative AI (LLM + NLP) – Enabling AI-powered question answering and natural conversations with historical data.
  • Speech-to-Text & Text-to-Speech – Bringing voice to archives, allowing users to listen and interact.
  • OCR Processing – Extracting text from scanned newspaper archives for enhanced accessibility.
  • Python pandas & AI Research Tools – Allowing users to mine data using natural-language queries, with citations attached to results to guard against hallucinations.
  • AI video API – Generating video, which we then integrated with the agents behind the librarian.
  • FastAPI and Flask – Powering the backends, hosted on Render and, for local development, exposed through localhost.run port forwarding.
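To make the retrieval step of the RAG setup above more concrete, here is a small, dependency-free sketch. It is only a stand-in: a bag-of-words counter replaces a real embedding model, an in-memory list replaces LanceDB, and the archive snippets, source ids, and function names (`embed`, `retrieve`, `build_prompt`) are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical archive snippets; a vector store would hold these in practice.
ARCHIVE = [
    {"source": "daily-free-press-001", "text": "Students rally on campus for lower tuition"},
    {"source": "herbaria-042", "text": "Specimen of white oak collected near Boston"},
    {"source": "daily-free-press-017", "text": "Hockey team wins the city tournament"},
]


def retrieve(question: str, k: int = 2) -> list:
    """Return up to k snippets ranked by similarity, dropping zero-score ones."""
    query = embed(question)
    scored = [(cosine(query, embed(doc["text"])), doc) for doc in ARCHIVE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"score": score, **doc} for score, doc in scored[:k] if score > 0]


def build_prompt(question: str, passages: list) -> str:
    """Assemble an LLM prompt that carries source ids alongside each snippet."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"{context}\nQuestion: {question}"
    )


hits = retrieve("Where was the white oak specimen collected?")
print(build_prompt("Where was the white oak specimen collected?", hits))
```

Keeping the source id next to each snippet in the prompt is what lets the model cite where an answer came from.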

Why ChronoQuest Matters

  • 📚 Education & Research: Historians, students, and scientists can easily explore archival data without needing extensive research skills. Students can access a gamified view.
  • Virtual Librarian: Amanda adds a friendly, more interactive face that makes archival research less intimidating.
  • 🗺️ Data Visualization: Our timeline makes history visual and engaging, showing how discoveries unfolded over time.
  • 🗣️ Conversational AI: Talk to Your Archives! Amanda, our AI research assistant, acts as a virtual librarian, helping users find resources, analyze data, and answer historical questions dynamically.
  • 🎮 Gamification: Our quizzes ensure users retain knowledge in a fun and interactive way.
  • 🌍 Democratizing Knowledge: Archival data shouldn’t be hidden in static PDFs! ChronoQuest makes publicly available data truly accessible.
  • Critical Thinking: By helping people interact more with the data, our story tools encourage them to think more critically about topics and to explore new ones.

Challenges We Overcame

  • Processing massive datasets efficiently – We optimized queries using Databricks Delta Lake and LanceDB.
  • Making unstructured archives searchable – OCR + AI-driven embeddings allow semantic search and question-answering.
  • Ensuring a seamless, intuitive UX – Our interactive UI + speech integration enhances accessibility and engagement.
  • Building a research-grade AI assistant – We designed Amanda to prevent AI hallucinations, provide sources for data, and enhance trust in AI-powered research tools.
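One common way to keep an assistant grounded, sketched below, is to attach citations to every answer and to abstain when no retrieved passage is similar enough to the question. This is an illustrative assumption, not a description of how Amanda is implemented: the `Passage` type, the `grounded_answer` helper, and the 0.3 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str   # hypothetical source id, e.g. an archive record key
    text: str
    score: float  # similarity to the question, assumed to lie in [0, 1]


MIN_SCORE = 0.3  # assumed cut-off; a real system would tune this value


def grounded_answer(draft: str, passages: list, min_score: float = MIN_SCORE) -> str:
    """Attach citations to a drafted answer, or abstain if nothing supports it."""
    supported = [p for p in passages if p.score >= min_score]
    if not supported:
        return "I could not find a source for that in the archive."
    citations = ", ".join(f"[{p.source}]" for p in supported)
    return f"{draft} Sources: {citations}"


passages = [
    Passage("herbaria-042", "Specimen of white oak collected near Boston", 0.62),
    Passage("daily-free-press-017", "Hockey team wins the city tournament", 0.05),
]
print(grounded_answer("The specimen was collected near Boston.", passages))
print(grounded_answer("An unsupported claim.", passages[1:]))
```

Only the passage above the threshold is cited in the first call; the second call abstains because its only passage scores too low.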

The Impact

ChronoQuest revolutionizes how we explore history and biodiversity by transforming archives into an interactive, AI-powered experience. By leveraging the latest in AI, databases, and gamification, we empower users to discover, learn, and engage with the past like never before.

We believe ChronoQuest has the potential to reshape digital education and make archival data more accessible than ever.
