## Inspiration

We were inspired by Internet-in-a-Box (IIAB) - an open-source project that brings offline access to Wikipedia and other educational resources in disconnected environments like rural schools, disaster zones, and remote camps.

But IIAB is just content. We wanted to add reasoning.

Wikipedia + GPT - in a box, fully offline.

## What it does

Wiki-in-a-Box is a fully offline Retrieval-Augmented Generation (RAG) system over a local Wikipedia ZIM file.
It runs on your machine - no cloud, no APIs - and answers natural language questions by:

  • Searching Wikipedia locally using title and full-text heuristics
  • Ranking relevant sections with local semantic embeddings
  • Generating an answer using a local gpt-oss:20b model via Ollama
  • Streaming the result with inline [1]-style citations
  • Linking those citations to local Kiwix-served pages

All of this happens inside a Docker stack, after a one-time download of models and data.


## How we built it

  • Used python-libzim to read and search a Wikipedia .zim file (first sketch below)
  • Built an in-memory title suggestion + reranking pipeline using:
    • Query tokenization, title prefix matches, SQLite FTS5
    • SentenceTransformers (bge-small-en-v1.5) for semantic lead scoring
  • Created section-aware chunking via BeautifulSoup (h2/h3 splits; second sketch below)
  • Used cosine similarity to pick the best-matching chunks (third sketch below)
  • Constructed compact citation-style context:
    "[1] Sunset - Rayleigh scattering causes orange hues at dusk."
  • Connected to Ollama running gpt-oss:20b on localhost
  • Built a FastAPI backend that streams answers via SSE (fourth sketch below)
  • Served a static frontend that renders tokens and citation chips
  • Wrapped Kiwix + Nginx to open /kiwix/... citation pages locally
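
A minimal sketch of the retrieval side, combining python-libzim's reader/search/suggestion APIs with a toy FTS5 title index (the file name, sample row, and paths are illustrative; the real index is bulk-loaded from the archive):

```python
import sqlite3

from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher

zim = Archive("wikipedia.zim")  # assumed filename for the local ZIM

# Title suggestions straight from libzim (fast, prefix-friendly)
suggester = SuggestionSearcher(zim)
title_hits = list(suggester.suggest("rayleigh scatter").getResults(0, 10))

# Full-text search over the ZIM's built-in index
searcher = Searcher(zim)
search = searcher.search(Query().set_query("why is the sky orange at sunset"))
fulltext_hits = list(search.getResults(0, 10))

# Custom SQLite FTS5 title index for fuzzier recall than prefix matching
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE titles USING fts5(title, path UNINDEXED)")
db.execute("INSERT INTO titles VALUES (?, ?)", ("Sunset", "A/Sunset"))  # illustrative row
rows = db.execute(
    "SELECT path FROM titles WHERE titles MATCH ? ORDER BY rank LIMIT 10",
    ("sunset",),
).fetchall()

# Fetch an article's HTML for downstream chunking
if title_hits:
    entry = zim.get_entry_by_path(title_hits[0])
    html = bytes(entry.get_item().content).decode("utf-8")
```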
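
Chunking is a linear walk over the article HTML that starts a new chunk at every h2/h3 boundary (a simplified sketch of the approach):

```python
from bs4 import BeautifulSoup

def chunk_sections(html: str, title: str) -> list[dict]:
    """Split article HTML into chunks at h2/h3 section boundaries."""
    soup = BeautifulSoup(html, "html.parser")
    chunks: list[dict] = []
    heading, buf = title, []  # the lead section inherits the article title

    def flush():
        text = " ".join(buf).strip()
        if text:
            chunks.append({"heading": heading, "text": text})

    for el in soup.find_all(["h2", "h3", "p", "li"]):
        if el.name in ("h2", "h3"):
            flush()  # close out the previous section
            heading, buf = el.get_text(" ", strip=True), []
        else:
            buf.append(el.get_text(" ", strip=True))
    flush()
    return chunks
```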
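
Reranking embeds the query and each chunk with bge-small-en-v1.5; because the vectors are unit-normalized, cosine similarity reduces to a dot product. A sketch reusing the chunk dicts from above (the query prefix is the one the bge model card recommends):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def top_chunks(query: str, chunks: list[dict], k: int = 4) -> list[dict]:
    """Return the k chunks most semantically similar to the query."""
    # bge-en models recommend this instruction prefix for retrieval queries
    q = model.encode(
        "Represent this sentence for searching relevant passages: " + query,
        normalize_embeddings=True,
    )
    c = model.encode([ch["text"] for ch in chunks], normalize_embeddings=True)
    sims = c @ q  # cosine similarity: both sides are unit-normalized
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def build_context(chunks: list[dict]) -> str:
    """Compact numbered context, e.g. '[1] Sunset - Rayleigh scattering ...'."""
    return "\n".join(
        f"[{i}] {ch['heading']} - {ch['text'][:300]}"
        for i, ch in enumerate(chunks, 1)
    )
```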
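
The answer path streams NDJSON from Ollama's /api/generate endpoint and re-emits it as SSE from FastAPI. A trimmed sketch (the /ask route and prompt are illustrative; the real prompt embeds the numbered context from build_context):

```python
import json

import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

@app.get("/ask")
async def ask(q: str):
    prompt = f"Answer using only the provided sources, citing them as [1], [2].\n\nQuestion: {q}"

    async def events():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream(
                "POST", OLLAMA_URL,
                json={"model": "gpt-oss:20b", "prompt": prompt, "stream": True},
            ) as resp:
                async for line in resp.aiter_lines():  # NDJSON: one object per line
                    if line:
                        token = json.loads(line).get("response", "")
                        yield f"data: {json.dumps({'token': token})}\n\n"  # SSE frame
        yield "data: [DONE]\n\n"

    return StreamingResponse(events(), media_type="text/event-stream")
```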

## Challenges we ran into

  • Embedding model cold starts: SentenceTransformers was slow to load inside Docker; we solved it by prefetching the model into the Hugging Face cache so it loads fully offline (sketch below).
  • Title recall tuning: Getting strong hits for fuzzy user queries required combining libzim.SuggestionSearcher with a custom FTS5 title index.
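
The prefetch fix boils down to warming the Hugging Face cache at build time and flipping the documented offline switches at runtime (a sketch; adjust to your image layout):

```python
# Build time (network available): warm the local Hugging Face cache once.
from huggingface_hub import snapshot_download

snapshot_download("BAAI/bge-small-en-v1.5")

# Runtime (air-gapped container): force loads from that cache only, e.g.
#   ENV HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1
# after which SentenceTransformer("BAAI/bge-small-en-v1.5") starts with no network calls.
```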

## Accomplishments that we're proud of

  • End-to-end RAG pipeline with local LLM, local Wikipedia, and local citations - completely offline.
  • No dependency on cloud APIs, GPU inference services, or commercial products.
  • It generalizes beyond Wikipedia: swap in any ZIM (e.g., survival manuals, medical docs) and the system just works.

## What we learned

  • Retrieval + chunking + compact context design is just as important as the LLM itself - garbage in, garbage out.
  • Embedding models are extremely lightweight compared to LLMs - perfect for on-device intelligence.
  • Citation-grounded generation increases user trust and makes LLM answers more verifiable - even offline.

## What's next for Wiki-in-a-Box

  • 💡 Multi-ZIM support (Wiktionary, Wikivoyage, Medline)
  • 📱 Offline-first Android version (light LLMs like Phi-3)
  • 📡 Bluetooth sync between devices for local knowledge networks
  • 🎓 School bundle - preloaded with STEM textbooks and language primers
  • 💬 Voice agent mode - speech-to-text + offline GPT answering
  • 🔒 Disaster resilience mode with solar-charged Pi + LoRa mesh
