Kuma the Research Bear

Kuma presents a quick to understand "gist" of the paper at a glance.
Kuma generates guided explanation of images.
Kuma renders LaTeX and explains formulas used in the paper.
Reader may copy and paste formulas from the paper.
Kuma works with PDFs.
Reader discusses image with Kuma for further insights.
Kuma is grounded in the context of the paper and cites sources that readers can click to scroll to.
Kuma is multilingual and can explain papers in multiple languages. Currently only English, Spanish and Japanese are supported.
Kuma can generate a glossary of terms used in the context of the paper.
Kuma generates a TLDR summary with key points.

⚡Inspiration

Ever spent hours rereading the same paragraph in a research paper, trying to decode technical jargon and decipher complex visuals? Research remains inaccessible to most people because it's written in dense, highly technical language that's hard to absorb.

Kuma the Research Bear was built to change that—making research comprehension faster, clearer, and accessible to everyone, right inside Chrome.

🔑 What it does

Kuma the Research Bear runs directly inside Chrome using the Gemini Nano built-in APIs to process and contextualize research papers locally for privacy and speed. With its purpose-built UI, Kuma makes it easy to understand research papers, breaking down dense text and visuals into simplified explanations.

Key Features:

At a Glance
Quick summary and explanation of the research paper in plain language, including key points and details of the research topic, broken down into what (the paper is about), why (it matters), and how (research was done).
Visual Explainer
Kuma detects and explains images in an easy-to-understand format. This helps readers grasp what the visual is showing, why it's important in the context of the paper, and why it matters in general. Readers can take a deep dive into the visual's contents by asking Kuma questions and repeating concepts to validate understanding.
Dynamic Chat vs. Side Panel
With a dynamic chat that can be moved and adjusted to your preference, you may engage with the interactive chat as much or as little as you wish. Alternatively, the side panel is always available for those who prefer a structured view.
Deep Analysis
Detailed exploration into methodologies, confounders and biases, implications, and limitations stimulate intellectual engagement which creates a more comprehensive view of the research. The extension uses a Map-Reduce, summary of summaries pattern along with hybrid search to generate this. This is important in understanding the wider applications of the concept.
Source Citations
Generate citations directly in your research paper within Chrome, and either grab the whole list of citations generated or copy only the ones you need. Sources are listed throughout the extension, ensuring information is grounded in the context of the paper. Citations may be exported in 3 formats: APA, MLA, Chicago.
Multiple Languages
Kuma is capable of speaking multiple different languages! Currently, English, Spanish and Japanese are available.
All Local-AI (Chrome Built-in AI)
Kuma leverages Chrome's Built-in AI capabilities to make it such that any information generated never leaves your browser for maximum privacy.

UX Features:

The magic is in the details:

Feature	What it does
Kuma Compass	Visual indicator showing where explained images are relative to your current position—instant reorientation
Dynamic Chat	Adjustable transparency, minimize/expand, multi-tabbed interface for flexible workflows
Click-to-Navigate	Click images or sources to auto-scroll to that section in the paper
LaTeX Support	Ask for formulas and get properly rendered LaTeX equations
Screenshot & Discuss	Right-click to capture any region and discuss it with Kuma—works on PDFs and HTML
Progress Messaging	Always know what's happening; no mysterious loading states
Lottie Animations	Delightful animated Kuma mascot that brings personality to every interaction

How we built it

Chrome Built-in AI APIs

Prompt API – Primary intelligence engine
Summarizer API – Quick TLDR generation, Map-Reduce summary of summaries for analysis
Language Detector API – Multilingual support

AI/ML Stack

Transformers.js 3.1 – EmbeddingGemma (308M params) for semantic search
ONNX Runtime – WebGPU acceleration with WASM fallback
OkapiBM25 – Keyword search for hybrid RAG

Frontend Stack

Preact – Lightweight React alternative (10KB)
TypeScript – Full type safety
Tailwind CSS 4.1 – Modern utility-first styling
Vite – Fast builds with HMR
MathJax – LaTeX rendering
Lottie – Animated mascot
Marked + DOMPurify – Safe markdown rendering

🚧 Challenges & Solutions

Challenge	How we solved it
No built-in preemption in LanguageModel API	Broke longer tasks into smaller sequential chunks, allowing high-priority requests (chat) to "slip in" before background tasks (analysis). Future: implement pseudo-preemption with priority queuing
PDF viewer isolation preventing image detection	Built a "capture screen region" feature that works universally across PDFs and HTML pages
First time creating a chrome extension	Deep-dived documentation, leveraged community resources, and iterated rapidly
Managing token budgets across features	Implemented adaptive context trimming and conversation summarization to maximize efficiency
WebGPU compatibility variations	Built graceful WASM fallback with quantized models (only 3% precision loss)

🌟 Accomplishments that we're proud of

UX obsession paid off – Progress messaging, contextual menus, scroll-to-source, dynamic chat positioning, compass navigation, and more make Kuma genuinely delightful to use

Actually useful – We've used Kuma to understand dozens of papers during development. It works!

Universal compatibility – Supports both HTML papers and PDFs, covering the vast majority of research literature

Privacy-first architecture – 100% local processing means sensitive research stays on your device

Built-in AI showcase – Integrated all three Chrome AI APIs with advanced features like hybrid RAG, multimodal chat, and structured output

What we learned

Chrome extensions have massive untapped potential for AI-powered productivity tools
Built-in AI APIs enable privacy-first applications that were previously impossible/impractical
UX polish matters tremendously, small touches like the compass and click-to-scroll create magic
Progressive loading and streaming responses are essential for AI UX