Inspiration

Research today is increasingly interdisciplinary. A neuroscientist reading a paper on graph theory might spark a breakthrough in brain connectivity modeling, yet these cross-domain insights are easily lost in scattered notes. We observed that the core challenge lies in the gap between collecting knowledge and connecting it.

Traditional note-taking tools treat each entry as an isolated document. But knowledge is not flat — it forms a graph $G = (V, E)$, where concepts $V$ are linked by typed relationships $E$. We asked: What if AI could automatically extract this hidden structure and, more importantly, propose novel hypotheses at the intersection of disciplines?

This question became KnowledgeBridge AI — a tool that transforms fragmented research notes into a living ontology map $O = (C, R, \mathcal{H}, \mathcal{A})$, where $C$ represents core concepts, $R$ the relationships between them, $\mathcal{H}$ the disciplinary hierarchy, and $\mathcal{A}$ the generated axioms (hypotheses).

What it does

KnowledgeBridge AI is an interdisciplinary research navigator that bridges Academia and Business by extracting structured ontologies from unstructured text.

Core Pipeline:

  1. Input — Users provide research notes via text, file upload (PDF, images, markdown), or real-time voice input using the Web Speech API.
  2. Document Understanding — Non-text documents (handwritten notes, diagrams, formulas) are analyzed via Gemini 3 Flash's multimodal capabilities and converted to text.
  3. Ontology Extraction — Gemini 3 Pro (with Thinking Mode, budget $= 32768$ tokens) parses the input and extracts:
    • Main concepts with importance scoring $i \in \{\text{high}, \text{medium}, \text{low}\}$
    • Disciplines with confidence $c \in [0, 1]$
    • Typed relationships: $r \in \{\text{causes}, \text{enables}, \text{requires}, \text{contradicts}, \text{extends}, \text{applies\_to}\}$
    • Interdisciplinary connections with novelty ratings
  4. Hypothesis Generation — When multiple disciplines are detected ($|\text{disciplines}| > 1$), the system automatically generates a novel research hypothesis bridging the fields.
  5. Google Search Grounding — Optionally enriches the analysis with up-to-date web sources via Gemini's native Google Search tool, attaching verified references.
  6. Interactive Ontology Graph — Results are visualized as an interactive SVG-based knowledge graph with:
    • Drag-and-drop node repositioning
    • Zoom and pan controls
    • Concept search with neighbor highlighting
    • Color-coded nodes: importance (red/amber/green) and disciplines (blue)
  7. TTS Readback — Summaries and hypotheses can be read aloud via Gemini 2.5 Flash TTS (24kHz PCM output), with Web Speech API fallback.
  8. Notion Export — Results are saved to Notion databases with configurable property mapping (tags, intent, disciplines, interdisciplinary flag), supporting presets for repeat workflows.
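The extraction schema and the hypothesis trigger from step 4 can be sketched in TypeScript; the field names below are illustrative, not the app's exact schema:

```typescript
// Illustrative shape of the ontology extraction result (field names are
// assumptions, not the exact schema used by KnowledgeBridge AI).
type Importance = 'high' | 'medium' | 'low';
type RelationType =
  | 'causes' | 'enables' | 'requires'
  | 'contradicts' | 'extends' | 'applies_to';

interface OntologyResult {
  concepts: { name: string; importance: Importance }[];
  disciplines: { name: string; confidence: number }[]; // confidence in [0, 1]
  relationships: { from: string; to: string; type: RelationType }[];
}

// Step 4 of the pipeline: a hypothesis is generated only when the note
// spans more than one discipline (|disciplines| > 1).
function shouldGenerateHypothesis(result: OntologyResult): boolean {
  return result.disciplines.length > 1;
}
```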

How we built it

Frontend Architecture:

  • React 19 + TypeScript with Vite 6 for fast HMR and optimized builds
  • Tailwind CSS for responsive, utility-first styling
  • Pure SVG rendering for the ontology graph (no external charting libraries)

AI Backend (Serverless):

  • Google Gemini API via @google/genai SDK:
    • gemini-3-pro-preview — Deep ontology extraction with thinkingConfig: { thinkingBudget: 32768 }
    • gemini-3-flash-preview — Document OCR, summarization, and search-grounded analysis with tools: [{ googleSearch: {} }]
    • gemini-2.5-flash-preview-tts — Neural TTS with Kore voice, outputting raw PCM at 24kHz
  • Structured JSON output enforced via responseMimeType: 'application/json' with a detailed system instruction schema
  • Retry logic with a linearly increasing backoff delay ($\text{delay} = 1000 \times \text{attempt}\ \text{ms}$)
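The retry wrapper can be sketched as follows; the function name and defaults are assumptions, with the delay growing linearly with the attempt number as in the formula above:

```typescript
// Minimal retry sketch (names and defaults are assumptions). The delay
// before retry attempt n is baseMs * n milliseconds.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait baseMs * attempt ms before the next try.
        await new Promise((resolve) => setTimeout(resolve, baseMs * attempt));
      }
    }
  }
  throw lastError;
}
```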

Browser APIs:

  • Web Speech API — Real-time speech-to-text with continuous mode and interim results
  • Web Audio API — PCM decoding, AudioContext management, and volume analysis via AnalyserNode (FFT size $= 256$)
  • MediaDevices API — Microphone access with echo cancellation and noise suppression

Integration:

  • Notion API (v2022-06-28) — Page creation with dynamic property mapping, accessed via a CORS proxy for client-side calls

Challenges we ran into

  1. Gemini TTS PCM Decoding — Gemini 2.5 Flash TTS returns raw base64-encoded PCM (16-bit signed integer, mono, 24kHz) rather than a standard audio format. We had to manually decode the base64 payload, interpret the Int16Array, and normalize samples to $[-1, 1]$ float range via $x_{\text{float}} = \frac{x_{\text{int16}}}{32768}$ before creating an AudioBuffer.
  2. Structured JSON Reliability — Despite setting responseMimeType: 'application/json', the model occasionally wraps output in markdown code blocks. We implemented a multi-stage JSON extraction pipeline: direct parse $\rightarrow$ code block regex $\rightarrow$ brace matching, ensuring robust parsing across edge cases.
  3. CORS Restrictions for Notion API — Since this is a pure client-side application with no backend server, calling the Notion API directly from the browser is blocked by CORS. We solved this by routing requests through a CORS proxy, but this introduced latency and reliability concerns that required timeout management and error handling.
  4. Graph Layout Without D3 — We built the interactive ontology graph from scratch using pure SVG and React state, without relying on D3.js or similar libraries. Implementing smooth zoom/pan with correct coordinate transformations (screen $\rightarrow$ SVG $\rightarrow$ graph space via CTM.inverse()) was non-trivial.
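The PCM decoding step from challenge 1 can be sketched as a pure function; the final AudioBuffer construction (e.g. `ctx.createBuffer(1, samples.length, 24000)`) is browser-only and omitted here:

```typescript
// Sketch of the TTS decoding step: base64 payload -> little-endian 16-bit
// signed samples -> floats normalized to [-1, 1] via x / 32768.
function pcm16ToFloat32(base64: string): Float32Array {
  const binary = atob(base64); // base64 -> byte string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  // Reinterpret pairs of bytes as 16-bit signed integers.
  const int16 = new Int16Array(bytes.buffer, 0, bytes.length >> 1);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // normalize to [-1, 1]
  }
  return float32;
}
```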
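The multi-stage JSON extraction pipeline from challenge 2 can be sketched like this (the helper name is illustrative):

```typescript
// Three-stage extraction: direct parse -> markdown code-fence regex ->
// outermost brace matching, for when the model wraps JSON in prose.
function extractJson(raw: string): unknown {
  // Stage 1: direct parse.
  try { return JSON.parse(raw); } catch { /* fall through */ }
  // Stage 2: strip a ```json ... ``` code fence.
  const fence = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fence) {
    try { return JSON.parse(fence[1]); } catch { /* fall through */ }
  }
  // Stage 3: take the outermost { ... } span.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start !== -1 && end > start) {
    return JSON.parse(raw.slice(start, end + 1));
  }
  throw new Error('No parseable JSON found in model output');
}
```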

Accomplishments that we're proud of

  • Zero-backend architecture — The entire application runs client-side with no server infrastructure, yet delivers a full AI-powered research workflow including ontology extraction, TTS, document analysis, and database export.
  • Automatic hypothesis generation — The system doesn't just organize knowledge — it creates new knowledge by detecting interdisciplinary connections and proposing testable hypotheses. This transforms a passive note-taking tool into an active research partner.
  • Search Grounding integration — By leveraging Gemini's native Google Search tool, analyses are enriched with real-world references, bridging the gap between a user's notes and the broader research landscape.
  • Three Gemini models in harmony — We orchestrated Pro (deep thinking), Flash (speed + multimodal), and Flash TTS (audio) into a cohesive pipeline, selecting the right model for each task: $\text{Pro} \rightarrow \text{ontology}$, $\text{Flash} \rightarrow \text{OCR + search}$, $\text{Flash TTS} \rightarrow \text{audio}$.
  • Interactive graph with pure SVG — No external graph library was needed. The ontology visualization supports drag, zoom, pan, and search — all built with React state and SVG primitives.

What we learned

  • Thinking Mode is transformative for structured extraction — Giving the model a dedicated thinking budget dramatically improved the quality of ontology extraction, especially for identifying subtle interdisciplinary connections that surface-level analysis would miss.
  • Google Search Grounding changes the game — Rather than relying solely on the model's parametric knowledge, grounding with live search results produces verifiable, citation-backed analysis. The groundingMetadata.groundingChunks API provides structured source attribution for free.
  • Raw PCM is the future of low-latency TTS — While standard audio formats (MP3, WAV) require container parsing, raw PCM from Gemini TTS can be decoded and played with minimal overhead using the Web Audio API, enabling near-instant playback.
  • Client-side AI apps are viable — With modern APIs like Gemini's JS SDK, it's possible to build sophisticated AI applications entirely in the browser. The trade-off is API key exposure, which must be mitigated through key restrictions or a thin proxy layer for production use.

What's next for KnowledgeBridge AI

  • Persistent Knowledge Graph — Store ontologies across sessions in a graph database (e.g., Neo4j) to build a cumulative, queryable knowledge network where $G_t = G_{t-1} \cup G_{\text{new}}$.
  • Multi-document Synthesis — Enable batch analysis of multiple papers to automatically discover cross-paper connections, generating a meta-ontology $O_{\text{meta}} = \bigcup_{i=1}^{n} O_i$ with emergent interdisciplinary links.
  • Citation Graph Integration — Connect with Semantic Scholar or OpenAlex APIs to enrich ontologies with citation networks, impact factors, and related work suggestions.
  • Fine-tuned Ontology Models — Train domain-specific adapters on curated ontology datasets to improve extraction accuracy for specialized fields (biomedical, legal, financial).
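The persistent-graph union $G_t = G_{t-1} \cup G_{\text{new}}$ could be sketched as a simple set merge; this is a hypothetical design, not implemented code, and the key encodings are assumptions:

```typescript
// Hypothetical merge for the persistent knowledge graph: nodes keyed by
// concept name, edges serialized as "from|type|to", duplicates collapse
// automatically under set union.
interface Graph {
  nodes: Set<string>;
  edges: Set<string>; // "from|type|to"
}

function mergeGraphs(prev: Graph, next: Graph): Graph {
  return {
    nodes: new Set([...prev.nodes, ...next.nodes]),
    edges: new Set([...prev.edges, ...next.edges]),
  };
}
```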

Built With

  • gemini-2.5-flash-tts
  • gemini-3-flash
  • gemini-3-pro-(thinking-mode)
  • google-ai-studio
  • google-cloud
  • google-gemini-api
  • google-search-grounding
  • notion-api
  • react-19
  • svg
  • tailwind-css
  • typescript
  • vite-6
  • web-audio-api
  • web-speech-api