Inspiration
Research today is increasingly interdisciplinary. A neuroscientist reading a paper on graph theory might spark a breakthrough in brain connectivity modeling, yet these cross-domain insights are easily lost in scattered notes. We observed that the core challenge lies in the gap between collecting knowledge and connecting it.
Traditional note-taking tools treat each entry as an isolated document. But knowledge is not flat — it forms a graph $G = (V, E)$, where concepts $V$ are linked by typed relationships $E$. We asked: What if AI could automatically extract this hidden structure and, more importantly, propose novel hypotheses at the intersection of disciplines?
This question became KnowledgeBridge AI — a tool that transforms fragmented research notes into a living ontology map $O = (C, R, \mathcal{H}, \mathcal{A})$, where $C$ represents core concepts, $R$ the relationships between them, $\mathcal{H}$ the disciplinary hierarchy, and $\mathcal{A}$ the generated axioms (hypotheses).
What it does
KnowledgeBridge AI is an interdisciplinary research navigator that bridges academia and business by extracting structured ontologies from unstructured text.
Core Pipeline:
- Input — Users provide research notes via text, file upload (PDF, images, markdown), or real-time voice input using the Web Speech API.
- Document Understanding — Non-text documents (handwritten notes, diagrams, formulas) are analyzed via Gemini 3 Flash's multimodal capabilities and converted to text.
- Ontology Extraction — Gemini 3 Pro (with Thinking Mode, budget $= 32768$ tokens) parses the input and extracts:
- Main concepts with importance scoring $i \in \{\text{high}, \text{medium}, \text{low}\}$
- Disciplines with confidence $c \in [0, 1]$
- Typed relationships: $r \in \{\text{causes}, \text{enables}, \text{requires}, \text{contradicts}, \text{extends}, \text{applies\_to}\}$
- Interdisciplinary connections with novelty ratings
- Hypothesis Generation — When multiple disciplines are detected ($|\text{disciplines}| > 1$), the system automatically generates a novel research hypothesis bridging the fields.
- Google Search Grounding — Optionally enriches the analysis with up-to-date web sources via Gemini's native Google Search tool, attaching verified references.
- Interactive Ontology Graph — Results are visualized as an interactive SVG-based knowledge graph with:
- Drag-and-drop node repositioning
- Zoom and pan controls
- Concept search with neighbor highlighting
- Color-coded nodes: importance (red/amber/green) and disciplines (blue)
- TTS Readback — Summaries and hypotheses can be read aloud via Gemini 2.5 Flash TTS (24kHz PCM output), with Web Speech API fallback.
- Notion Export — Results are saved to Notion databases with configurable property mapping (tags, intent, disciplines, interdisciplinary flag), supporting presets for repeat workflows.
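The extraction schema and the hypothesis trigger described above can be sketched in TypeScript. Field and function names here are illustrative, not the exact system-instruction schema:

```typescript
type Importance = 'high' | 'medium' | 'low';
type RelationType =
  | 'causes' | 'enables' | 'requires'
  | 'contradicts' | 'extends' | 'applies_to';

// Shape of one extraction result (simplified sketch).
interface OntologyResult {
  concepts: { name: string; importance: Importance }[];
  disciplines: { name: string; confidence: number }[]; // confidence in [0, 1]
  relationships: { from: string; to: string; type: RelationType }[];
}

// Hypothesis generation fires only when |disciplines| > 1.
function shouldGenerateHypothesis(result: OntologyResult): boolean {
  return result.disciplines.length > 1;
}
```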
How we built it
Frontend Architecture:
- React 19 + TypeScript with Vite 6 for fast HMR and optimized builds
- Tailwind CSS for responsive, utility-first styling
- Pure SVG rendering for the ontology graph (no external charting libraries)
AI Backend (Serverless):
- Google Gemini API via @google/genai SDK:
- gemini-3-pro-preview — Deep ontology extraction with thinkingConfig: { thinkingBudget: 32768 }
- gemini-3-flash-preview — Document OCR, summarization, and search-grounded analysis with tools: [{ googleSearch: {} }]
- gemini-2.5-flash-preview-tts — Neural TTS with Kore voice, outputting raw PCM at 24kHz
- Structured JSON output enforced via responseMimeType: 'application/json' with a detailed system instruction schema
- Retry logic with linearly increasing backoff ($\text{delay} = 1000 \times \text{attempt}\ \text{ms}$)
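The retry wrapper can be sketched as follows (function name and the `baseDelayMs` parameter are assumptions; the delay grows with the attempt number per the formula above):

```typescript
// Retry an async operation, waiting baseDelayMs * attempt between tries.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(); // success: return immediately
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // delay = baseDelayMs * attempt
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  throw lastError; // all attempts exhausted
}
```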
Browser APIs:
- Web Speech API — Real-time speech-to-text with continuous mode and interim results
- Web Audio API — PCM decoding, AudioContext management, and volume analysis via AnalyserNode (FFT size $= 256$)
- MediaDevices API — Microphone access with echo cancellation and noise suppression
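The volume analysis step reduces to a small pure function once the `AnalyserNode` has filled a byte buffer via `getByteTimeDomainData` (samples centered at 128). A minimal sketch, with the function name assumed:

```typescript
// Compute an RMS volume level in [0, 1] from time-domain byte samples,
// where 128 is silence and 0/255 are the extremes.
function volumeLevel(samples: Uint8Array): number {
  let sumSquares = 0;
  for (const s of samples) {
    const v = (s - 128) / 128; // re-center to [-1, 1]
    sumSquares += v * v;
  }
  return Math.sqrt(sumSquares / samples.length);
}
```

In the browser this would be fed from `analyser.getByteTimeDomainData(buf)` with the analyser's FFT size set to 256, as in the setup above.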
Integration:
- Notion API (v2022-06-28) — Page creation with dynamic property mapping, accessed via CORS proxy for client-side calls
Challenges we ran into
- Gemini TTS PCM Decoding — Gemini 2.5 Flash TTS returns raw base64-encoded PCM (16-bit signed integer, mono, 24kHz) rather than a standard audio format. We had to manually decode the base64 payload, interpret the Int16Array, and normalize samples to $[-1, 1]$ float range via $x_{\text{float}} = \frac{x_{\text{int16}}}{32768}$ before creating an AudioBuffer.
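The normalization step can be sketched as a pure function (name assumed) over the base64-decoded bytes:

```typescript
// Decode raw little-endian 16-bit signed PCM into the [-1, 1] float
// samples that Web Audio expects: x_float = x_int16 / 32768.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return samples;
}
// In the browser, the result is then copied into an AudioBuffer:
//   const buffer = audioCtx.createBuffer(1, samples.length, 24000);
//   buffer.copyToChannel(samples, 0);
```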
- Structured JSON Reliability — Despite setting responseMimeType: 'application/json', the model occasionally wraps output in markdown code blocks. We implemented a multi-stage JSON extraction pipeline: direct parse $\rightarrow$ code block regex $\rightarrow$ brace matching, ensuring robust parsing across edge cases.
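The three stages can be sketched like this (function name assumed; the brace scan is simplified and does not skip braces inside string literals):

```typescript
// Multi-stage JSON recovery: direct parse -> fenced code block -> balanced braces.
function extractJson(raw: string): unknown {
  // Stage 1: direct parse.
  try {
    return JSON.parse(raw);
  } catch { /* fall through */ }
  // Stage 2: markdown code fence (```json ... ```).
  const fence = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  if (fence) {
    try {
      return JSON.parse(fence[1]);
    } catch { /* fall through */ }
  }
  // Stage 3: first balanced-brace span.
  const start = raw.indexOf('{');
  if (start >= 0) {
    let depth = 0;
    for (let i = start; i < raw.length; i++) {
      if (raw[i] === '{') depth++;
      else if (raw[i] === '}' && --depth === 0) {
        return JSON.parse(raw.slice(start, i + 1));
      }
    }
  }
  throw new Error('No parsable JSON found in model output');
}
```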
- CORS Restrictions for Notion API — Since this is a pure client-side application with no backend server, calling the Notion API directly from the browser is blocked by CORS. We solved this by routing requests through a CORS proxy, but this introduced latency and reliability concerns that required timeout management and error handling.
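The timeout management around proxied requests can be captured by a small helper (name and signature are assumptions, not the exact implementation):

```typescript
// Reject a pending promise if it does not settle within ms milliseconds,
// e.g. when the CORS proxy stalls.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Request timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
// Usage (URL illustrative): withTimeout(fetch(proxyUrl, { method: 'POST' }), 10_000)
```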
- Graph Layout Without D3 — We built the interactive ontology graph from scratch using pure SVG and React state, without relying on D3.js or similar libraries. Implementing smooth zoom/pan with correct coordinate transformations (screen $\rightarrow$ SVG $\rightarrow$ graph space via CTM.inverse()) was non-trivial.
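The coordinate math behind `CTM.inverse()` is the inversion of a 2D affine transform in SVG matrix form. A self-contained sketch (type and function names are ours, not the browser API):

```typescript
// SVG matrix [a c e; b d f; 0 0 1], the same shape as DOMMatrix/SVGMatrix.
interface Affine { a: number; b: number; c: number; d: number; e: number; f: number }

// Forward map: (x, y) -> (a*x + c*y + e, b*x + d*y + f).
function applyAffine(m: Affine, x: number, y: number): [number, number] {
  return [m.a * x + m.c * y + m.e, m.b * x + m.d * y + m.f];
}

// Inverse transform; this is what getScreenCTM().inverse() computes,
// used to map screen pixels back into SVG/graph coordinates.
function invertAffine(m: Affine): Affine {
  const det = m.a * m.d - m.b * m.c;
  return {
    a: m.d / det,
    b: -m.b / det,
    c: -m.c / det,
    d: m.a / det,
    e: (m.c * m.f - m.d * m.e) / det,
    f: (m.b * m.e - m.a * m.f) / det,
  };
}
```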
Accomplishments that we're proud of
- Zero-backend architecture — The entire application runs client-side with no server infrastructure, yet delivers a full AI-powered research workflow including ontology extraction, TTS, document analysis, and database export.
- Automatic hypothesis generation — The system doesn't just organize knowledge — it creates new knowledge by detecting interdisciplinary connections and proposing testable hypotheses. This transforms a passive note-taking tool into an active research partner.
- Search Grounding integration — By leveraging Gemini's native Google Search tool, analyses are enriched with real-world references, bridging the gap between a user's notes and the broader research landscape.
- Three Gemini models in harmony — We orchestrated Pro (deep thinking), Flash (speed + multimodal), and Flash TTS (audio) into a cohesive pipeline, selecting the right model for each task: $\text{Pro} \rightarrow \text{ontology}$, $\text{Flash} \rightarrow \text{OCR + search}$, $\text{Flash TTS} \rightarrow \text{audio}$.
- Interactive graph with pure SVG — No external graph library was needed. The ontology visualization supports drag, zoom, pan, and search — all built with React state and SVG primitives.
What we learned
- Thinking Mode is transformative for structured extraction — Giving the model a dedicated thinking budget dramatically improved the quality of ontology extraction, especially for identifying subtle interdisciplinary connections that surface-level analysis would miss.
- Google Search Grounding changes the game — Rather than relying solely on the model's parametric knowledge, grounding with live search results produces verifiable, citation-backed analysis. The groundingMetadata.groundingChunks API provides structured source attribution for free.
- Raw PCM is the future of low-latency TTS — While standard audio formats (MP3, WAV) require container parsing, raw PCM from Gemini TTS can be decoded and played with minimal overhead using the Web Audio API, enabling near-instant playback.
- Client-side AI apps are viable — With modern APIs like Gemini's JS SDK, it's possible to build sophisticated AI applications entirely in the browser. The trade-off is API key exposure, which must be mitigated through key restrictions or a thin proxy layer for production use.
What's next for KnowledgeBridge AI
- Persistent Knowledge Graph — Store ontologies across sessions in a graph database (e.g., Neo4j) to build a cumulative, queryable knowledge network where $G_t = G_{t-1} \cup G_{\text{new}}$.
- Multi-document Synthesis — Enable batch analysis of multiple papers to automatically discover cross-paper connections, generating a meta-ontology $O_{\text{meta}} = \bigcup_{i=1}^{n} O_i$ with emergent interdisciplinary links.
- Citation Graph Integration — Connect with Semantic Scholar or OpenAlex APIs to enrich ontologies with citation networks, impact factors, and related work suggestions.
- Fine-tuned Ontology Models — Train domain-specific adapters on curated ontology datasets to improve extraction accuracy for specialized fields (biomedical, legal, financial).
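The meta-ontology union $O_{\text{meta}} = \bigcup_{i=1}^{n} O_i$ could be sketched as a merge that deduplicates concepts and relations (types and names are hypothetical, since this feature is not yet built):

```typescript
// Simplified per-document ontology for the merge sketch.
interface SimpleOntology {
  concepts: string[];
  relations: Array<[string, string, string]>; // [from, type, to]
}

// Union of ontologies with structural deduplication.
function mergeOntologies(ontologies: SimpleOntology[]): SimpleOntology {
  const concepts = new Set<string>();
  const relKeys = new Set<string>();
  const relations: Array<[string, string, string]> = [];
  for (const o of ontologies) {
    for (const c of o.concepts) concepts.add(c);
    for (const r of o.relations) {
      const key = r.join('\u0000'); // cheap structural dedup key
      if (!relKeys.has(key)) {
        relKeys.add(key);
        relations.push(r);
      }
    }
  }
  return { concepts: [...concepts], relations };
}
```

Cross-paper links would then be the relations whose endpoints originate in different source ontologies.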
Built With
- gemini-2.5-flash-tts
- gemini-3-flash
- gemini-3-pro-(thinking-mode)
- google-ai-studio
- google-cloud
- google-gemini-api
- google-search-grounding
- notion-api
- react-19
- svg
- tailwind-css
- typescript
- vite-6
- web-audio-api
- web-speech-api