What Inspired Us
India’s news ecosystem is inherently complex—characterized by high volume, linguistic diversity, and fragmented distribution channels. This environment makes it increasingly difficult to track coherent narratives, while also leaving the system vulnerable to coordinated misinformation. Critical grassroots developments are often overshadowed, and long-term accountability is weakened as public memory resets with each news cycle.
Narad was conceptualized to address this structural gap. It serves as a persistent intelligence layer over the news ecosystem—designed to preserve context, connect events across time and geography, and surface meaningful insights. The platform aims to function as an enduring memory system for Bharat, enabling early detection of emerging risks, identifying propagandist patterns, and strengthening transparency in public discourse.
How We Built It
Narad operates as a high-performance event intelligence engine that integrates deterministic mathematical systems with Generative AI, ensuring both scalability and interpretability.
The Mathematical Intelligence Layer
Incoming multilingual articles are processed through custom NLP pipelines and transformed into high-dimensional semantic embeddings. These embeddings enable a unified representation of content across languages, preserving intent rather than relying on translation.
To identify related events, we employ density-based clustering techniques combined with cosine similarity to measure semantic proximity:
$$ \text{Similarity}(A, B) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} $$
This allows Narad to autonomously group dispersed reports into coherent event clusters, even when they originate from different linguistic or regional sources.
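The grouping step above can be sketched in a few lines. This is a simplified, illustrative stand-in for the production pipeline: it uses plain cosine similarity and a greedy threshold-based grouping rather than a full density-based algorithm such as DBSCAN, and the threshold value is an assumption.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity(A, B) = (A . B) / (||A|| ||B||)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_by_similarity(embeddings: np.ndarray, threshold: float = 0.8) -> list[int]:
    """Greedy single-pass grouping: assign each article to the first
    cluster whose (unnormalised) centroid is within the cosine
    threshold, otherwise open a new cluster. A simplified stand-in
    for density-based clustering over cosine distances."""
    centroids: list[np.ndarray] = []
    labels: list[int] = []
    for vec in embeddings:
        for idx, c in enumerate(centroids):
            if cosine_similarity(vec, c) >= threshold:
                labels.append(idx)
                centroids[idx] = c + vec  # running sum; cosine ignores magnitude
                break
        else:
            labels.append(len(centroids))
            centroids.append(vec.copy())
    return labels
```

Because cosine similarity ignores vector magnitude, two articles in different languages that map to nearby directions in the shared embedding space land in the same cluster without any translation step.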
Causal Chain Detection
To understand how localized developments evolve into broader narratives, Narad models the news ecosystem as a dynamic graph. We perform multi-hop traversal using Breadth-First Search (BFS) to uncover relationships between events across time and geography.
To maintain precision and avoid spurious correlations, each candidate chain is scored by the strength of its weakest link, attenuated by an exponential decay in the chain length:
$$ S_{\text{chain}} = \min(L_{1}, L_{2}, \dots, L_{n}) \times e^{-\lambda n} $$
where $L_{i}$ is the strength of the $i$-th link and $\lambda$ controls how quickly confidence decays with the number of hops.
This ensures that only high-confidence causal pathways are retained, enabling reliable mapping of how local signals propagate into national-level impact.
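A minimal sketch of this traversal, combining BFS with the weakest-link score above. The graph representation, decay rate, and pruning threshold here are illustrative assumptions, not the production values.

```python
import math
from collections import deque

def chain_scores(graph: dict, source: str, lam: float = 0.3,
                 min_score: float = 0.1) -> dict:
    """Multi-hop BFS from `source` over a weighted event graph.
    `graph` maps event -> {neighbour: link strength in (0, 1]}.
    Each chain is scored S = min(link strengths) * e^(-lam * n),
    where n is the hop count; chains below `min_score` are pruned."""
    results: dict = {}  # event -> best chain score found so far
    queue = deque([(source, 1.0, 0)])  # (event, weakest link so far, hops)
    while queue:
        node, weakest, hops = queue.popleft()
        for nxt, strength in graph.get(node, {}).items():
            new_weakest = min(weakest, strength)
            score = new_weakest * math.exp(-lam * (hops + 1))
            if score < min_score:
                continue  # prune low-confidence chains early
            if score > results.get(nxt, 0.0):
                results[nxt] = score
                queue.append((nxt, new_weakest, hops + 1))
    return results
```

Because the score can only shrink as hops accumulate, the pruning threshold also guarantees termination even on cyclic graphs.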
Generative AI Layer
Once the dataset is structured and filtered through deterministic processes, Generative AI is applied as a final interpretive layer. Instead of relying on LLMs for discovery, Narad uses them to translate structured data into human-readable outputs such as:
- Fact Sheets
- Bias and narrative analysis
- Predictive state-level briefings
This separation of responsibilities ensures both computational efficiency and factual consistency.
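The boundary can be made concrete with a prompt-assembly step: the deterministic layer emits structured event data, and the model is asked only to render it, never to discover facts. The field names and prompt wording below are illustrative assumptions; the actual model call (e.g. via Amazon Bedrock) is omitted.

```python
import json

def build_factsheet_prompt(event_cluster: dict) -> str:
    """Render a deterministically assembled event cluster into an
    LLM prompt. The model is constrained to rephrase the structured
    facts it is given; field names here are hypothetical."""
    payload = json.dumps(event_cluster, ensure_ascii=False, indent=2)
    return (
        "You are a news analyst. Using ONLY the structured data below, "
        "write a concise fact sheet. Do not add facts that are not "
        "present in the data.\n\n"
        f"{payload}"
    )
```

Keeping discovery out of the prompt means the LLM's output can be audited line by line against the structured input.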
Challenges We Faced
High-Dimensional Memory Constraints
As the dataset scaled to tens of thousands of articles, performing real-time nearest-neighbour searches over dense vector representations introduced significant memory bottlenecks. This led to instability during concurrent query execution.
Addressing this required deep optimization of the vector indexing and retrieval pipeline, ensuring that high-dimensional searches could be executed efficiently without compromising system reliability.
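One general idea behind such an optimization is to bound peak memory by scanning the corpus in fixed-size chunks, keeping only the current top-k survivors. The sketch below is a NumPy illustration of that principle, not the production index (the stack lists FAISS and pgvector for that); chunk size and k are arbitrary.

```python
import numpy as np

def chunked_top_k(query: np.ndarray, index: np.ndarray,
                  k: int = 5, chunk: int = 1024) -> list[int]:
    """Exact top-k inner-product search over a matrix of L2-normalised
    embeddings, scanned in fixed-size chunks so peak working memory is
    O(chunk + k) rather than O(corpus). A sketch of the idea; a real
    deployment would use an ANN index such as FAISS or pgvector."""
    q = query / np.linalg.norm(query)
    best_ids = np.empty(0, dtype=np.int64)
    best_scores = np.empty(0, dtype=np.float64)
    for start in range(0, len(index), chunk):
        block = index[start:start + chunk]
        scores = block @ q  # cosine similarity, rows pre-normalised
        ids = np.arange(start, start + len(block))
        best_scores = np.concatenate([best_scores, scores])
        best_ids = np.concatenate([best_ids, ids])
        keep = np.argsort(best_scores)[::-1][:k]  # retain only k survivors
        best_scores, best_ids = best_scores[keep], best_ids[keep]
    return best_ids.tolist()
```

Because only `chunk + k` scores are ever materialised at once, concurrent queries no longer compete for a corpus-sized scratch buffer.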
Generative AI Cost-to-Performance Trade-offs
Early experimentation with LLM-driven pattern discovery revealed critical limitations in both latency and cost. Passing large-scale datasets directly into generative models proved impractical for real-time systems.
We resolved this by enforcing a strict architectural boundary: deterministic systems handle search, filtering, and relationship mapping, while LLMs are limited to summarization and interpretation. This significantly improved performance while maintaining scalability.
Temporal Modeling and Long-Term Context
Initial temporal scoring mechanisms heavily prioritized recency, using decay functions such as:
$$ e^{-\Delta t / 72} $$
While effective for highlighting recent events, this approach failed to support long-term narrative tracking and accountability analysis.
To overcome this, we redesigned the temporal data model to support persistent cross-year connections, enabling the system to track narrative evolution over extended periods.
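The contrast between the two temporal models can be sketched numerically. Assuming the decay constant of 72 is in hours (an assumption on my part), a year-old event scores effectively zero under pure recency weighting; the "persistent" variant below is an illustrative way to keep strongly linked events visible, not the writeup's exact redesign.

```python
import math

def recency_score(delta_t_hours: float) -> float:
    # Original recency-only weighting: e^(-dt / 72)
    return math.exp(-delta_t_hours / 72.0)

def persistent_score(delta_t_hours: float, link_weight: float,
                     floor: float = 0.2) -> float:
    """Illustrative revision: recency decay is blended with a floor
    proportional to the event's link strength, so strongly connected
    narratives stay trackable across years instead of vanishing."""
    return max(recency_score(delta_t_hours), floor * link_weight)
```

Under the original scheme a year-old event (about 8,760 hours) decays to roughly e^-122, which is indistinguishable from zero for ranking purposes, while the floored variant keeps it retrievable.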
What We Learned
Mathematical Systems Provide Scalable Foundations
Deterministic approaches to clustering and relationship modeling proved significantly more scalable and reliable than relying on generative models for structural understanding. Mathematical rigor ensures consistency, reproducibility, and efficiency at scale.
Data Architecture Is Central to Intelligence Systems
The transition from prototype to production highlighted the importance of storage and retrieval design. Handling large-scale vector data in real time requires carefully engineered database systems, as in-memory approaches do not scale reliably in production environments.
Multilingual Understanding Requires Semantic Alignment
Rather than relying on translation pipelines, building shared semantic vector spaces across languages enabled Narad to capture intent directly. This approach proved essential for accurately representing the diversity of Bharat’s information landscape while minimizing loss of meaning.
Built With
- amazon-bedrock
- amplify
- aws (app-runner, rds, s3)
- faiss
- fastapi
- next.js
- postgresql (pgvector)
- python
- spacy
- sql
- tailwind-css
- typescript