What Inspired Us
India’s news ecosystem is inherently complex—characterized by high volume, linguistic diversity, and fragmented distribution channels. This environment makes it increasingly difficult to track coherent narratives, while also leaving the system vulnerable to coordinated misinformation. Critical grassroots developments are often overshadowed, and long-term accountability is weakened as public memory resets with each news cycle.
Narad was conceptualized to address this structural gap. It serves as a persistent intelligence layer over the news ecosystem—designed to preserve context, connect events across time and geography, and surface meaningful insights. The platform aims to function as an enduring memory system for Bharat, enabling early detection of emerging risks, identifying propagandist patterns, and strengthening transparency in public discourse.
How We Built It
Narad operates as a high-performance event intelligence engine that integrates deterministic mathematical systems with Generative AI, ensuring both scalability and interpretability.
The Mathematical Intelligence Layer
Incoming multilingual articles are processed through custom NLP pipelines and transformed into high-dimensional semantic embeddings. These embeddings enable a unified representation of content across languages, preserving intent rather than relying on translation.
To identify related events, we employ density-based clustering techniques combined with cosine similarity to measure semantic proximity:
$$ \text{Similarity}(A, B) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} $$
This allows Narad to autonomously group dispersed reports into coherent event clusters, even when they originate from different linguistic or regional sources.
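The grouping step above can be sketched in a few lines. This is a simplified, illustrative stand-in for the production pipeline: it uses plain cosine similarity and a greedy threshold-based grouping rather than a full density-based algorithm such as DBSCAN, and the threshold value is an assumption.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity(A, B) = (A . B) / (||A|| ||B||)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_by_similarity(embeddings: np.ndarray, threshold: float = 0.8) -> list[int]:
    """Greedy single-pass grouping: assign each article to the first
    cluster whose (unnormalised) centroid is within the cosine
    threshold, otherwise open a new cluster. A simplified stand-in
    for density-based clustering over cosine distances."""
    centroids: list[np.ndarray] = []
    labels: list[int] = []
    for vec in embeddings:
        for idx, c in enumerate(centroids):
            if cosine_similarity(vec, c) >= threshold:
                labels.append(idx)
                centroids[idx] = c + vec  # running sum; cosine ignores magnitude
                break
        else:
            labels.append(len(centroids))
            centroids.append(vec.copy())
    return labels
```

Because cosine similarity ignores vector magnitude, two articles in different languages that map to nearby directions in the shared embedding space land in the same cluster without any translation step.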
Causal Chain Detection
To understand how localized developments evolve into broader narratives, Narad models the news ecosystem as a dynamic graph. We perform multi-hop traversal using Breadth-First Search (BFS) to uncover relationships between events across time and geography.
To maintain precision and avoid spurious correlations, each candidate chain is scored by the strength of its weakest link, attenuated by an exponential decay in the chain length:
$$ S_{\text{chain}} = \min(L_{1}, L_{2}, \dots, L_{n}) \times e^{-\lambda n} $$
where $L_{i}$ is the strength of the $i$-th link and $\lambda$ controls how quickly confidence decays with the number of hops.
This ensures that only high-confidence causal pathways are retained, enabling reliable mapping of how local signals propagate into national-level impact.
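A minimal sketch of this traversal, combining BFS with the weakest-link score above. The graph representation, decay rate, and pruning threshold here are illustrative assumptions, not the production values.

```python
import math
from collections import deque

def chain_scores(graph: dict, source: str, lam: float = 0.3,
                 min_score: float = 0.1) -> dict:
    """Multi-hop BFS from `source` over a weighted event graph.
    `graph` maps event -> {neighbour: link strength in (0, 1]}.
    Each chain is scored S = min(link strengths) * e^(-lam * n),
    where n is the hop count; chains below `min_score` are pruned."""
    results: dict = {}  # event -> best chain score found so far
    queue = deque([(source, 1.0, 0)])  # (event, weakest link so far, hops)
    while queue:
        node, weakest, hops = queue.popleft()
        for nxt, strength in graph.get(node, {}).items():
            new_weakest = min(weakest, strength)
            score = new_weakest * math.exp(-lam * (hops + 1))
            if score < min_score:
                continue  # prune low-confidence chains early
            if score > results.get(nxt, 0.0):
                results[nxt] = score
                queue.append((nxt, new_weakest, hops + 1))
    return results
```

Because the score can only shrink as hops accumulate, the pruning threshold also guarantees termination even on cyclic graphs.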
Generative AI Layer
Once the dataset is structured and filtered through deterministic processes, Generative AI is applied as a final interpretive layer. Instead of relying on LLMs for discovery, Narad uses them to translate structured data into human-readable outputs such as:
- Fact Sheets
- Bias and narrative analysis
- Predictive state-level briefings
This separation of responsibilities ensures both computational efficiency and factual consistency.
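The boundary can be made concrete with a prompt-assembly step: the deterministic layer emits structured event data, and the model is asked only to render it, never to discover facts. The field names and prompt wording below are illustrative assumptions; the actual model call (e.g. via Amazon Bedrock) is omitted.

```python
import json

def build_factsheet_prompt(event_cluster: dict) -> str:
    """Render a deterministically assembled event cluster into an
    LLM prompt. The model is constrained to rephrase the structured
    facts it is given; field names here are hypothetical."""
    payload = json.dumps(event_cluster, ensure_ascii=False, indent=2)
    return (
        "You are a news analyst. Using ONLY the structured data below, "
        "write a concise fact sheet. Do not add facts that are not "
        "present in the data.\n\n"
        f"{payload}"
    )
```

Keeping discovery out of the prompt means the LLM's output can be audited line by line against the structured input.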
Challenges We Faced
High-Dimensional Memory Constraints
As the dataset scaled to tens of thousands of articles, performing real-time nearest-neighbour searches over dense vector representations introduced significant memory bottlenecks. This led to instability during concurrent query execution.
Addressing this required deep optimization of the vector indexing and retrieval pipeline, ensuring that high-dimensional searches could be executed efficiently without compromising system reliability.
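One general idea behind such an optimization is to bound peak memory by scanning the corpus in fixed-size chunks, keeping only the current top-k survivors. The sketch below is a NumPy illustration of that principle, not the production index (the stack lists FAISS and pgvector for that); chunk size and k are arbitrary.

```python
import numpy as np

def chunked_top_k(query: np.ndarray, index: np.ndarray,
                  k: int = 5, chunk: int = 1024) -> list[int]:
    """Exact top-k inner-product search over a matrix of L2-normalised
    embeddings, scanned in fixed-size chunks so peak working memory is
    O(chunk + k) rather than O(corpus). A sketch of the idea; a real
    deployment would use an ANN index such as FAISS or pgvector."""
    q = query / np.linalg.norm(query)
    best_ids = np.empty(0, dtype=np.int64)
    best_scores = np.empty(0, dtype=np.float64)
    for start in range(0, len(index), chunk):
        block = index[start:start + chunk]
        scores = block @ q  # cosine similarity, rows pre-normalised
        ids = np.arange(start, start + len(block))
        best_scores = np.concatenate([best_scores, scores])
        best_ids = np.concatenate([best_ids, ids])
        keep = np.argsort(best_scores)[::-1][:k]  # retain only k survivors
        best_scores, best_ids = best_scores[keep], best_ids[keep]
    return best_ids.tolist()
```

Because only `chunk + k` scores are ever materialised at once, concurrent queries no longer compete for a corpus-sized scratch buffer.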
Generative AI Cost-to-Performance Trade-offs
Early experimentation with LLM-driven pattern discovery revealed critical limitations in both latency and cost. Passing large-scale datasets directly into generative models proved impractical for real-time systems.
We resolved this by enforcing a strict architectural boundary: deterministic systems handle search, filtering, and relationship mapping, while LLMs are limited to summarization and interpretation. This significantly improved performance while maintaining scalability.
Temporal Modeling and Long-Term Context
Initial temporal scoring mechanisms heavily prioritized recency, using decay functions such as:
$$ e^{-\Delta t / 72} $$
While effective for highlighting recent events, this approach failed to support long-term narrative tracking and accountability analysis.
To overcome this, we redesigned the temporal data model to support persistent cross-year connections, enabling the system to track narrative evolution over extended periods.
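The contrast between the two temporal models can be sketched numerically. Assuming the decay constant of 72 is in hours (an assumption on my part), a year-old event scores effectively zero under pure recency weighting; the "persistent" variant below is an illustrative way to keep strongly linked events visible, not the writeup's exact redesign.

```python
import math

def recency_score(delta_t_hours: float) -> float:
    # Original recency-only weighting: e^(-dt / 72)
    return math.exp(-delta_t_hours / 72.0)

def persistent_score(delta_t_hours: float, link_weight: float,
                     floor: float = 0.2) -> float:
    """Illustrative revision: recency decay is blended with a floor
    proportional to the event's link strength, so strongly connected
    narratives stay trackable across years instead of vanishing."""
    return max(recency_score(delta_t_hours), floor * link_weight)
```

Under the original scheme a year-old event (about 8,760 hours) decays to roughly e^-122, which is indistinguishable from zero for ranking purposes, while the floored variant keeps it retrievable.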
What We Learned
Mathematical Systems Provide Scalable Foundations
Deterministic approaches to clustering and relationship modeling proved significantly more scalable and reliable than relying on generative models for structural understanding. Mathematical rigor ensures consistency, reproducibility, and efficiency at scale.
Data Architecture Is Central to Intelligence Systems
The transition from prototype to production highlighted the importance of storage and retrieval design. Handling large-scale vector data in real time requires carefully engineered database systems, as in-memory approaches do not scale reliably in production environments.
Multilingual Understanding Requires Semantic Alignment
Rather than relying on translation pipelines, building shared semantic vector spaces across languages enabled Narad to capture intent directly. This approach proved essential for accurately representing the diversity of Bharat’s information landscape while minimizing loss of meaning.
Built With
- amazon-bedrock
- amplify
- aws (app-runner, rds, s3)
- faiss
- fastapi
- next.js
- postgresql (pgvector)
- python
- spacy
- sql
- tailwind-css
- typescript