Ojo - Wikipedia for Everyone

Know how they got there, not just who they are.

Inspiration

Wikipedia is incredible, but it mainly documents people who are already famous.
If you want to know how Sundar Pichai or Elon Musk got to where they are, you’ll find only a summary, not the full story.
And if you search for someone less known - a local entrepreneur, a researcher, or even yourself - Wikipedia returns nothing.

That’s why we built Ojo, Wikipedia for everyone, where you can learn how people got where they are, not just who they are.
Ojo makes verifiable life journeys accessible to anyone. It lets curious people, recruiters, mentors, researchers, and community members search any name and discover the complete, evidence-backed path of a person’s growth - from education to projects to turning points.

Our goal is to convert scattered public records into a single, traceable, auditable timeline that teaches and inspires.

What It Does

Ojo accepts a person’s name (via text or voice) and returns a ranked disambiguation list.
Once the user confirms a candidate, Ojo aggregates verified public signals from sources like LinkedIn, GitHub, YouTube, MediaWiki, and Google Search.
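As a rough illustration of the name lookup behind the disambiguation list, the sketch below builds an Elasticsearch Query DSL body that combines fuzzy and prefix matching. It assumes candidates are indexed in Elasticsearch with a text field called `name`; that field name and the `buildNameQuery` helper are hypothetical and not taken from Ojo's code.

```typescript
// Sketch only: the "name" field and this helper are assumptions for illustration.
function buildNameQuery(name: string, size = 10) {
  // Collapse repeated whitespace so "  Sundar   Pichai " and "Sundar Pichai" query alike.
  const cleaned = name.trim().replace(/\s+/g, " ");
  return {
    size,
    query: {
      bool: {
        should: [
          // Tolerates small typos; "AUTO" scales the allowed edit distance with term length.
          { match: { name: { query: cleaned, fuzziness: "AUTO" } } },
          // Matches partial input such as "sundar pic" while the user is still typing.
          { match_phrase_prefix: { name: { query: cleaned } } },
        ],
        minimum_should_match: 1,
      },
    },
  };
}

// Example: the body that would be sent to a _search request.
const exampleQuery = buildNameQuery("Sundar Pichai");
console.log(JSON.stringify(exampleQuery, null, 2));
```

Elasticsearch returns hits sorted by relevance score by default, so the response order can serve directly as the ranked candidate list.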

Each person’s journey is visualized as an image-first interactive timeline, with:

  • Exact quoted source snippets (max 25 words)
  • Source URLs and provenance logs
  • ISO date-normalized events
  • Verified summaries and hero images (license-aware)

Every node represents a milestone - a degree, a job, a project, or an achievement - backed by real evidence.
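To make the node constraints above concrete, here is a minimal sketch of what one timeline event and its checks could look like. The `TimelineEvent` shape, its field names, and the acceptance of partial ISO dates (year or year-month) are assumptions for illustration; only the 25-word snippet limit, the source URL, and ISO date normalization come from the description above.

```typescript
// Hypothetical shape of one timeline node; field names are illustrative.
type EventKind = "education" | "job" | "project" | "achievement";

interface TimelineEvent {
  kind: EventKind;
  title: string;
  date: string;      // ISO 8601: YYYY, YYYY-MM, or YYYY-MM-DD (partial dates are an assumption)
  snippet: string;   // exact quote from the source, at most 25 words
  sourceUrl: string; // where the quote was found
}

const MAX_SNIPPET_WORDS = 25;
const ISO_DATE = /^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$/;

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns a list of problems; an empty list means the event passes these checks.
function validateEvent(event: TimelineEvent): string[] {
  const problems: string[] = [];
  if (!ISO_DATE.test(event.date)) problems.push("date is not ISO-normalized");
  const words = countWords(event.snippet);
  if (words === 0) problems.push("snippet is empty");
  if (words > MAX_SNIPPET_WORDS) problems.push("snippet exceeds 25 words");
  if (!/^https?:\/\//.test(event.sourceUrl)) problems.push("source URL is missing or not http(s)");
  return problems;
}
```

Rejecting an event at this stage, rather than at render time, keeps anything without a date, a quote, and a source out of the timeline.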

Data Flow and Integrations

Input: User enters or speaks a name → /api/disambiguate
Ingestion: Fetches pages from public sources (MediaWiki, GitHub, LinkedIn, YouTube, Google) in parallel → Extracts short quoted snippets → Stores snippets and embeddings in TiDB Serverless
Search: Elasticsearch + TiDB vector search for hybrid relevance queries
Extraction: Snippet + URL pairs → Gemini AI Extractor → Generates normalized timeline events
Summarization: Gemini AI Summarizer → Produces verified bio
UI: Hero image + alternates → Interactive timeline → Export (PDF, Notion, Slack, embed)
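The Search step combines keyword results from Elasticsearch with vector results from TiDB, but the writeup does not say how the two ranked lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as Ojo's actual method; `fuseRankings` and the constant `k = 60` are illustrative choices.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) to a document's score.
// Input: ranked lists of document IDs, best first. Output: one fused ranking.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

// Example: "b" ranks well in both lists, so it rises to the top.
const keywordHits = ["a", "b", "c"]; // e.g. from Elasticsearch
const vectorHits = ["b", "d", "a"];  // e.g. from TiDB vector search
console.log(fuseRankings([keywordHits, vectorHits])); // [ 'b', 'a', 'd', 'c' ]
```

RRF needs only rank positions, so it sidesteps the problem that keyword scores and vector distances are on different scales.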

Elasticsearch Integration Benefits

Elasticsearch powers search optimization and intelligent caching across Ojo’s infrastructure.

  • 15-day Expiry Caching: Stores previously searched profiles to avoid redundant API calls to LinkedIn, GitHub, or Google Search.
  • Millisecond Response Times: Cached queries return in milliseconds instead of waiting on external APIs, which noticeably improves the user experience.
  • API Cost Efficiency: Reduces dependency on external requests, minimizing API rate-limit issues and costs.
  • Smart Search: Enables fuzzy matching, partial name searches, and filtering across people, timeline events, and professional details.
  • Scalable Layer: Acts as a distributed cache for Ojo’s multi-source ingestion pipeline, keeping TiDB and Gemini AI focused on fresh profiles only.

This makes Ojo not just faster but smarter, cheaper, and more scalable.
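The 15-day expiry can be pictured as a freshness check at read time. The sketch below assumes each cached profile is stored with a `cachedAt` timestamp and that staleness is decided when the profile is read; the `CachedProfile` shape and both helper names are hypothetical.

```typescript
// 15 days expressed in milliseconds.
const CACHE_TTL_MS = 15 * 24 * 60 * 60 * 1000;

// Hypothetical cached document: the profile plus the time it was indexed.
interface CachedProfile<T> {
  profile: T;
  cachedAt: string; // ISO 8601 timestamp
}

// An entry is fresh if it is younger than the TTL. Unparseable timestamps
// produce NaN, which fails both comparisons and is treated as stale.
function isFresh<T>(entry: CachedProfile<T>, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - Date.parse(entry.cachedAt);
  return ageMs >= 0 && ageMs < CACHE_TTL_MS;
}

// Cache-first read: a hit returns the stored profile, a miss tells the caller
// to run the full ingestion pipeline and re-index the result.
function readThrough<T>(
  entry: CachedProfile<T> | undefined,
  now: Date = new Date(),
): { hit: true; profile: T } | { hit: false } {
  if (entry !== undefined && isFresh(entry, now)) {
    return { hit: true, profile: entry.profile };
  }
  return { hit: false };
}
```

Checking freshness on read means an expired profile is simply ignored and refetched; a separate cleanup job to delete old documents would be an additional design decision.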

Gemini AI Integration Benefits

Gemini AI acts as Ojo’s intelligent reasoning engine, transforming fragmented online data into cohesive human stories.

  • Contextual Synthesis: Analyzes posts, profiles, news, and records to build structured, narrative-rich person timelines.
  • Entity Resolution: Detects duplicates, resolves conflicting information, and merges overlapping profiles with confidence scoring.
  • Timeline Construction: Converts unstructured snippets into date-normalized events with precise source provenance.
  • Insight Extraction: Identifies achievements, collaborations, and growth milestones beyond simple job titles.
  • Hallucination-Free Design: Feeds the model only verified, quoted snippets, so every AI output is grounded in real evidence.

Gemini AI transforms Ojo from a data aggregator into a story builder - making every journey verifiable and human.
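One way to enforce the grounding described above is a post-extraction check that keeps an event only if its quote appears verbatim in a snippet from the same URL. This is a sketch under assumptions: the `SourceSnippet` and `ExtractedEvent` shapes and the helper names are hypothetical, and the only leniency applied is whitespace normalization.

```typescript
// Hypothetical input: a snippet that was fed to the model.
interface SourceSnippet {
  url: string;
  text: string;
}

// Hypothetical output: an event the model extracted, with the quote it cites.
interface ExtractedEvent {
  title: string;
  quote: string;
  sourceUrl: string;
}

// Collapse whitespace so line breaks in the source do not cause false rejections.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// An event is grounded if its quote is a substring of a snippet from the cited URL.
function isGrounded(event: ExtractedEvent, snippets: SourceSnippet[]): boolean {
  const quote = normalizeWhitespace(event.quote);
  if (quote === "") return false;
  return snippets.some(
    (s) => s.url === event.sourceUrl && normalizeWhitespace(s.text).indexOf(quote) !== -1,
  );
}

// Drop every event whose quote cannot be found in its cited source.
function keepGrounded(events: ExtractedEvent[], snippets: SourceSnippet[]): ExtractedEvent[] {
  return events.filter((event) => isGrounded(event, snippets));
}
```

A check like this catches paraphrased or invented quotes mechanically, without relying on the model to police itself.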

How We Built It

Tech Stack:
Frontend: Next.js (TypeScript), Tailwind CSS, Framer Motion
Backend: TiDB Serverless, Elasticsearch, Gemini AI
APIs: MediaWiki, Wikidata SPARQL, YouTube Data API, GitHub API, LinkedIn public data
AI Orchestration: Hosted LLM Extractor + Summarizer
Extras: Web Speech API for voice input, IndexedDB for local caching

Full Agentic Flow

  1. User inputs a name → disambiguation
  2. Ojo fetches verified sources → snippet extraction → embeddings
  3. Elasticsearch caches structured profiles
  4. Gemini AI extracts timeline events and synthesizes the journey
  5. TiDB stores verified data and provenance
  6. UI renders timeline, hero images, and export options

This pipeline connects multiple intelligent layers - data ingestion, semantic search, AI understanding, and transparent verification - into one cohesive experience.

Accomplishments

  • Built a production-grade multi-step AI agent connecting ingestion, indexing, search, and summarization.
  • Every displayed fact is source-backed and auditable.
  • Elasticsearch caching reduced profile load times from seconds to milliseconds.
  • Gemini AI enabled timeline extraction accuracy above 90%.
  • Fully functional image-first timeline UI with provenance logs, license metadata, and export features.

What We Learned

  • Feeding only exact quoted snippets to the AI was the most effective way we found to curb hallucination.
  • Hybrid Elasticsearch + TiDB search yields superior relevance and recall.
  • Transparent provenance builds trust faster than polished summaries.
  • Rate-limit-aware crawlers and 15-day caches maintain freshness and scalability.

What’s Next

  • Integrate Model Context Protocol (MCP) to enhance data context between extractor and summarizer.
  • Add queue-based serverless crawlers for incremental profile refreshes.
  • Enable community annotations and verified corrections.
  • Introduce Slack, Notion, and embeddable timeline widgets.

Built With

  • elasticsearch
  • framer-motion
  • gemini-ai
  • github-api
  • hosted-llm-apis
  • linkedin-public-data
  • mediawiki-api
  • next.js
  • tailwind-css
  • typescript
  • web-speech-api
  • wikidata-sparql
  • youtube-data-api