InvestiGator - Autonomous AI Intelligence Platform

🎯 Inspiration

The inspiration for InvestiGator came from observing how investigators, journalists, and researchers spend countless hours manually connecting dots across disparate data sources. Traditional investigation tools are static - they show you what you ask for, but they don't think alongside you. We envisioned an AI agent that could autonomously explore information networks, discover hidden relationships, and surface insights you didn't even know to look for.

What if AI could be your tireless research partner, working 24/7 to map complex networks of entities and relationships?

That question drove us to build InvestiGator - a platform where Gemini AI doesn't just answer questions, it conducts entire investigations autonomously.


🏗️ What We Built

InvestiGator is a fully autonomous AI investigation platform powered by Google's Gemini API. Unlike traditional search or analysis tools, InvestiGator independently researches topics, discovers entities, maps relationships, and generates comprehensive intelligence reports - all without further human input after the initial query (users can still pause or redirect a running investigation).

Core Features:

1. Autonomous Investigation Engine

  • User submits a research query (e.g., "Analyze recent cybersecurity breaches")
  • Gemini plans a multi-step research strategy with hypothesis and subtasks
  • AI executes each research step, discovering entities and relationships
  • System builds a knowledge graph in real-time
  • Investigation runs until completion (or user pauses/redirects)

2. Real-Time Knowledge Graph Visualization

  • Interactive board showing entities (people, companies, locations, events, documents)
  • Relationship edges with confidence scores
  • Auto-layout using NetworkX (spring, grid, circular, hierarchical algorithms)
  • Color-coded by entity type, sized by importance
  • Live updates via WebSockets as AI discovers new information

3. AI Reasoning Transparency (Thought Chain)

  • Every investigation generates a "thought chain" - the AI's step-by-step reasoning
  • Shows confidence evolution: initial hypothesis → observations → conclusions
  • Displays why the AI made certain decisions
  • Full transparency into the autonomous process
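A thought chain is essentially an ordered list of reasoning steps, each carrying the model's confidence at that point. The sketch below shows one way to record the hypothesis → observations → conclusions progression. The `Thought` and `ThoughtChain` classes, the stage names, and the example text are hypothetical illustrations, not the project's actual ThoughtChain model.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical sketch: not the project's ThoughtChain Django model
@dataclass
class Thought:
    stage: str          # illustrative stages: "hypothesis", "observation", "conclusion"
    text: str           # the model's explanation for this step
    confidence: float   # confidence (0.0-1.0) after this step


@dataclass
class ThoughtChain:
    thoughts: List[Thought] = field(default_factory=list)

    def add(self, stage: str, text: str, confidence: float) -> None:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.thoughts.append(Thought(stage, text, confidence))

    def confidence_trend(self) -> List[float]:
        """Confidence values in the order the thoughts were recorded."""
        return [t.confidence for t in self.thoughts]


chain = ThoughtChain()
chain.add("hypothesis", "Breaches may share a common vendor", 0.4)
chain.add("observation", "Two breached firms list the same vendor", 0.6)
chain.add("conclusion", "Vendor compromise is a plausible entry point", 0.8)
print(chain.confidence_trend())  # [0.4, 0.6, 0.8]
```

Storing confidence per step, rather than only a final score, is what lets the UI show how confidence evolved over the investigation.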

4. Intelligent Report Generation

  • Executive summaries
  • Detailed analysis reports
  • Entity profiles
  • All generated in Markdown with evidence citations

🧠 How We Built It

Architecture Overview

Backend Stack:

  • Django REST Framework - API server with 31+ endpoints
  • Celery - Background task queue for autonomous operations
  • Redis - Message broker and WebSocket channel layer
  • PostgreSQL - Main database with JSONB for flexible metadata
  • Django Channels - WebSocket real-time updates
  • NetworkX - Graph layout algorithms
  • Google Gemini API - AI research engine

Frontend Stack:

  • Next.js 14 (App Router) - React framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • React Flow - Graph visualization
  • shadcn/ui - UI components
  • React Query - Data fetching and caching

Technical Deep Dive

1. Autonomous Agent Architecture

The investigation flow is orchestrated by Celery tasks:

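from asgiref.sync import async_to_sync
from celery import shared_task

# Investigation, gemini_client and the helper functions are project-level imports


@shared_task
def run_investigation(investigation_id):
    investigation = Investigation.objects.get(id=investigation_id)

    # Phase 1: Planning (plan_investigation returns the parsed JSON as a dict)
    plan = gemini_client.plan_investigation(
        query=investigation.initial_query,
        focus_areas=["entities", "relationships"]
    )
    current_hypothesis = plan["hypothesis"]

    # Phase 2: Execution
    for subtask in plan["subtasks"]:
        result = gemini_client.execute_research_step(
            task_description=subtask["description"],
            context=build_investigation_context(investigation)
        )

        # Save discovered entities, relationships, evidence
        entities = process_research_results(investigation, result)

        # Generate thought for transparency
        thought = gemini_client.generate_thought(
            current_state=current_hypothesis,
            new_information=result
        )

        # Broadcast updates via WebSocket (the broadcast helpers are async,
        # so they are wrapped for use inside this synchronous Celery task)
        for entity in entities:
            async_to_sync(broadcast_entity_discovered)(investigation.id, entity)
        async_to_sync(broadcast_thought_update)(investigation.id, thought)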

2. Gemini API Integration

We use Gemini for 6 distinct AI operations:

  1. Investigation Planning - Generate research strategy and subtasks
  2. Research Execution - Find entities, relationships, evidence
  3. Entity Extraction - Identify people, companies, locations from text
  4. Relationship Analysis - Determine connections between entities
  5. Evidence Evaluation - Assess source credibility and relevance
  6. Thought Generation - Explain AI reasoning (transparency)

Each operation uses carefully crafted prompts with structured outputs:

from typing import Dict, List

def plan_investigation(self, query: str, focus_areas: List[str]) -> Dict:
    prompt = f"""
    You are an investigative AI. Plan a research strategy for:

    Query: {query}
    Focus Areas: {', '.join(focus_areas)}

    Generate:
    1. Hypothesis (what you expect to find)
    2. Research strategy (step-by-step approach)
    3. Subtasks (specific research actions)
    4. Expected entities (types to look for)

    Return as JSON: {{"hypothesis": "...", "strategy": [...], "subtasks": [{{"description": "..."}}], "expected_entities": [...]}}
    """

    response = self.model.generate_content(prompt)
    return self._parse_json_response(response.text)
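Even with explicit format instructions, the parsed dict can be missing fields or describe subtasks in a different shape. A small validation step before the plan reaches the execution loop catches this early. This is a sketch: `validate_plan` and `PlanValidationError` are hypothetical names, and it checks only the `hypothesis` and `subtasks` keys the prompt asks for.

```python
from typing import Any, Dict, List


class PlanValidationError(ValueError):
    """Hypothetical error type for plans the execution loop cannot run."""


def validate_plan(plan: Dict[str, Any]) -> Dict[str, Any]:
    """Check the fields the execution loop depends on and normalise subtasks."""
    hypothesis = plan.get("hypothesis")
    if not isinstance(hypothesis, str) or not hypothesis.strip():
        raise PlanValidationError("plan has no hypothesis")

    subtasks = plan.get("subtasks")
    if not isinstance(subtasks, list) or not subtasks:
        raise PlanValidationError("plan has no subtasks")

    # Accept either bare strings or {"description": ...} objects from the model
    normalised: List[Dict[str, str]] = []
    for item in subtasks:
        if isinstance(item, str):
            normalised.append({"description": item})
        elif isinstance(item, dict) and isinstance(item.get("description"), str):
            normalised.append({"description": item["description"]})
        else:
            raise PlanValidationError(f"unrecognised subtask: {item!r}")

    return {**plan, "subtasks": normalised}


plan = validate_plan({"hypothesis": "Vendor compromise", "subtasks": ["List breached firms"]})
print(plan["subtasks"])  # [{'description': 'List breached firms'}]
```

Accepting both bare strings and description objects tolerates minor drift in model output while still rejecting plans the loop cannot execute.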

3. Real-Time Updates with WebSockets

Django Channels enables live board updates:

# Backend: Broadcast entity discovery
from channels.layers import get_channel_layer

async def broadcast_entity_discovered(investigation_id, entity):
    channel_layer = get_channel_layer()
    await channel_layer.group_send(
        f"investigation_{investigation_id}",
        {
            # Channels routes this message to the consumer's entity_discovered() handler
            "type": "entity_discovered",
            "entity_id": str(entity.id),
            "name": entity.name,
            # A separate key: reusing "type" here would overwrite the routing type
            "entity_type": entity.entity_type,
            "confidence": entity.confidence
        }
    )

// Frontend: Receive updates
ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'entity_discovered') {
        // Add node to React Flow graph
        addNode(data.entity_id, data.name, data.entity_type);
    }
};

4. Graph Layout with NetworkX

To prevent nodes stacking at (0,0), we use NetworkX's layout algorithms:

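import math

import networkx as nx

def _calculate_layout(self, entities, relationships, layout_type='spring'):
    G = nx.Graph()

    # Add nodes and edges
    for entity in entities:
        G.add_node(str(entity.id))
    for rel in relationships:
        G.add_edge(str(rel.source_entity.id), str(rel.target_entity.id))

    # Calculate positions
    if layout_type == 'spring':
        # k sets the optimal node distance; scale stretches the result to canvas pixels
        pos = nx.spring_layout(G, k=2, iterations=50, scale=1000)
    elif layout_type == 'circular':
        pos = nx.circular_layout(G, scale=1000)
    elif layout_type == 'grid':
        # Grid layout for large graphs: roughly square rows (200 px spacing is illustrative)
        cols = max(1, math.ceil(math.sqrt(G.number_of_nodes())))
        pos = {node: ((i % cols) * 200, (i // cols) * 200)
               for i, node in enumerate(G.nodes)}
    else:
        raise ValueError(f"Unsupported layout type: {layout_type}")

    # Cast NumPy scalars to plain floats for the API response
    return {entity_id: {'x': float(x), 'y': float(y)} for entity_id, (x, y) in pos.items()}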
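The metrics below list a type-based layout alongside the NetworkX ones. One plausible way to build it, assumed here rather than taken from the codebase, is to give each entity type its own column and stack that type's entities vertically. The sketch takes plain `(entity_id, entity_type)` pairs instead of model instances, and the 300/120-pixel spacing is an illustrative choice.

```python
from typing import Dict, Iterable, Tuple


def type_based_layout(
    entities: Iterable[Tuple[str, str]],
    column_width: int = 300,   # illustrative spacing, not a project setting
    row_height: int = 120,
) -> Dict[str, Dict[str, int]]:
    """Place each entity type in its own column, in first-seen order."""
    columns: Dict[str, int] = {}   # entity_type -> column index
    rows: Dict[str, int] = {}      # entity_type -> next free row in that column
    positions: Dict[str, Dict[str, int]] = {}

    for entity_id, entity_type in entities:
        col = columns.setdefault(entity_type, len(columns))
        row = rows.get(entity_type, 0)
        rows[entity_type] = row + 1
        positions[entity_id] = {"x": col * column_width, "y": row * row_height}

    return positions


layout = type_based_layout([("e1", "person"), ("e2", "company"), ("e3", "person")])
print(layout)
# {'e1': {'x': 0, 'y': 0}, 'e2': {'x': 300, 'y': 0}, 'e3': {'x': 0, 'y': 120}}
```

It returns the same `{'x': ..., 'y': ...}` shape as `_calculate_layout`, so the frontend could consume either without changes.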

5. Data Model Design

Our schema supports complex investigation workflows:

  • Investigation → Main record (status, progress, confidence)
  • InvestigationPlan → AI-generated research strategy
  • SubTask → Individual research steps
  • Entity → Discovered people, companies, locations, etc.
  • Relationship → Connections between entities (typed, weighted)
  • Evidence → Supporting documents (web pages, PDFs, images)
  • ThoughtChain → AI reasoning transparency
  • Report → Generated intelligence reports

All connected with proper foreign keys and indexes for fast queries.
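As a rough illustration of how typed, weighted relationships, confidence scores, and evidence links fit together, here are two of the core records as plain dataclasses. These are not the project's Django models; field names such as `weight`, `confidence`, and `evidence_ids` are assumptions.

```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID, uuid4


# Hypothetical stand-ins for the Entity and Relationship models
@dataclass
class Entity:
    name: str
    entity_type: str            # e.g. "person", "company", "location"
    confidence: float = 0.5
    id: UUID = field(default_factory=uuid4)


@dataclass
class Relationship:
    source: Entity
    target: Entity
    relationship_type: str      # typed edge, e.g. "employed_by"
    weight: float = 1.0         # edge weight for layout and analytics
    confidence: float = 0.5
    evidence_ids: List[UUID] = field(default_factory=list)  # links to Evidence records

    def __post_init__(self) -> None:
        if self.source.id == self.target.id:
            raise ValueError("a relationship needs two distinct entities")


alice = Entity("Alice Example", "person", confidence=0.9)
acme = Entity("Acme Corp", "company", confidence=0.8)
edge = Relationship(alice, acme, "employed_by", weight=2.0, confidence=0.7)
```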


What We Learned

1. Prompt Engineering at Scale

Crafting prompts that return consistent, structured data across hundreds of API calls was harder than expected. We learned:

  • Explicit output format instructions - Always specify "Return as JSON with fields..."
  • Few-shot examples - Including examples in prompts improved accuracy by ~40%
  • Iterative refinement - Start broad, narrow based on actual results
  • Fallback strategies - Always have a plan B for when the AI returns output in an unexpected format
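The first two lessons combine naturally: an explicit output contract followed by a few worked examples. The sketch below is illustrative only; `build_extraction_prompt`, the example sentence, and the `entities` field layout are assumptions, not the project's actual prompts.

```python
import json
from typing import Dict, List, Tuple


def build_extraction_prompt(text: str, examples: List[Tuple[str, Dict]]) -> str:
    """Hypothetical few-shot entity-extraction prompt with an explicit JSON contract."""
    parts = [
        "Extract entities from the text.",
        'Return ONLY JSON with fields: {"entities": [{"name": str, "type": str}]}',
        "",
    ]
    # Each example shows the model a concrete input/output pair
    for sample_text, sample_output in examples:
        parts.append(f"Text: {sample_text}")
        parts.append(f"JSON: {json.dumps(sample_output)}")
        parts.append("")
    # The real input goes last, ending where the model should continue
    parts.append(f"Text: {text}")
    parts.append("JSON:")
    return "\n".join(parts)


examples = [
    ("Jane Doe joined Acme Corp in Berlin.",
     {"entities": [{"name": "Jane Doe", "type": "person"},
                   {"name": "Acme Corp", "type": "company"},
                   {"name": "Berlin", "type": "location"}]}),
]
prompt = build_extraction_prompt("Globex acquired Initech.", examples)
print(prompt.splitlines()[-1])  # JSON:
```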

2. Async Architecture Complexity

Coordinating Django, Celery, Redis, and WebSockets taught us:

  • Task orchestration - Breaking complex workflows into atomic tasks
  • Error propagation - How to handle failures gracefully across async boundaries
  • State management - Keeping investigation state consistent across systems
  • Broadcasting patterns - Efficient WebSocket updates without overwhelming clients
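One way to avoid overwhelming clients is to coalesce events and send them in batches. The `EventBatcher` below is a hypothetical, framework-free sketch: in the real system `send` would wrap the Channels `group_send` call, and the size and wait thresholds are arbitrary.

```python
import time
from typing import Callable, Dict, List, Optional


class EventBatcher:
    """Hypothetical helper: buffer events and flush them as one message per batch."""

    def __init__(self, send: Callable[[List[Dict]], None],
                 max_size: int = 20, max_wait: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send
        self.max_size = max_size
        self.max_wait = max_wait
        self.clock = clock
        self.buffer: List[Dict] = []
        self.first_event_at: Optional[float] = None

    def add(self, event: Dict) -> None:
        if not self.buffer:
            self.first_event_at = self.clock()
        self.buffer.append(event)
        # Flush when the batch is full or the oldest event has waited long enough
        if (len(self.buffer) >= self.max_size
                or self.clock() - self.first_event_at >= self.max_wait):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
            self.first_event_at = None


sent: List[List[Dict]] = []
# A frozen clock makes the example deterministic: only the size limit triggers
batcher = EventBatcher(sent.append, max_size=3, clock=lambda: 0.0)
for i in range(7):
    batcher.add({"type": "entity_discovered", "entity_id": str(i)})
batcher.flush()
print([len(batch) for batch in sent])  # [3, 3, 1]
```

Because the time check only runs when a new event arrives, a final `flush()` (or a periodic timer) is still needed so the last partial batch is not held indefinitely.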

3. Graph Visualization Performance

Rendering large graphs (100+ nodes) in real time requires:

  • Smart layout algorithms - NetworkX's spring layout works great up to ~200 nodes
  • Progressive loading - Load nodes first, calculate layout in background
  • Virtual viewport - Rendering only the nodes and edges inside React Flow's visible viewport
  • Position caching - Save calculated positions to avoid re-computation
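Position caching can be keyed on the graph's structure, so an unchanged graph never triggers a second layout pass. The sketch below assumes that approach: it hashes the sorted node and edge lists and memoises any layout function in a dict. `LayoutCache`, `graph_signature`, and `dummy_layout` are hypothetical names; in production the store would more likely be Redis or a database column.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, Tuple

Positions = Dict[str, Dict[str, float]]


def graph_signature(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> str:
    """Stable hash of the graph structure; ignores insertion order."""
    payload = {
        "nodes": sorted(nodes),
        "edges": sorted(tuple(sorted(e)) for e in edges),  # treat edges as undirected
    }
    return hashlib.sha256(json.dumps(payload).encode()).hexdigest()


class LayoutCache:
    """Hypothetical memoiser around an expensive layout function."""

    def __init__(self, compute: Callable[[list, list], Positions]):
        self.compute = compute
        self.store: Dict[str, Positions] = {}
        self.misses = 0

    def get(self, nodes: list, edges: list) -> Positions:
        key = graph_signature(nodes, edges)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.compute(nodes, edges)
        return self.store[key]


def dummy_layout(nodes, edges):
    # Stand-in for the NetworkX layout: nodes spaced along the x axis
    return {n: {"x": i * 100.0, "y": 0.0} for i, n in enumerate(sorted(nodes))}


cache = LayoutCache(dummy_layout)
cache.get(["a", "b"], [("a", "b")])
cache.get(["b", "a"], [("b", "a")])  # same graph in a different order: cache hit
print(cache.misses)  # 1
```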

4. AI Reliability Considerations

Working with AI in production systems requires:

  • Confidence scoring - Every AI-generated fact has a confidence score
  • Evidence linking - All claims backed by source evidence
  • Human oversight - Pause/resume/redirect gives humans control
  • Thought transparency - Users can see AI's reasoning process

5. Real-Time UX Design

Building a real-time investigation dashboard taught us:

  • Optimistic updates - Update UI before server confirmation
  • Graceful degradation - WebSocket fails → fall back to polling
  • Progress indicators - Users need constant feedback during long operations
  • Interrupt mechanisms - Always allow users to pause/cancel

Challenges We Faced

Challenge 1: Celery Task Not Triggering

Problem: Investigations stayed "pending" forever. Celery worker was running, tasks were registered, but nothing happened.

Root Cause: The perform_create method in InvestigationViewSet had the Celery task call commented out as a TODO:

# TODO: Trigger Celery task to start investigation
# run_investigation.delay(investigation.id)

Solution: Uncommented the line. Sometimes the simplest bugs are the hardest to spot! 🤦

Lesson: Always verify integration points are actually connected, not just theoretically designed.
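A cheap guard against this class of bug is a unit test asserting that creating an investigation actually enqueues the task. The sketch below replaces the viewset with a hypothetical, simplified `perform_create` so it runs without Django, and uses `unittest.mock` to stand in for the Celery task so no broker is needed.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the real Celery task object
run_investigation = SimpleNamespace(delay=lambda investigation_id: None)


def perform_create(serializer):
    """Simplified stand-in for InvestigationViewSet.perform_create."""
    investigation = serializer.save()
    run_investigation.delay(investigation.id)
    return investigation


def test_perform_create_enqueues_investigation():
    serializer = mock.Mock()
    serializer.save.return_value = SimpleNamespace(id=42)
    with mock.patch.object(run_investigation, "delay") as delay:
        perform_create(serializer)
    # Fails if the .delay() call is missing (e.g. commented out as a TODO)
    delay.assert_called_once_with(42)


test_perform_create_enqueues_investigation()
print("perform_create enqueues the task")
```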


Challenge 2: Nodes Stacking at (0, 0)

Problem: All 64 entities appeared as a single dot. Edges were invisible because nodes had no distance between them.

Root Cause: Backend returned all positions as {"x": 0, "y": 0} because we hadn't implemented layout calculation.

Solution: Integrated NetworkX for automatic graph layout:

pos = nx.spring_layout(G, k=2, iterations=50, scale=1000, center=(500, 400))

Lesson: Graph visualization requires a real layout algorithm - placeholder (or even random) coordinates don't cut it.


Challenge 3: NetworkX Requires NumPy

Problem: After adding NetworkX, API crashed with:

ModuleNotFoundError: No module named 'numpy'

Root Cause: NetworkX's layout algorithms depend on NumPy for matrix operations, but NumPy is an optional dependency that installing NetworkX alone does not pull in, and we had only installed NetworkX.

Solution: Added both to requirements:

numpy==1.26.4
networkx==3.2.1

Lesson: Always check transitive and optional dependencies, especially for scientific libraries.
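A startup check that fails fast is cheaper than a crash mid-request. This minimal sketch uses only the standard library; the function names and the default module list are assumptions.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_layout_dependencies(names: Iterable[str] = ("networkx", "numpy")) -> None:
    """Hypothetical startup check: raise if any graph-layout dependency is absent."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError(
            "Graph layout needs these packages installed: " + ", ".join(missing)
        )


print(missing_modules(["json", "surely_not_a_real_module_xyz"]))
# ['surely_not_a_real_module_xyz']
```

In a Django project, a check like this could be called from an `AppConfig.ready()` method so the server refuses to start rather than failing on the first layout request.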


Challenge 4: Gemini API Returns Empty Results

Problem: Investigations completed in 1 second with 0 entities, 0 relationships, 0 evidence.

Root Cause: GEMINI_API_KEY was an empty string, so every API call failed silently and the system fell back to a default plan.

Solution:

  1. Added API key to .env
  2. Improved error handling to surface API failures
  3. Added validation on startup

Lesson: Never fail silently on critical integrations. Log errors loudly.
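The startup validation in step 3 can be as small as the function below. It is a sketch: the variable name `GEMINI_API_KEY` comes from the text, while `require_env` and `ConfigurationError` are hypothetical names. The key point is that an empty or whitespace-only value is treated the same as a missing variable, which is exactly the case that caused the silent failure.

```python
import os
from typing import Mapping, Optional


class ConfigurationError(RuntimeError):
    """Hypothetical error raised at startup when required configuration is missing."""


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-blank environment variable or fail loudly."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        # An empty string counts as missing: that is what let API calls fail silently
        raise ConfigurationError(f"{name} is not set or is empty")
    return value


print(require_env("GEMINI_API_KEY", {"GEMINI_API_KEY": "abc123"}))  # abc123
```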


Challenge 5: WebSocket Authentication

Problem: WebSocket connections failed with 403 Forbidden.

Root Cause: WebSockets can't send traditional Authorization: Bearer headers from browser JavaScript.

Solution: Created custom middleware to extract JWT from query parameter:

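from urllib.parse import parse_qs
from django.contrib.auth.models import AnonymousUser

class JWTAuthMiddleware:
    def __init__(self, inner):
        self.inner = inner  # wrapped ASGI app, e.g. the URLRouter
    async def __call__(self, scope, receive, send):
        token = parse_qs(scope["query_string"].decode()).get("token", [None])[0]
        scope["user"] = await get_user_from_token(token) if token else AnonymousUser()
        return await self.inner(scope, receive, send)  # always pass the connection on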

Lesson: WebSocket authentication requires different patterns than REST APIs.


Challenge 6: Race Conditions in Real-Time Updates

Problem: Frontend sometimes showed relationships before the source/target entities existed.

Root Cause: Async task execution + WebSocket broadcasting created race conditions.

Solution:

  1. Use Django transactions to ensure atomic saves
  2. Broadcast entity creation before relationships
  3. Frontend queues relationship updates until both nodes exist

Lesson: Distributed systems require careful ordering of events.
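Step 3 can be sketched as a small buffer that holds relationship events until both endpoints have arrived. The real frontend is TypeScript with React Flow; this Python version only illustrates the logic, and `GraphEventBuffer` and its field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class GraphEventBuffer:
    """Hypothetical sketch: apply entity events immediately, hold edges until both endpoints exist."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: List[Dict] = []
        # missing node id -> edges waiting for that node
        self.pending: Dict[str, List[Dict]] = defaultdict(list)

    def on_entity(self, entity_id: str) -> None:
        self.nodes.add(entity_id)
        # Retry every edge that was waiting on this node
        for edge in self.pending.pop(entity_id, []):
            self.on_relationship(edge)

    def on_relationship(self, edge: Dict) -> None:
        missing = [n for n in (edge["source"], edge["target"]) if n not in self.nodes]
        if missing:
            # Park the edge under its first missing endpoint; it is retried when that node arrives
            self.pending[missing[0]].append(edge)
        else:
            self.edges.append(edge)


buf = GraphEventBuffer()
buf.on_relationship({"source": "a", "target": "b"})  # arrives too early
buf.on_entity("a")   # still waiting on "b"
buf.on_entity("b")   # edge is released now
print(buf.edges)  # [{'source': 'a', 'target': 'b'}]
```

Parking each edge under a single missing endpoint means it is never queued twice, and it is re-checked exactly when the node it was waiting for appears.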


Challenge 7: Prompt Engineering for Consistency

Problem: Gemini sometimes returned unstructured text instead of JSON, breaking our parsers.

Root Cause: LLMs don't always follow instructions perfectly, especially with creative prompts.

Solution:

  1. Explicit format instructions in every prompt
  2. Regex to strip markdown code fences (```json)
  3. Try-catch with fallback parsing
  4. Temperature = 0 for structured outputs

Lesson: Always plan for AI outputs to be messier than expected.
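Steps 2 and 3 can be combined into one helper. This is a sketch that assumes the model returns at most one JSON object per response; `parse_json_response` here is a hypothetical stand-in, and its regex and brace-slicing fallback are illustrative choices rather than the project's `_parse_json_response`.

```python
import json
import re
from typing import Any, Optional

# Matches ```json ... ``` or plain ``` ... ``` fences and captures the body
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def parse_json_response(text: str) -> Optional[Any]:
    """Best-effort extraction of a JSON value from an LLM response."""
    candidates = []
    fenced = FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    candidates.append(text.strip())
    # Fallback: slice from the first "{" to the last "}" to skip surrounding prose
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None  # caller chooses the fallback and should log this loudly


print(parse_json_response('Sure! ```json\n{"hypothesis": "h", "subtasks": []}\n```'))
# {'hypothesis': 'h', 'subtasks': []}
```

Returning `None` instead of raising lets the caller pick the fallback, but as Challenge 4 showed, that fallback path should be logged rather than taken silently.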


What We're Proud Of

  1. True Autonomy - The AI genuinely researches independently, not just responding to queries
  2. Transparency - Thought chain shows every step of AI reasoning
  3. Real-Time Everything - Live updates make investigations feel dynamic
  4. Production-Ready Architecture - Proper async task queue, WebSockets, caching
  5. Beautiful UX - Dark mode, smooth animations, intuitive graph visualization
  6. Complete System - Backend + Frontend + Real-time + AI all working together

What's Next for InvestiGator

Immediate Roadmap:

  • Voice Integration - Use Gemini Live API for voice-guided investigations
  • Multi-Modal Evidence - Analyze images, videos, audio files
  • Collaborative Investigations - Multiple users working on same investigation
  • Advanced Analytics - Network analysis metrics (centrality, clustering)

Future Vision:

  • Investigation Templates - Pre-built strategies for common use cases
  • External Data Connectors - Direct integration with public records APIs
  • Machine Learning Insights - Pattern detection across investigations
  • Enterprise Features - Team workspaces, SSO, audit logs

Technical Metrics

  • Backend: 31+ REST API endpoints, 2 WebSocket consumers
  • Database: 9 models with 15+ relationships
  • AI Operations: 6 distinct Gemini API integrations
  • Real-Time: WebSocket broadcasts for 6 event types
  • Graph Algorithms: 5 layout options (spring, grid, circular, hierarchical, type-based)
  • Frontend: 8 pages, 40+ components
  • Code Quality: TypeScript for type safety, organized architecture

Acknowledgments

  • Google Gemini Team - For building an incredible AI API
  • Anthropic Claude - For development assistance and architecture review
  • NetworkX Community - For powerful graph algorithms
  • React Flow Team - For their amazing graph visualization library

Conclusion

InvestiGator demonstrates the power of autonomous AI agents working alongside humans. By combining Gemini's reasoning capabilities with real-time visualization and transparent thought processes, we've created a tool that doesn't just help with research - it actively conducts it.

This project pushed us to solve real distributed systems challenges: async task orchestration, WebSocket state management, graph layout algorithms, and reliable AI integration. We're proud of what we built and excited about where it can go.

The future of investigation isn't better search - it's autonomous AI partners.
