Summr - AI-Powered Workflow Automation Platform

The Problem

In traditional enterprise environments, the journey from a business user identifying an automation need to a working solution is painfully slow and expensive:

  • Business User: "We need to monitor our Airflow DAGs and auto-restart failed ones."
  • Dev Team: "Sure, we'll add it to the sprint backlog. ETA: 2-3 weeks, $15K-30K in dev costs."
  • Business User: "But the DAG is failing right now..."

This cycle repeats across every operational task: patching servers, auditing GitHub repositories, monitoring databases, responding to incidents. Each automation requires:

  • Requirements gathering (days)
  • Development & testing (weeks)
  • Security review (days)
  • Deployment & training (days)
  • Total cost: $10K-50K per workflow
  • Total time: 2-6 weeks

The fundamental question: Why does it take weeks and tens of thousands of dollars for a developer to write what amounts to a Python script?


The Vision: Summr

Summr flips this model on its head. Business users describe what they want in plain English, and AI generates production-ready, secure, multi-step workflows in minutes—not weeks.

"GitHub repository health audit with email reports" → working workflow in 3 minutes.


How We Built It

Architecture: Multi-Agent AI Team

Instead of a single AI trying to do everything, we built a 4-agent engineering team:

  1. Systems Architect (Designer Agent)

    • Analyzes user request
    • Designs multi-step workflow with dependencies, branches, error handling
    • Assigns risk scores and safety guardrails
    • Outputs: Workflow architecture blueprint
  2. Junior Developer Agent

    • Implements each step as isolated Python code
    • Handles service integrations (AWS, GitHub, SSH, Airflow, etc.)
    • Manages credentials securely via environment variables
    • Outputs: Production Python code for each step
  3. QA Engineer Agent

    • Generates comprehensive test code
    • Validates inputs, outputs, error handling
    • Checks for security issues (credential leaks, unsafe operations)
    • Outputs: Test suite with assertions
  4. Principal Engineer (Senior Agent)

    • Reviews all code from Developer and QA
    • Provides feedback on quality, security, best practices
    • Approves or requests improvements
    • Outputs: Go/No-go decision with detailed feedback

Iterative Refinement

The agents work in iterations (max 3 automatic, unlimited manual):

User Request → Designer creates plan → 
  For each step:
    Developer writes code → QA writes tests → Senior reviews both →
    If issues found: Developer improves based on feedback →
  End loop →
Final approval → Deploy workflow
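The loop above can be sketched in a few lines of Python (a minimal sketch; helpers like `write_code` stand in for the actual LLM-backed agents, and the return shapes are illustrative assumptions):

```python
MAX_AUTO_ITERATIONS = 3

def refine_step(step, write_code, write_tests, review):
    """Run the Developer -> QA -> Senior loop for one step, capped at 3 automatic passes."""
    feedback = None
    for iteration in range(1, MAX_AUTO_ITERATIONS + 1):
        code = write_code(step, feedback)   # Developer agent (sees prior review feedback)
        tests = write_tests(step, code)     # QA agent
        verdict = review(code, tests)       # Senior agent: {"approved": bool, "feedback": str}
        if verdict["approved"]:
            return {"code": code, "tests": tests, "iterations": iteration}
        feedback = verdict["feedback"]      # fold review comments into the next pass
    # Cap reached: hand off to the user, who can trigger unlimited manual iterations
    return {"code": code, "tests": tests, "iterations": MAX_AUTO_ITERATIONS,
            "needs_manual": True}
```

The hard cap is what keeps a stubborn step from looping forever; anything the Senior agent still rejects after three passes is surfaced for manual iteration instead.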

Isolated Execution: Docker Sandboxes

Every workflow runs in an isolated Docker container with:

  • Fresh Python environment per execution
  • Automatic dependency installation (pip install)
  • Secure credential injection (no hardcoded secrets)
  • Complete isolation (no cross-contamination)
  • Full logs captured for debugging
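Conceptually, each execution boils down to a throwaway `docker run` (a sketch only — the image name, flags, and mount layout here are illustrative assumptions, not Summr's actual runner):

```python
import subprocess

def build_sandbox_command(script_path: str, secrets: dict) -> list:
    """Build the `docker run` invocation for one isolated execution."""
    cmd = ["docker", "run", "--rm",                      # container discarded after the run
           "-v", f"{script_path}:/work/step.py:ro"]      # step code mounted read-only
    for name, value in secrets.items():
        cmd += ["-e", f"{name}={value}"]                 # credentials injected as env vars,
                                                         # never hardcoded into the script
    cmd += ["python:3.12-slim", "python", "/work/step.py"]
    return cmd

def run_step(script_path: str, secrets: dict) -> subprocess.CompletedProcess:
    """Execute the step and capture full stdout/stderr for the debug logs."""
    return subprocess.run(build_sandbox_command(script_path, secrets),
                          capture_output=True, text=True)
```

Because every run starts from a clean image and the container is removed afterwards, no state (or secret) survives between executions.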

AI Provider Flexibility

We support OpenAI and AWS Bedrock:

  • OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
  • AWS Bedrock: Claude 4.5 Sonnet, Claude Haiku, Claude 3.x, Amazon Titan

Token Optimization: We reduced token usage by 61% (115K → 45K tokens per workflow) through:

  • Context summarization (Designer plan condensed for other agents)
  • Filtered feedback (only relevant step feedback in iterations)
  • Lazy-loaded policies (send names, not full rules)
  • Removed redundant service configs

What We Learned

1. Multi-Agent > Single Agent

Our first attempt used a single AI to generate entire workflows. Results: Inconsistent quality, poor error handling, security issues.

Solution: Specialized agents with clear responsibilities dramatically improved output quality. Just like real engineering teams, specialization works.

2. Context is Expensive

Sending the full Designer plan to every Developer call (5 steps × full plan = 5× duplication) wasted ~50K tokens per workflow.

Solution: Summarize the plan and send each Developer call only the current step's details plus its dependencies. Combined with the other optimizations, this cut ~70K tokens/workflow.
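A sketch of that context filtering (the plan shape and field names like `depends_on` are illustrative assumptions):

```python
def context_for_step(plan: dict, step_id: str) -> dict:
    """Build the Developer's context: the current step in full, plus one-line
    summaries of its dependencies — instead of the entire Designer plan."""
    steps = {s["id"]: s for s in plan["steps"]}
    step = steps[step_id]
    deps = [{"id": d, "summary": steps[d]["summary"]}
            for d in step.get("depends_on", [])]
    return {"goal": plan["goal"], "step": step, "dependencies": deps}
```

Steps the current one doesn't depend on never enter the prompt at all, which is where the bulk of the duplication was.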

3. Iteration Policy Matters

Early versions auto-iterated endlessly, burning through API credits.

Solution:

  • Automatic mode: Max 3 iterations
  • Manual mode: Unlimited iterations, user-triggered
  • QA auto-regeneration for critical issues (max 2 attempts)
  • Clear quality gates: Block finalization if critical issues remain
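The caps and the gate reduce to a couple of small predicates (a sketch; the issue-record shape is an assumption):

```python
MAX_QA_REGENS = 2

def should_regenerate(issues: list, attempts: int) -> bool:
    """Auto-regenerate QA output on critical issues, at most MAX_QA_REGENS times."""
    has_critical = any(i["severity"] == "critical" for i in issues)
    return has_critical and attempts < MAX_QA_REGENS

def can_finalize(issues: list) -> bool:
    """Quality gate: block finalization while any critical issue is unresolved."""
    return not any(i["severity"] == "critical" and not i.get("resolved")
                   for i in issues)
```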

4. AI-Assisted Debugging is Essential

When workflows fail, users need help understanding why.

Solution: Built an AI Debugger Agent that:

  • Analyzes workflow code, errors, logs, and outputs
  • Provides interactive chat-based debugging
  • Suggests fixes with code snippets
  • Maintains debug session history
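Much of such a debugger is careful context assembly before the LLM call (a sketch, with an assumed `execution` record shape):

```python
def build_debug_context(execution: dict) -> str:
    """Assemble what the Debugger agent sees: step code, the error, and a log tail."""
    return "\n\n".join([
        f"## Step code\n{execution['code']}",
        f"## Error\n{execution['error']}",
        f"## Logs (tail)\n{execution['logs'][-2000:]}",  # tail only, to keep tokens down
    ])
```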

5. Docker Isolation is Non-Negotiable

Running arbitrary Python code on the host system? Recipe for disaster.

Solution: Every execution gets a fresh Docker container. Security, isolation, and reproducibility.


Challenges We Faced

Challenge 1: "Failed to create isolated Python environment"

Problem: Docker container creation failing on production deployments.

Diagnosis: Production environment lacked Docker daemon access or proper permissions.

Solution: (In progress) Verify Docker availability, implement fallback mechanisms, improve error messages.


Challenge 2: Branch Condition Type Mismatches

Problem: Workflow orchestrator used [] as default for all variable types when evaluating branch conditions.

Error: TypeError: '<' not supported between instances of 'list' and 'int'

Example:

if retry_count < 3:  # retry_count defaulted to [] instead of 0

Solution: Implemented type-aware defaults:

function defaultForType(outputType: string): unknown {
  switch (outputType) {
    case 'number':  return 0;
    case 'string':  return '';
    case 'boolean': return false;
    case 'array':   return [];
    case 'object':  return {};
    default:        return null;
  }
}

Challenge 3: GitHub Token Naming Mismatch

Problem: AI-generated code used personalAccessToken, but system injected credentials as github_token.

Error: NameError: name 'personalAccessToken' is not defined

Solution: Updated AI agent prompts to use correct variable names matching system injection schema.
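A cheap pre-execution guard also catches this class of mismatch: scan generated code for credential-looking identifiers that aren't in the injection schema (a sketch; the allowed-name set is illustrative):

```python
import re

# Illustrative subset of the injection schema's variable names
ALLOWED_CREDENTIAL_NAMES = {"github_token", "aws_access_key_id", "slack_webhook_url"}

def undefined_credentials(code: str) -> set:
    """Return identifiers that look like credentials but aren't in the schema —
    flags mismatches like `personalAccessToken` before the step ever runs."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))
    suspicious = {t for t in tokens
                  if any(hint in t.lower() for hint in ("token", "secret", "key"))}
    return suspicious - ALLOWED_CREDENTIAL_NAMES
```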


Challenge 4: Database Import with Foreign Keys

Problem: Importing production data into dev database failed due to foreign key constraints.

Error: ERROR: update or delete on table violates foreign key constraint

Solution: Used TRUNCATE CASCADE to handle all foreign key dependencies atomically:

TRUNCATE TABLE debug_messages, debug_sessions, ... RESTART IDENTITY CASCADE;

Challenge 5: Token Costs at Scale

Problem: Each 5-step workflow consumed 115K tokens (~$2-3 per workflow generation).

Impact: 100 workflows/day = $200-300/day in API costs.

Solution: Aggressive optimization (summarization, filtering, deduplication) → 61% reduction.

New cost: ~$0.80-1.20 per workflow. $80-120/day for 100 workflows.


Key Technical Achievements

1. Service Integration Framework

Built a multi-service connector system supporting:

  • AWS (EC2, S3, RDS, Lambda, etc.)
  • Google Cloud Platform
  • Kubernetes (EKS, GKE)
  • GitHub
  • Apache Airflow (SSH, EKS, AWS MWAA deployments)
  • Slack
  • Generic SSH/API endpoints

Each service has secure credential management with encryption at rest.


2. Apache Airflow L1/L2 Auto-Remediation

Implemented Connector Interface Contract (CIC) for multi-environment Airflow support:

interface AirflowConnector {
  probe(): Promise<AirflowHealth>;
  diagnose(dagId: string): Promise<DiagnosisResult>;
  executeAction(action: RemediationAction): Promise<ActionResult>;
  verify(): Promise<VerificationResult>;
}

Supported connectors:

  • EKS Connector: Kubernetes-based Airflow
  • AWS MWAA Connector: Managed Airflow
  • SSH/VM Connector: Self-hosted Airflow

AI understands Airflow specifics and generates intelligent remediation scripts.


3. Production-Grade Security

  • Replit Auth (OIDC) with PostgreSQL-backed sessions
  • RBAC: Developer, Approver, Admin roles
  • Encrypted credentials (AES-256)
  • Audit logs for all operations
  • Approval workflows for high-risk operations
  • TTL-based operation expiration (auto-cleanup)
  • Secret redaction in API responses
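Secret redaction, for instance, can be a small recursive pass over every outgoing payload (a sketch, not the actual middleware):

```python
import re

SECRET_KEY_PATTERN = re.compile(r"(token|secret|password|api_key)", re.IGNORECASE)

def redact(payload):
    """Recursively mask credential-like fields before a record leaves the API."""
    if isinstance(payload, dict):
        return {k: ("***" if SECRET_KEY_PATTERN.search(k) else redact(v))
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [redact(v) for v in payload]
    return payload
```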

4. Real-Time Workflow Monitoring

Built comprehensive monitoring with:

  • Real-time execution logs (WebSocket streaming)
  • Step-by-step execution tracking
  • Agent output inspection (Designer, Developer, QA, Senior)
  • Execution history with filtering
  • Debug session management

Impact & Results

Time Reduction

  • Traditional approach: 2-6 weeks
  • Summr: 3-5 minutes
  • Reduction: 99.9%

Cost Reduction

  • Traditional approach: $10K-50K per workflow
  • Summr: $1-3 per workflow
  • Reduction: 99.99%

Business User Empowerment

  • No coding required
  • Natural language input
  • Instant preview of workflow
  • Self-service automation
  • Full audit trail and governance

Use Cases Delivered

1. GitHub Repository Health Audit

  • Scans all repos for best practices
  • Generates health scores (0-100)
  • Identifies stale branches, missing files
  • Sends HTML email reports

2. Airflow DAG Auto-Remediation

  • Triggers and monitors DAG executions
  • Auto-retries failed DAGs
  • Sends success/failure notifications
  • Full execution history

3. Linux Server Patching with Docker

  • SSH-based system updates
  • Docker container state management
  • Pre/post-patching validation
  • Automatic container restart
  • Email reports on success/failure

Future Vision

Summr is on a path to become the enterprise L1/L2 auto-remediation platform:

  1. Expanded Service Integrations: Datadog, PagerDuty, Jira, Terraform
  2. Incident Response Automation: Auto-triage, auto-remediate, auto-escalate
  3. Natural Language Playbooks: "When CPU > 80%, scale workers and notify team"
  4. Learning System: Improve workflows based on execution history
  5. Multi-Tenant SaaS: Org-level isolation, team collaboration

The Bottom Line

Summr proves that business users don't need to wait weeks or spend tens of thousands of dollars for automation. With AI-powered multi-agent workflow generation, what used to take a development team weeks now takes minutes.

The future of operations is conversational, self-service, and AI-native.


Tech Stack

  • Frontend: React, TypeScript, Vite, TailwindCSS, shadcn/ui
  • Backend: Node.js, Express.js, TypeScript
  • Database: PostgreSQL (Neon serverless)
  • ORM: Drizzle ORM
  • AI Providers: OpenAI (GPT-4o), AWS Bedrock (Claude 4.5 Sonnet)
  • Execution: Docker (isolated Python sandboxes)
  • Auth: Replit Auth (OIDC)
  • Email: AWS SES via SMTP

Built with ❤️ by the Summr team
