Summr - AI-Powered Workflow Automation Platform

The Problem

In traditional enterprise environments, the journey from a business user identifying an automation need to a working solution is painfully slow and expensive:

  • Business User: "We need to monitor our Airflow DAGs and auto-restart failed ones."
  • Dev Team: "Sure, we'll add it to the sprint backlog. ETA: 2-3 weeks, $15K-30K in dev costs."
  • Business User: "But the DAG is failing right now..."

This cycle repeats across every operational task: patching servers, auditing GitHub repositories, monitoring databases, responding to incidents. Each automation requires:

  • Requirements gathering (days)
  • Development & testing (weeks)
  • Security review (days)
  • Deployment & training (days)
  • Total cost: $10K-50K per workflow
  • Total time: 2-6 weeks

The fundamental question: Why does it take weeks and tens of thousands of dollars for a developer to write what amounts to a Python script?


The Vision: Summr

Summr flips this model on its head. Business users describe what they want in plain English, and AI generates production-ready, secure, multi-step workflows in minutes—not weeks.

"GitHub repository health audit with email reports" → working workflow in 3 minutes.


How We Built It

Architecture: Multi-Agent AI Team

Instead of a single AI trying to do everything, we built a 4-agent engineering team:

  1. Systems Architect (Designer Agent)

    • Analyzes user request
    • Designs multi-step workflow with dependencies, branches, error handling
    • Assigns risk scores and safety guardrails
    • Outputs: Workflow architecture blueprint
  2. Junior Developer Agent

    • Implements each step as isolated Python code
    • Handles service integrations (AWS, GitHub, SSH, Airflow, etc.)
    • Manages credentials securely via environment variables
    • Outputs: Production Python code for each step
  3. QA Engineer Agent

    • Generates comprehensive test code
    • Validates inputs, outputs, error handling
    • Checks for security issues (credential leaks, unsafe operations)
    • Outputs: Test suite with assertions
  4. Principal Engineer (Senior Agent)

    • Reviews all code from Developer and QA
    • Provides feedback on quality, security, best practices
    • Approves or requests improvements
    • Outputs: Go/No-go decision with detailed feedback

Iterative Refinement

The agents work in iterations (max 3 automatic, unlimited manual):

User Request → Designer creates plan → 
  For each step:
    Developer writes code → QA writes tests → Senior reviews both →
    If issues found: Developer improves based on feedback →
  End loop →
Final approval → Deploy workflow
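The loop above can be sketched in a few lines of Python (a minimal sketch; helpers like `write_code` stand in for the actual LLM-backed agents, and the return shapes are illustrative assumptions):

```python
MAX_AUTO_ITERATIONS = 3

def refine_step(step, write_code, write_tests, review):
    """Run the Developer -> QA -> Senior loop for one step, capped at 3 automatic passes."""
    feedback = None
    for iteration in range(1, MAX_AUTO_ITERATIONS + 1):
        code = write_code(step, feedback)   # Developer agent (sees prior review feedback)
        tests = write_tests(step, code)     # QA agent
        verdict = review(code, tests)       # Senior agent: {"approved": bool, "feedback": str}
        if verdict["approved"]:
            return {"code": code, "tests": tests, "iterations": iteration}
        feedback = verdict["feedback"]      # fold review comments into the next pass
    # Cap reached: hand off to the user, who can trigger unlimited manual iterations
    return {"code": code, "tests": tests, "iterations": MAX_AUTO_ITERATIONS,
            "needs_manual": True}
```

The hard cap is what keeps a stubborn step from looping forever; anything the Senior agent still rejects after three passes is surfaced for manual iteration instead.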

Isolated Execution: Docker Sandboxes

Every workflow runs in an isolated Docker container with:

  • Fresh Python environment per execution
  • Automatic dependency installation (pip install)
  • Secure credential injection (no hardcoded secrets)
  • Complete isolation (no cross-contamination)
  • Full logs captured for debugging
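Conceptually, each execution boils down to a throwaway `docker run` (a sketch only — the image name, flags, and mount layout here are illustrative assumptions, not Summr's actual runner):

```python
import subprocess

def build_sandbox_command(script_path: str, secrets: dict) -> list:
    """Build the `docker run` invocation for one isolated execution."""
    cmd = ["docker", "run", "--rm",                      # container discarded after the run
           "-v", f"{script_path}:/work/step.py:ro"]      # step code mounted read-only
    for name, value in secrets.items():
        cmd += ["-e", f"{name}={value}"]                 # credentials injected as env vars,
                                                         # never hardcoded into the script
    cmd += ["python:3.12-slim", "python", "/work/step.py"]
    return cmd

def run_step(script_path: str, secrets: dict) -> subprocess.CompletedProcess:
    """Execute the step and capture full stdout/stderr for the debug logs."""
    return subprocess.run(build_sandbox_command(script_path, secrets),
                          capture_output=True, text=True)
```

Because every run starts from a clean image and the container is removed afterwards, no state (or secret) survives between executions.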

AI Provider Flexibility

We support OpenAI and AWS Bedrock:

  • OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
  • AWS Bedrock: Claude 4.5 Sonnet, Claude Haiku, Claude 3.x, Amazon Titan

Token Optimization: We reduced token usage by 61% (115K → 45K tokens per workflow) through:

  • Context summarization (Designer plan condensed for other agents)
  • Filtered feedback (only relevant step feedback in iterations)
  • Lazy-loaded policies (send names, not full rules)
  • Removed redundant service configs

What We Learned

1. Multi-Agent > Single Agent

Our first attempt used a single AI to generate entire workflows. Results: Inconsistent quality, poor error handling, security issues.

Solution: Specialized agents with clear responsibilities dramatically improved output quality. Just like real engineering teams, specialization works.

2. Context is Expensive

Sending the full Designer plan to every Developer call (5 steps × full plan = 5× duplication) wasted ~50K tokens per workflow.

Solution: Summarize the plan and send each Developer call only the current step's details plus its dependencies. Combined with the other optimizations, this cut ~70K tokens/workflow.
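A sketch of that context filtering (the plan shape and field names like `depends_on` are illustrative assumptions):

```python
def context_for_step(plan: dict, step_id: str) -> dict:
    """Build the Developer's context: the current step in full, plus one-line
    summaries of its dependencies — instead of the entire Designer plan."""
    steps = {s["id"]: s for s in plan["steps"]}
    step = steps[step_id]
    deps = [{"id": d, "summary": steps[d]["summary"]}
            for d in step.get("depends_on", [])]
    return {"goal": plan["goal"], "step": step, "dependencies": deps}
```

Steps the current one doesn't depend on never enter the prompt at all, which is where the bulk of the duplication was.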

3. Iteration Policy Matters

Early versions auto-iterated endlessly, burning through API credits.

Solution:

  • Automatic mode: Max 3 iterations
  • Manual mode: Unlimited iterations, user-triggered
  • QA auto-regeneration for critical issues (max 2 attempts)
  • Clear quality gates: Block finalization if critical issues remain
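The caps and the gate reduce to a couple of small predicates (a sketch; the issue-record shape is an assumption):

```python
MAX_QA_REGENS = 2

def should_regenerate(issues: list, attempts: int) -> bool:
    """Auto-regenerate QA output on critical issues, at most MAX_QA_REGENS times."""
    has_critical = any(i["severity"] == "critical" for i in issues)
    return has_critical and attempts < MAX_QA_REGENS

def can_finalize(issues: list) -> bool:
    """Quality gate: block finalization while any critical issue is unresolved."""
    return not any(i["severity"] == "critical" and not i.get("resolved")
                   for i in issues)
```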

4. AI-Assisted Debugging is Essential

When workflows fail, users need help understanding why.

Solution: Built an AI Debugger Agent that:

  • Analyzes workflow code, errors, logs, and outputs
  • Provides interactive chat-based debugging
  • Suggests fixes with code snippets
  • Maintains debug session history
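Much of such a debugger is careful context assembly before the LLM call (a sketch, with an assumed `execution` record shape):

```python
def build_debug_context(execution: dict) -> str:
    """Assemble what the Debugger agent sees: step code, the error, and a log tail."""
    return "\n\n".join([
        f"## Step code\n{execution['code']}",
        f"## Error\n{execution['error']}",
        f"## Logs (tail)\n{execution['logs'][-2000:]}",  # tail only, to keep tokens down
    ])
```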

5. Docker Isolation is Non-Negotiable

Running arbitrary Python code on the host system? Recipe for disaster.

Solution: Every execution gets a fresh Docker container. Security, isolation, and reproducibility.


Challenges We Faced

Challenge 1: "Failed to create isolated Python environment"

Problem: Docker container creation failing on production deployments.

Diagnosis: Production environment lacked Docker daemon access or proper permissions.

Solution: (In progress) Verify Docker availability, implement fallback mechanisms, improve error messages.


Challenge 2: Branch Condition Type Mismatches

Problem: Workflow orchestrator used [] as default for all variable types when evaluating branch conditions.

Error: TypeError: '<' not supported between instances of 'list' and 'int'

Example:

if retry_count < 3:  # retry_count defaulted to [] instead of 0

Solution: Implemented type-aware defaults:

function defaultForType(outputType: string): unknown {
  switch (outputType) {
    case 'number':  return 0;
    case 'string':  return '';
    case 'boolean': return false;
    case 'array':   return [];
    case 'object':  return {};
    default:        return null;
  }
}

Challenge 3: GitHub Token Naming Mismatch

Problem: AI-generated code used personalAccessToken, but system injected credentials as github_token.

Error: NameError: name 'personalAccessToken' is not defined

Solution: Updated AI agent prompts to use correct variable names matching system injection schema.
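A cheap pre-execution guard also catches this class of mismatch: scan generated code for credential-looking identifiers that aren't in the injection schema (a sketch; the allowed-name set is illustrative):

```python
import re

# Illustrative subset of the injection schema's variable names
ALLOWED_CREDENTIAL_NAMES = {"github_token", "aws_access_key_id", "slack_webhook_url"}

def undefined_credentials(code: str) -> set:
    """Return identifiers that look like credentials but aren't in the schema —
    flags mismatches like `personalAccessToken` before the step ever runs."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))
    suspicious = {t for t in tokens
                  if any(hint in t.lower() for hint in ("token", "secret", "key"))}
    return suspicious - ALLOWED_CREDENTIAL_NAMES
```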


Challenge 4: Database Import with Foreign Keys

Problem: Importing production data into dev database failed due to foreign key constraints.

Error: ERROR: update or delete on table violates foreign key constraint

Solution: Used TRUNCATE CASCADE to handle all foreign key dependencies atomically:

TRUNCATE TABLE debug_messages, debug_sessions, ... RESTART IDENTITY CASCADE;

Challenge 5: Token Costs at Scale

Problem: Each 5-step workflow consumed 115K tokens (~$2-3 per workflow generation).

Impact: 100 workflows/day = $200-300/day in API costs.

Solution: Aggressive optimization (summarization, filtering, deduplication) → 61% reduction.

New cost: ~$0.80-1.20 per workflow. $80-120/day for 100 workflows.


Key Technical Achievements

1. Service Integration Framework

Built a multi-service connector system supporting:

  • AWS (EC2, S3, RDS, Lambda, etc.)
  • Google Cloud Platform
  • Kubernetes (EKS, GKE)
  • GitHub
  • Apache Airflow (SSH, EKS, AWS MWAA deployments)
  • Slack
  • Generic SSH/API endpoints

Each service has secure credential management with encryption at rest.


2. Apache Airflow L1/L2 Auto-Remediation

Implemented Connector Interface Contract (CIC) for multi-environment Airflow support:

interface AirflowConnector {
  probe(): Promise<AirflowHealth>;
  diagnose(dagId: string): Promise<DiagnosisResult>;
  executeAction(action: RemediationAction): Promise<ActionResult>;
  verify(): Promise<VerificationResult>;
}

Supported connectors:

  • EKS Connector: Kubernetes-based Airflow
  • AWS MWAA Connector: Managed Airflow
  • SSH/VM Connector: Self-hosted Airflow

AI understands Airflow specifics and generates intelligent remediation scripts.


3. Production-Grade Security

  • Replit Auth (OIDC) with PostgreSQL-backed sessions
  • RBAC: Developer, Approver, Admin roles
  • Encrypted credentials (AES-256)
  • Audit logs for all operations
  • Approval workflows for high-risk operations
  • TTL-based operation expiration (auto-cleanup)
  • Secret redaction in API responses
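Secret redaction, for instance, can be a small recursive pass over every outgoing payload (a sketch, not the actual middleware):

```python
import re

SECRET_KEY_PATTERN = re.compile(r"(token|secret|password|api_key)", re.IGNORECASE)

def redact(payload):
    """Recursively mask credential-like fields before a record leaves the API."""
    if isinstance(payload, dict):
        return {k: ("***" if SECRET_KEY_PATTERN.search(k) else redact(v))
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [redact(v) for v in payload]
    return payload
```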

4. Real-Time Workflow Monitoring

Built comprehensive monitoring with:

  • Real-time execution logs (WebSocket streaming)
  • Step-by-step execution tracking
  • Agent output inspection (Designer, Developer, QA, Senior)
  • Execution history with filtering
  • Debug session management

Impact & Results

Time Reduction

  • Traditional approach: 2-6 weeks
  • Summr: 3-5 minutes
  • Reduction: 99.9%

Cost Reduction

  • Traditional approach: $10K-50K per workflow
  • Summr: $1-3 per workflow
  • Reduction: 99.99%

Business User Empowerment

  • No coding required
  • Natural language input
  • Instant preview of workflow
  • Self-service automation
  • Full audit trail and governance

Use Cases Delivered

1. GitHub Repository Health Audit

  • Scans all repos for best practices
  • Generates health scores (0-100)
  • Identifies stale branches, missing files
  • Sends HTML email reports

2. Airflow DAG Auto-Remediation

  • Triggers and monitors DAG executions
  • Auto-retries failed DAGs
  • Sends success/failure notifications
  • Full execution history

3. Linux Server Patching with Docker

  • SSH-based system updates
  • Docker container state management
  • Pre/post-patching validation
  • Automatic container restart
  • Email reports on success/failure

Future Vision

Summr is on a path to become the enterprise L1/L2 auto-remediation platform:

  1. Expanded Service Integrations: Datadog, PagerDuty, Jira, Terraform
  2. Incident Response Automation: Auto-triage, auto-remediate, auto-escalate
  3. Natural Language Playbooks: "When CPU > 80%, scale workers and notify team"
  4. Learning System: Improve workflows based on execution history
  5. Multi-Tenant SaaS: Org-level isolation, team collaboration

The Bottom Line

Summr proves that business users don't need to wait weeks or spend tens of thousands of dollars for automation. With AI-powered multi-agent workflow generation, what used to take a development team weeks now takes minutes.

The future of operations is conversational, self-service, and AI-native.


Tech Stack

  • Frontend: React, TypeScript, Vite, TailwindCSS, shadcn/ui
  • Backend: Node.js, Express.js, TypeScript
  • Database: PostgreSQL (Neon serverless)
  • ORM: Drizzle ORM
  • AI Providers: OpenAI (GPT-4o), AWS Bedrock (Claude 4.5 Sonnet)
  • Execution: Docker (isolated Python sandboxes)
  • Auth: Replit Auth (OIDC)
  • Email: AWS SES via SMTP

Built with ❤️ by the Summr team
