We realized that with the explosion of LLMs (Claude, GPT-4, Gemini) and the new Model Context Protocol (MCP), developers are paralyzed by choice. Which model is best for coding? Which is cheapest for summarization? We wanted to build a system that doesn't just guess, but knows. We were inspired to create a "meta-agent" that treats intelligence as a commodity, automatically routing tasks to the most efficient provider while seamlessly connecting to any external tool via MCP.
What it does

Friday is a self-learning AI orchestration platform. It acts as an intelligent layer between your tasks and the raw AI models.
Smart Routing: It uses reinforcement learning (Thompson Sampling) to select the best LLM and MCP tool configuration for a given task, based on past performance (see the sketch after this list).
Automated Evaluation: It runs tasks against "Golden Datasets" to benchmark quality, cost, and latency.
Universal Connectivity: It integrates with the Model Context Protocol to access data from 20+ services (such as GitHub, Google Drive, and Slack) without writing custom glue code.
Real-Time Analytics: A live dashboard visualizes the agent's "thought process", costs, and success rates as they happen.
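To make the routing idea concrete, here is a minimal sketch of Thompson Sampling over candidate LLM/MCP-tool configurations, where each "arm" keeps a Beta posterior over its success rate. The names (Arm, pick_arm, record_outcome) and the seeded priors are illustrative assumptions, not Friday's actual API.

```python
# Minimal Thompson Sampling sketch for task routing (illustrative, not Friday's code).
import random
from dataclasses import dataclass

@dataclass
class Arm:
    name: str
    successes: int = 1  # Beta prior alpha; seeded > 0 to soften the cold start
    failures: int = 1   # Beta prior beta

def pick_arm(arms: list[Arm]) -> Arm:
    # Sample a plausible success rate from each arm's Beta posterior
    # and route the task to the arm with the highest sample.
    return max(arms, key=lambda a: random.betavariate(a.successes, a.failures))

def record_outcome(arm: Arm, succeeded: bool) -> None:
    # Update the chosen arm's posterior with the observed evaluation result.
    if succeeded:
        arm.successes += 1
    else:
        arm.failures += 1

arms = [Arm("claude-sonnet + github-mcp"), Arm("gpt-4o-mini"), Arm("gemini-flash")]
choice = pick_arm(arms)
record_outcome(choice, succeeded=True)
```

Over many tasks, arms with poor outcomes get sampled less often, which is how the router drifts toward cheaper models for simple tasks and stronger models for hard ones.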
How we built it

Backend: We built a high-performance API using FastAPI and Python.
Database: We used Redis not just for caching, but as our primary data store for high-speed state management and execution-metric tracking.
Integration: We implemented a custom MCP client to dynamically discover and connect to MCP servers.
Frontend: A reactive React dashboard that polls for live updates, letting users watch evaluations happen in real time.
AI Logic: We architected a modular provider system that abstracts away the differences between the Anthropic, OpenAI, and Google APIs; a sketch of this, together with the async batch evaluation, follows this section.

Challenges we ran into

The "Cold Start" Problem: Teaching the agent to make good decisions before it had any data. We solved this by seeding it with simulated historical data.
MCP Integration: Because MCP is a very new standard, building a robust client that could handle diverse server capabilities was tricky.
Evaluation Consistency: Defining what "success" looks like for an AI task is hard. We had to build a multi-faceted scoring system that considers correctness, confidence, and resource usage.
Async Complexity: Orchestrating parallel batch evaluations across multiple models without blocking the API required careful async/await management in Python.

Accomplishments that we're proud of

Self-Optimization: Watching the system actually "learn" to pick cheaper models for simple tasks and more powerful models for complex ones was a huge win.
Golden Dataset Feature: Users can upload a JSON file of tasks, and the system automatically runs a large batch evaluation to benchmark performance.
Speed: The system is remarkably snappy, thanks to our Redis-first architecture.
Extensibility: We can add a new AI model or tool in minutes, not days.
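The sketch below illustrates the two ideas together: a provider abstraction that hides vendor differences, and a batch evaluation that fans out across providers with asyncio.gather so no single slow model blocks the event loop. The Provider protocol, the stubbed providers, and evaluate_batch are hypothetical names; real code would call each vendor's SDK where the stubs sleep.

```python
# Illustrative sketch of a provider abstraction + non-blocking batch evaluation.
import asyncio
from typing import Protocol

class Provider(Protocol):
    name: str
    async def complete(self, prompt: str) -> str: ...

class AnthropicProvider:
    name = "anthropic"
    async def complete(self, prompt: str) -> str:
        # A real implementation would call the Anthropic SDK here; stubbed for the sketch.
        await asyncio.sleep(0.1)
        return f"[anthropic] {prompt[:20]}"

class OpenAIProvider:
    name = "openai"
    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0.1)
        return f"[openai] {prompt[:20]}"

async def evaluate_batch(providers: list[Provider], tasks: list[str]) -> dict[str, list[str]]:
    # Fan out every (provider, task) pair concurrently, then group results by provider.
    async def run(p: Provider, t: str) -> tuple[str, str]:
        return p.name, await p.complete(t)

    results = await asyncio.gather(*(run(p, t) for p in providers for t in tasks))
    grouped: dict[str, list[str]] = {p.name: [] for p in providers}
    for name, output in results:
        grouped[name].append(output)
    return grouped

if __name__ == "__main__":
    batch = ["Summarize this ticket", "Refactor this function"]
    print(asyncio.run(evaluate_batch([AnthropicProvider(), OpenAIProvider()], batch)))
```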
What we learned

Evals are Everything: You can't improve what you don't measure. Building the evaluation engine first was the best decision we made.
Redis is a Powerhouse: For AI agents that need to share state and metrics rapidly, Redis served us far better than a traditional SQL database would have (see the sketch below).
MCP is the Future: Standardizing how LLMs talk to tools simplifies the architecture immensely.
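As a small illustration of the Redis-first approach to shared state, the sketch below (assuming redis-py and a local Redis instance) accumulates per-model execution metrics in a hash via a pipeline; the key layout and field names are assumptions for illustration, not Friday's actual schema.

```python
# Illustrative sketch: per-model execution metrics in a Redis hash (assumes redis-py).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_run(model: str, succeeded: bool, cost_usd: float, latency_ms: float) -> None:
    key = f"metrics:{model}"  # hypothetical key layout
    pipe = r.pipeline()
    pipe.hincrby(key, "runs", 1)
    pipe.hincrby(key, "successes", 1 if succeeded else 0)
    pipe.hincrbyfloat(key, "total_cost_usd", cost_usd)
    pipe.hincrbyfloat(key, "total_latency_ms", latency_ms)
    pipe.execute()  # atomic batch of increments, shared across workers

record_run("claude-sonnet", succeeded=True, cost_usd=0.004, latency_ms=820.0)
print(r.hgetall("metrics:claude-sonnet"))
```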
What's next for Friday

Real-World MCP Servers: Connecting to live production APIs (Salesforce, Linear, Notion) instead of just mocks.