Inspiration
OrgiTwin was inspired by the idea of creating a "digital twin" of a knowledge worker's processes. The project aims to build an AI-powered platform that observes, learns from, and augments digital workflows, freeing users from repetitive tasks so that professionals can focus on high-value creative and analytical work. The name suggests a digital counterpart ("twin") that understands and can replicate a user's or organization's ("Orgo") workflows, acting as an intelligent assistant that boosts productivity across both technical and creative domains.
What it does
OrgiTwin is an AI-powered digital assistant that automates user workflows through a two-phase process: "Learn" and "Act".
Learn Mode: The platform captures user screen interactions and workflows (the README suggests a capture service such as Screenpipe). This raw data is then submitted to a secure backend.
AI Analysis: A Supabase Edge Function processes the recording, using the Google Gemini API to analyze the sequence of actions. It identifies repetitive patterns, understands the workflow's intent, and automatically generates a series of automatable steps.
Act Mode: Users can trigger these learned automations from a dashboard. The platform uses a secure, sandboxed cloud environment (powered by E2B) where an AI agent (leveraging a powerful LLM like Claude or an OpenAI model) executes the steps, controlling a virtual desktop to complete the task.
Digital Assistant Interface: It features a conversational AI assistant with a video avatar (powered by Tavus) that can explain actions, accept natural language commands, and guide the user through the automation process.
Productivity Analytics: The application provides a comprehensive dashboard that visualizes the time saved, execution success rates, and productivity gains, complete with achievements and projected future savings.
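To make the "series of automatable steps" from the analysis phase more concrete, here is a minimal sketch of how a learned workflow could be represented and validated before it is stored. The type names, the field names, and the `parseWorkflow` helper are all assumptions made for illustration; the actual schema used by OrgiTwin is not shown in this write-up.

```typescript
// Hypothetical shape of one automatable step; field names are assumptions.
type StepKind = "click" | "type" | "navigate" | "wait";

interface AutomationStep {
  kind: StepKind;
  target: string; // e.g. a UI element description or a URL
  value?: string; // text to type, if any
}

interface LearnedWorkflow {
  name: string;
  steps: AutomationStep[];
}

const STEP_KINDS: readonly string[] = ["click", "type", "navigate", "wait"];

// Validate untrusted JSON (for example, text returned by a model)
// before treating it as a workflow.
function parseWorkflow(json: string): LearnedWorkflow {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("workflow must be an object");
  }
  const { name, steps } = data as { name?: unknown; steps?: unknown };
  if (typeof name !== "string" || !Array.isArray(steps)) {
    throw new Error("workflow needs a string name and a steps array");
  }
  const parsed = steps.map((raw: unknown, i: number): AutomationStep => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`step ${i} must be an object`);
    }
    const { kind, target, value } = raw as Record<string, unknown>;
    if (typeof kind !== "string" || STEP_KINDS.indexOf(kind) === -1) {
      throw new Error(`step ${i} has an unknown kind`);
    }
    if (typeof target !== "string") {
      throw new Error(`step ${i} needs a string target`);
    }
    if (value !== undefined && typeof value !== "string") {
      throw new Error(`step ${i} has a non-string value`);
    }
    const step: AutomationStep = { kind: kind as StepKind, target };
    if (value !== undefined) step.value = value;
    return step;
  });
  return { name, steps: parsed };
}
```

Rejecting malformed output at this boundary means that Act Mode only ever receives steps it knows how to execute.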
How we built it
The project is built on a modern, serverless architecture combining a React-based frontend with a Supabase backend and several integrated AI services.
Frontend: The user interface is a single-page application built with React, Vite, and TypeScript. It's styled using Tailwind CSS and features a rich component library from shadcn/ui. State management for server data is handled efficiently by TanStack Query (React Query).
Backend: The entire backend is built on Supabase, utilizing its PostgreSQL Database for storing user data, recordings, and actions; Supabase Auth for user authentication (email/password and OAuth); and Supabase Edge Functions for serverless logic.
AI & Automation:
Workflow Analysis: The process-recording Edge Function uses the Google Gemini API to analyze raw interaction data and generate structured, automatable actions.
Task Execution: The execute-action Edge Function orchestrates the automation. It uses the E2B SDK to spin up a secure, sandboxed cloud environment where an AI agent can safely operate. An LLM (like Anthropic's Claude or an OpenAI model) acts as the agent, interpreting instructions and controlling the virtual desktop.
Digital Assistant: A lifelike digital assistant is powered by the Tavus API, which generates realistic video responses from text, creating a more engaging and personal user experience.
Architecture: The system follows an event-driven, asynchronous model. The frontend submits a recording, which triggers a chain of serverless functions that process, analyze, and store the workflow without blocking the user interface. Meanwhile, the frontend polls for status updates so that users see progress in near real time.
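The polling side of this architecture can be sketched as a small framework-independent helper. This is not the project's actual hook; the status values, the `pollUntilDone` name, and the injectable `sleep` parameter are assumptions chosen to keep the example self-contained and testable.

```typescript
// Hypothetical status values; the real recording states may differ.
type RecordingStatus = "pending" | "processing" | "completed" | "failed";

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
  // Injectable so that tests can skip real waiting.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchStatus repeatedly until the recording reaches a terminal
// state, or gives up after maxAttempts checks.
async function pollUntilDone(
  fetchStatus: () => Promise<RecordingStatus>,
  { intervalMs, maxAttempts, sleep = defaultSleep }: PollOptions,
): Promise<RecordingStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === "completed" || status === "failed") {
      return status;
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`recording still in progress after ${maxAttempts} checks`);
}
```

In a React app the same loop would typically live inside a hook or a query library's refetch option, with `fetchStatus` reading the recording row from the database.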
Challenges we ran into
Network Interference from Browser Extensions: A significant challenge was ensuring reliable communication with the Supabase backend. Many ad-blockers and privacy-focused browser extensions were interfering with API calls, sometimes by manipulating URLs or blocking requests. We overcame this by building a robust supabaseWrapper.ts with automatic retries, bypass mechanisms, and a comprehensive networkDiagnostics.ts utility to detect problematic extensions and guide the user.
Bridging Web UI with Desktop Automation: Creating a seamless experience where a user can trigger complex desktop actions from a web browser was difficult. We solved this by integrating the E2B SDK, which allowed us to create and control secure, remote sandboxed environments. The orgoService.ts and SurfIntegration.tsx components act as the bridge between our web app and this remote execution environment.
Managing Asynchronous AI Processes: The workflow, from submitting a recording to receiving the analyzed actions, is a long-running, multi-step process. We managed this by using background tasks in our Supabase Functions (EdgeRuntime.waitUntil) and implementing a polling mechanism (useRecordingStatus hook) on the frontend to provide users with real-time status updates without locking up the UI.
Ensuring Type Safety Across Services: Integrating multiple services (Supabase, Gemini, E2B, Tavus) while maintaining type safety was complex. We generated TypeScript types from our Supabase schema to ensure consistency and carefully crafted interfaces for the data flowing between the different AI APIs and our frontend.
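As an illustration of the automatic retries mentioned in the first challenge, the sketch below wraps any async operation in retry logic with exponential backoff. It is a generic pattern, not the contents of supabaseWrapper.ts; the function names, default values, and the injectable `sleep` parameter are assumptions.

```typescript
// Delay before the next try: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs operation up to maxRetries + 1 times and rethrows the last error
// if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  baseMs = 200,
  maxMs = 5000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        await sleep(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

A call site would pass the request as a closure, for example `withRetry(() => fetchRecording(id))`, where `fetchRecording` stands in for whatever backend call needs protecting.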
Accomplishments that we're proud of
End-to-End Automation Pipeline: We successfully built a complete, working pipeline that takes raw user interaction data, uses AI to intelligently generate an automation plan, and executes that plan in a secure environment.
Human-like Digital Assistant: We are proud of creating an engaging digital assistant that uses Tavus to generate video responses, making the interaction feel more personal and futuristic than a standard text-based chatbot.
Secure and Resilient Architecture: Running automations in E2B sandboxes keeps agent actions isolated from the user's own machine. In addition, our work on network diagnostics and retry logic keeps the application functional even in browsers with interfering extensions.
Comprehensive Productivity Dashboard: We built a powerful analytics section (TimeSavedPage) that doesn't just show data but tells a story of the user's "optimization journey" with milestones, achievements, and projected savings. This provides tangible, motivating feedback on the value of automation.
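The figures behind such a dashboard can be derived from per-run records. The sketch below shows one possible calculation of success rate, time saved, and a simple linear projection; the record fields and function names are assumptions, and the real TimeSavedPage may compute these values differently.

```typescript
// Hypothetical record of one automation run.
interface ExecutionRecord {
  succeeded: boolean;
  manualSeconds: number;    // estimated time to do the task by hand
  automatedSeconds: number; // measured time for the automated run
}

interface ProductivitySummary {
  successRate: number;  // fraction between 0 and 1
  secondsSaved: number; // total across successful runs
}

function summarize(records: ExecutionRecord[]): ProductivitySummary {
  if (records.length === 0) {
    return { successRate: 0, secondsSaved: 0 };
  }
  let successes = 0;
  let secondsSaved = 0;
  for (const record of records) {
    if (!record.succeeded) continue; // failed runs save nothing in this sketch
    successes++;
    secondsSaved += Math.max(0, record.manualSeconds - record.automatedSeconds);
  }
  return { successRate: successes / records.length, secondsSaved };
}

// Linear projection: assumes the observed savings rate stays constant.
function projectSavings(
  secondsSaved: number,
  observedDays: number,
  futureDays: number,
): number {
  if (observedDays <= 0) return 0;
  return (secondsSaved / observedDays) * futureDays;
}
```

Counting only successful runs keeps the "time saved" figure conservative, since a failed automation usually has to be redone by hand.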
What we learned
The Power of Composable AI: We learned that combining specialized AI services (Gemini for analysis, an LLM for agency, Tavus for video) creates a far more powerful and complete product than relying on a single model.
Serverless is Ideal for Event-Driven Workflows: Using Supabase Edge Functions was a perfect fit for our asynchronous processing pipeline. It allowed us to build a scalable and cost-effective backend without managing servers.
Defensive Frontend Development is Crucial: We learned firsthand that real-world browser environments are unpredictable. Building proactive diagnostic tools and resilient data-fetching logic is essential for creating a reliable user experience.
User Experience for AI is Key: It's not enough for the AI to be powerful; it must also be understandable and trustworthy. We learned the importance of features like "Checkpoints" and the Digital Assistant's explanations to keep the user in control and build confidence in the automation.
What's next for OrgiTwin
The roadmap in README.md and the current architecture point toward expanding OrgiTwin's capabilities, especially in creative domains.
Expand to Creative Workflows: Fully implement the planned integrations with Blender MCP and Ableton MCP to automate complex tasks in 3D modeling, animation, and music production.
Real-Time Co-Creation: Evolve the "Act Mode" into a synchronous collaboration tool where the AI agent can work alongside a user in real-time within an application, offering suggestions and executing commands on the fly.
Natural Language Task Composition: Enable users to describe high-level goals in natural language (e.g., "Create a summary presentation of last quarter's sales data"), which OrgiTwin will break down and execute across multiple applications.
Proactive Assistance and Suggestions: Enhance the AI to proactively identify optimization opportunities and offer assistance during a user's workflow, rather than only after a recording session is complete.
Open API and Third-Party Integrations: Develop an open agent protocol to allow other productivity and creative tools to integrate with OrgiTwin, making it a central hub for digital work automation.
Built With
- supabase
- typescript