- The central command center tracking key metrics like success rate and uplift. Features a Quick Start to generate AI hypotheses or load data.
- Input goals and connect analytics to generate hypotheses. The AI creates testable growth strategies optimized for your metrics and type.
- Define control and treatment variants, set statistical parameters, and launch simulations or live deployments to external platforms.
- View comprehensive dashboards featuring conversion rates, statistical significance, p-values, and AI-generated reports to drive decisions.
- Connect AB Alchemy to GA4, Mixpanel, Amplitude, and more via robust RESTful APIs for automated data fetching and direct experiment deployment.
- Analyze results with interactive charts and automated statistical deep dives. Get AI-generated executive summaries and clear reports.
- Securely configure your Gemini API Key to reinitialize the AI agents and power your experimentation workflow with the latest LLM logic.

Inspiration

A/B testing is the gold standard for data-driven decision-making, yet the workflow remains fragmented and technically demanding. I saw product managers struggling to design statistically rigorous tests, engineers burdened with implementing tracking code, and analysts bottlenecked by manual reporting. I wanted to democratize experimentation.

My inspiration came from a simple question: What if an AI agent could act as my dedicated Data Scientist, Product Strategist, and Engineer all at once? This vision drove me to build a system that doesn't just "run" tests but understands business goals and the mathematics behind them.

What it does

AB Alchemy is an autonomous agent that automates the entire A/B testing lifecycle. It acts as a comprehensive experimentation team in a box:
- Strategist: It analyzes business goals to generate data-driven hypotheses.
- Statistician: It designs rigorous experiments, calculating sample sizes and performing power analysis (power $1-\beta = 0.8$, significance level $\alpha = 0.05$) to ensure validity.
- Analyst: It interprets complex results, translating raw data into actionable business insights in plain English.
- Simulator: It allows users to "test their tests" by generating realistic synthetic user data before going live.

How I built it

I engineered AB Alchemy as a modular agentic system powered by Google's Gemini 3 Pro.
- The Brain: I utilized Gemini 3 Pro for its exceptional speed and reasoning capabilities, assigning it distinct "personas" to handle different stages of the testing lifecycle.
- The Engine: I built a robust simulation engine using Python and Pandas (data_simulator.py) that generates realistic user behavior, including seasonality and time-of-day patterns.
- The Interface: I used Streamlit to create a clean, interactive dashboard that guides the user from ideation to analysis.
- Visualization: I integrated Plotly to render interactive charts for conversion rates, confidence intervals, and funnel analysis.

Challenges I ran into
- Structured Output from LLMs: Getting the LLM to consistently return valid JSON for application logic while maintaining creativity for hypothesis generation was difficult. I solved this by implementing robust JSON cleaning and validation layers.
- Statistical Rigor vs. Hallucination: LLMs excel at text but can struggle with precise calculations. I mitigated this by using the LLM to design the test parameters, but delegating the actual math (p-values, confidence intervals) to standard Python libraries like scipy and statsmodels.
- Simulation Realism: Creating a dummy data generator that felt "real" was a complex task. I had to implement intricate logic for day-of-week trends and distinct user segments to ensure the analysis dashboard looked authentic.

Accomplishments that I'm proud of

I am particularly proud of the Simulation Engine. It doesn't just spit out random numbers; it models user behavior with seasonality and segment-specific conversion rates, making the "test drive" experience feel incredibly authentic. I'm also proud of the Latency Optimization: by leveraging Gemini 3 Pro, I achieved a near-instant response time for hypothesis generation, making the tool feel like a real-time collaborator rather than a slow background process.

What I learned
- Agents need "Guardrails": I learned that giving an AI agent a specific persona (e.g., "You are a PhD Statistician") significantly improves the quality and specificity of its output compared to generic prompts.
- The "Cold Start" Problem: Synthetic data is incredibly powerful for testing agentic workflows. By simulating data, I could iterate on my analysis prompts much faster than if I had waited for real-world traffic.
- The Power of Speed: The low latency of the Flash model was critical. Users expect instant feedback, and optimizing for speed transformed the user experience.

What's next for AB Alchemy
- Live API Integrations: I plan to connect the agent directly to Google Analytics 4 (GA4) and Mixpanel for real-time analysis of live data.
- Bayesian Optimization: I want to implement multi-armed bandit algorithms for dynamic traffic allocation to maximize conversions during the test itself.
- Visual Editor: My ultimate goal is to build a visual editor that allows the agent to generate and inject the HTML/CSS for test variants directly into the user's application.
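
To make the bandit item on this roadmap more concrete, here is a minimal sketch of Thompson sampling with a Beta-Bernoulli model, one common way to do dynamic traffic allocation. This is not part of AB Alchemy today: the function name `thompson_allocate`, its parameters, and the simulated conversion rates are all hypothetical and chosen only for illustration.

```python
import random


def thompson_allocate(true_rates, n_visitors, seed=0):
    """Simulate Thompson sampling over variants with binary conversions.

    true_rates: hypothetical per-variant conversion probabilities, used
                only to simulate visitor outcomes in this sketch.
    Returns (pulls, successes): visitors sent to and conversions seen
    for each variant.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k

    for _ in range(n_visitors):
        # Draw one plausible conversion rate per variant from its Beta
        # posterior, starting from a uniform Beta(1, 1) prior.
        draws = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Route this visitor to the variant with the highest draw.
        arm = max(range(k), key=draws.__getitem__)

        # Simulated outcome; a live system would observe a real conversion.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, successes


if __name__ == "__main__":
    pulls, successes = thompson_allocate([0.05, 0.20], n_visitors=5000)
    print("visitors per variant:", pulls)
    print("conversions per variant:", successes)
```

Unlike a fixed 50/50 split, traffic in this sketch drifts toward the variant that appears to convert better as evidence accumulates, which is the "maximize conversions during the test itself" behavior I am aiming for.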

