AB Alchemy

The central command center tracks key metrics such as success rate and uplift, and features a Quick Start to generate AI hypotheses or load data.
Input goals and connect analytics to generate hypotheses. The AI creates testable growth strategies optimized for your metrics and type.
Define control and treatment variants, set statistical parameters, and launch simulations or live deployments to external platforms.
View comprehensive dashboards featuring conversion rates, statistical significance, p-values, and AI-generated reports to drive decisions.
Connect AB Alchemy to GA4, Mixpanel, Amplitude, and more via RESTful APIs for automated data fetching and direct experiment deployment.
Analyze results with interactive charts and automated statistical deep dives. Get AI-generated executive summaries and clear reports.
Securely configure your Gemini API key to reinitialize the AI agents and power your experimentation workflow with the latest LLM logic.

Inspiration

A/B testing is the gold standard for data-driven decision-making, yet the workflow remains fragmented and technically demanding. I saw product managers struggling to design statistically rigorous tests, engineers burdened with implementing tracking code, and analysts bottlenecked by manual reporting. I wanted to democratize experimentation.

My inspiration came from a simple question: What if an AI agent could act as my dedicated Data Scientist, Product Strategist, and Engineer all at once? This vision drove me to build a system that doesn't just "run" tests but understands business goals and the mathematics behind them.

What it does

AB Alchemy is an autonomous agent that automates the entire A/B testing lifecycle. It acts as a comprehensive experimentation team in a box:

Strategist: It analyzes business goals to generate data-driven hypotheses.
Statistician: It designs rigorous experiments, calculating sample sizes and performing power analysis (power 1 − β = 0.8, significance level α = 0.05) to ensure validity.
Analyst: It interprets complex results, translating raw data into actionable business insights in plain English.
Simulator: It allows users to "test their tests" by generating realistic synthetic user data before going live.

How I built it

I engineered AB Alchemy as a modular agentic system powered by Google's Gemini 3 Pro.
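To make the Statistician role listed under What it does more concrete, here is a minimal sketch of a sample-size calculation at the stated settings (α = 0.05, power = 0.8). It is an illustration under assumptions, not the project's code: the function name `sample_size_per_variant` is hypothetical, it assumes a two-sided two-proportion z-test, and it uses only the Python standard library rather than scipy or statsmodels.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_control, p_treatment, alpha=0.05, power=0.8):
    """Visitors needed per variant for a two-sided two-proportion z-test.

    Hypothetical helper for illustration; not taken from AB Alchemy.
    """
    if p_control == p_treatment:
        raise ValueError("The two conversion rates must differ.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for power = 1 - beta
    p_bar = (p_control + p_treatment) / 2          # average rate under the null
    null_sd = math.sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = math.sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )
    effect = abs(p_treatment - p_control)
    return math.ceil(((z_alpha * null_sd + z_beta * alt_sd) / effect) ** 2)


if __name__ == "__main__":
    # Example: detect a lift from a 10% to a 12% conversion rate.
    print(sample_size_per_variant(0.10, 0.12))
```

Detecting a small lift (10% to 12%) needs several thousand visitors per variant, while a larger expected lift or a lower power target reduces the requirement.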
The Brain: I used Gemini 3 Pro for its exceptional speed and reasoning capabilities, assigning it distinct "personas" to handle different stages of the testing lifecycle.
The Engine: I built a robust simulation engine using Python and Pandas (data_simulator.py) that generates realistic user behavior, including seasonality and time-of-day patterns.
The Interface: I used Streamlit to create a clean, interactive dashboard that guides the user from ideation to analysis.
Visualization: I integrated Plotly to render interactive charts for conversion rates, confidence intervals, and funnel analysis.

Challenges I ran into

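As a rough picture of what The Engine item above describes, the sketch below generates synthetic visits whose conversion probability varies by day of week and hour of day. Everything here is assumed for illustration: the multipliers, the function names, and the record layout are hypothetical, user segments are left out for brevity, and the standard library stands in for the Pandas-based data_simulator.py.

```python
import random
from datetime import datetime, timedelta

# Hypothetical day-of-week multipliers, Monday=0 ... Sunday=6.
DAY_OF_WEEK_FACTOR = [1.00, 1.05, 1.05, 1.00, 0.95, 0.80, 0.85]


def hour_factor(hour):
    """Hypothetical time-of-day pattern: evening visits convert a bit better."""
    return 1.15 if 18 <= hour <= 22 else 1.0


def simulate_visits(n_users, base_rate, treatment_rate, days=14, seed=None):
    """Return one record per synthetic visitor with a random variant and outcome."""
    rng = random.Random(seed)
    start = datetime(2026, 1, 5)  # arbitrary start date
    rows = []
    for user_id in range(n_users):
        variant = rng.choice(["control", "treatment"])
        ts = start + timedelta(days=rng.randrange(days), hours=rng.randrange(24))
        rate = treatment_rate if variant == "treatment" else base_rate
        # Scale the base probability by the seasonal multipliers.
        rate *= DAY_OF_WEEK_FACTOR[ts.weekday()] * hour_factor(ts.hour)
        rows.append({
            "user_id": user_id,
            "variant": variant,
            "timestamp": ts,
            "converted": rng.random() < min(rate, 1.0),
        })
    return rows


def conversion_rate(rows, variant):
    """Observed conversion rate for one variant (0.0 if it has no visitors)."""
    hits = [r["converted"] for r in rows if r["variant"] == variant]
    return sum(hits) / len(hits) if hits else 0.0
```

Passing a fixed `seed` makes a run reproducible, which helps when iterating on analysis prompts against the same synthetic dataset.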
Structured Output from LLMs: Getting the LLM to consistently return valid JSON for application logic while maintaining creativity for hypothesis generation was difficult. I solved this by implementing robust JSON cleaning and validation layers.
Statistical Rigor vs. Hallucination: LLMs excel at text but can struggle with precise calculations. I mitigated this by using the LLM to design the test parameters, but delegating the actual math (p-values, confidence intervals) to standard Python libraries like scipy and statsmodels.
Simulation Realism: Creating a synthetic data generator that felt "real" was complex. I had to implement logic for day-of-week trends and distinct user segments so that the analysis dashboard looked authentic.

Accomplishments that I'm proud of

I am particularly proud of the Simulation Engine. It doesn't just spit out random numbers; it models user behavior with seasonality and segment-specific conversion rates, making the "test drive" experience feel authentic. I'm also proud of the latency optimization: by leveraging Gemini 3 Pro, I achieved near-instant response times for hypothesis generation, making the tool feel like a real-time collaborator rather than a slow background process.

What I learned

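Looking back at the first two challenges (structured output and delegating the math), the division of labor can be sketched roughly as follows: clean and validate the model's JSON, then compute the statistics in ordinary Python. This is an assumption-laden illustration, not the project's implementation. The `REQUIRED_KEYS` schema and both function names are hypothetical, and the z-test uses the standard library as a stand-in for scipy and statsmodels.

```python
import json
import math
from statistics import NormalDist

REQUIRED_KEYS = {"hypothesis", "metric", "alpha"}  # hypothetical schema
FENCE = "`" * 3  # markdown code fence that LLM replies often wrap JSON in


def extract_json(raw_text):
    """Strip code fences, parse the outermost JSON object, and check its keys."""
    text = raw_text.replace(FENCE + "json", "").replace(FENCE, "")
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in model output.")
    data = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    Returns (z, p_value). The LLM never computes these numbers.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

In this arrangement the model only proposes parameters such as `alpha`; whether a result is significant is decided by comparing the computed `p_value` against it.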
Agents need "Guardrails": I learned that giving an AI agent a specific persona (e.g., "You are a PhD Statistician") significantly improves the quality and specificity of its output compared to generic prompts.
The "Cold Start" Problem: Synthetic data is incredibly powerful for testing agentic workflows. By simulating data, I could iterate on my analysis prompts much faster than if I had waited for real-world traffic.
The Power of Speed: The low latency of the Flash model was critical. Users expect instant feedback, and optimizing for speed transformed the user experience.

What's next for AB Alchemy

Live API Integrations: I plan to connect the agent directly to Google Analytics 4 (GA4) and Mixpanel for real-time analysis of live data.
Bayesian Optimization: I want to implement multi-armed bandit algorithms for dynamic traffic allocation to maximize conversions during the test itself.
Visual Editor: My ultimate goal is to build a visual editor that allows the agent to generate and inject the HTML/CSS for test variants directly into the user's application.
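For the dynamic traffic allocation idea above, one common multi-armed bandit approach is Thompson sampling. The sketch below is only an illustration of that technique under assumed Bernoulli (convert or not) outcomes; the roadmap does not say which bandit algorithm will be used, and the function name `thompson_allocate` is hypothetical.

```python
import random


def thompson_allocate(true_rates, rounds, seed=None):
    """Simulate Thompson sampling over Bernoulli arms.

    true_rates are the (normally unknown) conversion rates used to simulate
    outcomes. Returns the number of visitors sent to each arm.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible rate per arm from its Beta(1 + s, 1 + f) posterior.
        draws = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        arm = draws.index(max(draws))  # route this visitor to the best draw
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    print(thompson_allocate([0.05, 0.15], rounds=5000, seed=7))
```

As evidence accumulates, most visitors are routed to the better-performing variant, which is the trade-off against a fixed 50/50 split: fewer conversions lost during the test, at the cost of a less straightforward significance analysis.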

Built With

amplitude
mixpanel
numpy
optimizely
pandas
plotly
python
scipy
statsmodels
google-analytics-4
streamlit

Updates

MAYANK deep started this project — Feb 09, 2026 09:22 AM EST
