About Statistics Decoded AI

Inspiration

This project was inspired by my experience taking advanced statistics classes during my PhD. I found myself building maps of how the data fields in SPSS report tables corresponded to the report style professors wanted. With those maps I was able to deliver A papers that teachers raved about and asked me to teach other students to duplicate.

The problem: Other students without programming backgrounds had a hard time looking at statistical reports and working out how to "map" them into templatized academic reports as I did. I've wanted to use AI to solve this for years.

Earlier attempts: NSF actually approved my pitch of this idea years ago, but I couldn't find programmers with the right skills to help me build it out for a full proposal. Now, with the growth of "vibe coding", I can work towards making this a reality that helps people, even without a budget for an "AI researcher".

What it does

Statistics Decoded AI uses three specialized GPT-OSS agents to guide users through a standardized ANOVA workflow:

  • Research Consultant (GPT-OSS-120B): Interactive study design and methodology guidance
  • Statistical Processor (GPT-OSS-20B): Analysis recommendations and statistical interpretations
  • Report Coordinator (GPT-OSS-120B): APA 7th edition compliant report synthesis
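
The three roles above can be sketched as a simple hand-off chain. This is a minimal illustration, not the project's actual code: the `Agent` class, the `run_pipeline` function, the instruction strings, the lowercase model labels, and the `echo_model` stand-in are all hypothetical. In the real system the `call_model` argument would wrap a hosted model call (for example through Replicate), which is left out here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function taking (model_label, prompt) and returning text.
ModelCall = Callable[[str, str], str]


@dataclass(frozen=True)
class Agent:
    role: str          # human-readable role name from the list above
    model: str         # illustrative label only, not a verified API identifier
    instructions: str  # assumed one-line brief for the agent


AGENTS: List[Agent] = [
    Agent("Research Consultant", "gpt-oss-120b",
          "Clarify the study design and methodology."),
    Agent("Statistical Processor", "gpt-oss-20b",
          "Recommend and interpret the analysis."),
    Agent("Report Coordinator", "gpt-oss-120b",
          "Synthesize an APA 7th edition report."),
]


def run_pipeline(user_input: str, call_model: ModelCall) -> Dict[str, str]:
    """Feed each agent the previous agent's output and record every step."""
    outputs: Dict[str, str] = {}
    context = user_input
    for agent in AGENTS:
        prompt = f"{agent.instructions}\n\n{context}"
        context = call_model(agent.model, prompt)
        outputs[agent.role] = context
    return outputs


def echo_model(model: str, prompt: str) -> str:
    """Stand-in for a hosted model call; returns a tag instead of real text."""
    return f"[{model}] received {len(prompt)} characters"
```

Calling `run_pipeline("Compare three teaching methods.", echo_model)` returns one entry per role, in order. Passing the model call in as a function keeps the chain testable without any network access.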

The system generates professional APA-formatted statistical reports as Word documents. It avoids the common, easy-to-make math errors of manual reporting and keeps methodology consistent across different researchers.
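
To make the "math errors" point concrete, here is a rough sketch of the arithmetic behind a one-way ANOVA, written in plain Python so each step is visible. The function name `one_way_anova` and the returned keys are assumptions for illustration. The project's stack lists SciPy/StatsModels, which would normally do this work and also supply the p-value (omitted here because it needs the F distribution).

```python
from typing import Dict, Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Compute sums of squares, degrees of freedom, F, and eta squared."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-groups variation: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups variation: squared distance of each score
    # from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
        "eta_sq": eta_sq,
    }
```

For the toy groups `[1, 2, 3]`, `[2, 3, 4]`, and `[3, 4, 5]`, both sums of squares are 6, the degrees of freedom are 2 and 6, F is 3.0, and eta squared is 0.5. These are the kinds of intermediate values that are easy to get wrong by hand.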

How we built it

Hardware Reality Check: I'm a Python programmer, not an AI researcher, and getting GPT-OSS-120B to work on my "consumer" hardware was the biggest challenge. I tried running it locally with Ollama, but it was so slow that it hung completely. I tried Hugging Face, but the setup was intensive. I tried several other platforms before discovering Replicate (thanks to Pieter Levels mentioning it in one of his videos), which finally worked.

Tech Stack: Python, Streamlit, SciPy/StatsModels, GPT-OSS-120B/20B via Replicate API, python-docx for APA formatting, Google Cloud SQL

Key Innovation: Recreating my manual "mapping" process from SPSS outputs to academic reports, but now automated through AI agents that anyone can use regardless of programming background.
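
The "mapping" idea can be illustrated with a small sketch that turns the fields of an ANOVA output row into an APA-style results string. The function names and dictionary keys (`df_between`, `df_within`, `F`, `p`, `eta_sq`) are hypothetical, not the project's actual schema, and the numbers in the usage note are made up for illustration. The formatting follows the usual APA conventions: two decimals for F, three for p, no leading zero on values that cannot exceed 1, and "p < .001" for very small p-values. Italics for F and p are left to the document writer, since plain text cannot show them.

```python
from typing import Mapping


def format_p(p: float) -> str:
    """Format a p-value in APA style (no leading zero, floor at .001)."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_proportion(value: float) -> str:
    """Format a statistic bounded by 1 (such as eta squared) without a leading zero."""
    return f"{value:.2f}".lstrip("0")


def anova_to_apa(row: Mapping[str, float]) -> str:
    """Map one ANOVA output row to an APA-style results string."""
    return (
        f"F({int(row['df_between'])}, {int(row['df_within'])}) = {row['F']:.2f}, "
        f"{format_p(row['p'])}, "
        f"η² = {format_proportion(row['eta_sq'])}"
    )
```

With an illustrative row such as `{"df_between": 2, "df_within": 27, "F": 4.563, "p": 0.0194, "eta_sq": 0.2526}`, the function returns `F(2, 27) = 4.56, p = .019, η² = .25`. Keeping the mapping in one place is what lets a template apply the same rules to every report.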

Challenges we ran into

  1. Hardware Limitations: Getting GPT-OSS-120B running on consumer hardware - local options were too slow, cloud setup was complex
  2. Platform Discovery: Testing multiple AI hosting platforms before finding that Replicate worked reliably
  3. APA Formatting Complexity: Implementing precise APA 7th edition standards for statistical tables and citations
  4. Agent Coordination: Making the "mapping" process work automatically across 3 different specialized agents
  5. Academic Writing Standards: Ensuring AI maintains statistical rigor while generating readable reports

Accomplishments that we're proud of

  • Democratized advanced statistical reporting - making my PhD-level "mapping" skills accessible to anyone
  • Solved the hardware problem for individual researchers without AI research budgets
  • End-to-end automation from SPSS-style statistical output to publication-ready academic reports
  • Error reduction: Eliminating common mathematical errors that plague manual report writing
  • First implementation of an idea I've been trying to build for years, now possible with modern AI tools

What we learned

  • "Vibe coding" works: You don't need to be an AI researcher to build meaningful AI applications
  • Platform choice matters: Replicate made GPT-OSS accessible where other platforms failed
  • Academic AI applications have huge potential for reducing human error in research
  • Standardized templates can encode expert knowledge (like my mapping techniques) for broader use
  • Consumer hardware constraints are real but can be overcome with the right cloud services

What's next for Statistics Decoded AI

Educational Focus: Help students and researchers complete work more efficiently and accurately, reducing the mathematical errors that are commonplace in manual statistical reporting

Advanced Statistical Methods: Expand beyond ANOVA to regression, time series, and multilevel modeling with the same automated mapping approach. Keep improving the chat interface. As budget allows, take what has been demonstrated here and use GPT-OSS-120B's fine-tuning capabilities to create expert-level agents, each a specialist in its particular statistical area (ANOVA, multiple regression, moderation, mediation, etc.). These specialists would work together as a multi-agent team of statistical AI agents, building on the multi-agent mindset already present in GPT-OSS.

Large-Scale Research: Templates for multi-company projects like vaccine efficacy trials - ensuring standardized, error-free reporting that eliminates human-introduced bias through consistent, AI-supported methodologies

Accessibility: Make advanced statistical analysis available to researchers without programming backgrounds or AI research budgets
