Inspiration

Analysts spend hours doing repetitive EDA and visualization steps before getting to real insights. I wanted an “agent-style” workflow where you ask a question once and the system autonomously plans, executes, and explains the analysis like a real data teammate.

What it does

  1. Upload any CSV
  2. Ask a question (e.g., “find trends, outliers, missing values, top categories”)
  3. The agent:
     • creates a step-by-step analysis plan
     • generates Python analysis code
     • executes the code on the dataset
     • produces charts and insights
  4. An Offline Demo Mode fallback keeps the app working even if the model is rate-limited or unavailable
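The loop above can be sketched end to end. This is a minimal illustration in the spirit of the Offline Demo Mode: the planner and code generator are hard-coded stand-ins for the model, and all function names are hypothetical, not the app's actual code.

```python
import io
import pandas as pd

# Sketch of the question -> plan -> code -> execution loop.
# plan_analysis and generate_code are offline stand-ins; the real app
# asks the model for these steps.

def plan_analysis(question: str, columns: list) -> list:
    return ["Summarize numeric columns", "Count missing values per column"]

def generate_code(plan: list) -> str:
    return "summary = df.describe()\nmissing = df.isna().sum()"

def execute(code: str, df: pd.DataFrame) -> dict:
    scope = {"df": df}
    exec(code, scope)  # the real executor restricts builtins and saves charts
    return scope

df = pd.read_csv(io.StringIO("a,b\n1,2\n3,\n5,6\n"))
plan = plan_analysis("find missing values", list(df.columns))
results = execute(generate_code(plan), df)
```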

How we built it

  1. Streamlit UI for uploading CSV + asking questions
  2. Agent layer (Gemini / offline) to generate structured output
  3. Executor layer to extract the Python code block, run it safely against df, and save charts to disk
  4. Automatic chart rendering in the UI
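The executor layer's first job is pulling the code block out of the model's reply. A minimal sketch — the regex and scope handling here are illustrative, not the exact implementation:

```python
import re
import pandas as pd

def extract_code(model_output: str) -> str:
    """Pull the first fenced ```python block out of the model's reply."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", model_output, re.DOTALL)
    return match.group(1) if match else model_output

def run_on_dataframe(code: str, df: pd.DataFrame) -> dict:
    # The real executor also injects matplotlib and saves any open figures to disk.
    scope = {"df": df, "pd": pd}
    exec(code, scope)
    return scope

reply = "Here is the analysis:\n```python\nresult = df['x'].mean()\n```"
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
scope = run_on_dataframe(extract_code(reply), df)
```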

Challenges we ran into

  1. Model availability and quota/rate limits during development
  2. Making code execution safe and predictable (ensuring charts are generated consistently)
  3. Handling different dataset schemas (numeric vs categorical columns)
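Handling mixed schemas mostly comes down to splitting columns by dtype before choosing analyses and chart types. A sketch with a hypothetical helper (not the app's actual code):

```python
import pandas as pd

def split_columns(df: pd.DataFrame) -> tuple:
    """Split columns by dtype so the agent can pick dtype-appropriate analyses and charts."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical

df = pd.DataFrame({"price": [9.5, 12.0], "city": ["NY", "LA"], "qty": [1, 2]})
numeric, categorical = split_columns(df)
```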

Accomplishments that we're proud of

  1. A working end-to-end autonomous agent loop: question → plan → code → execution → charts → insights.
  2. Offline fallback mode that guarantees reliability during demos
  3. Clear, readable results that non-technical users can understand

What we learned

  1. Autonomous agents require strong guardrails: Generating code is easy; executing it safely and predictably requires careful sandboxing, allow-listed builtins, and fallback logic.
  2. Reliability matters more than raw model power in real demos: API quota limits and model availability can break live systems, so building an offline fallback mode dramatically improves robustness and user trust.
  3. Execution is where insight is created: The real value comes not from text output, but from running the generated code, producing charts, and validating results on real data.
  4. Model outputs must be constrained to be useful: Explicitly requiring chart generation, output structure, and execution-friendly code significantly improves consistency.
  5. Agent systems are orchestration problems, not just prompts: Planning, code generation, execution, error handling, and visualization must work together as a pipeline.
  6. Simple UX makes complex systems approachable: A minimal Streamlit interface was enough to make an advanced agent workflow understandable to non-technical users.
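One concrete guardrail from lesson 1 is running generated code with only an allow-list of builtins in scope. A minimal sketch — the exact allow-list below is an assumption for illustration:

```python
# Allow-list of builtins exposed to generated code; anything else
# (open, __import__, eval, ...) raises NameError. The list is illustrative.
SAFE_BUILTINS = {"len": len, "range": range, "min": min, "max": max,
                 "sum": sum, "abs": abs, "round": round, "print": print}

def safe_exec(code: str, scope: dict) -> dict:
    """Run generated code with a restricted __builtins__ mapping."""
    env = {"__builtins__": SAFE_BUILTINS, **scope}
    exec(code, env)
    return env

ok = safe_exec("total = sum(values)", {"values": [1, 2, 3]})

try:
    safe_exec("open('secrets.txt')", {})  # open is not allow-listed
    blocked = False
except NameError:
    blocked = True
```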

What's next for AutoAnalyst AI - Autonomous Data Analyst Agent

  1. Add multi-step tool use (schema detection, automatic column selection, prompt refinement)
  2. Add a “report export” button (PDF/Markdown)
  3. Add guardrails: allowlisted imports, timeouts, and sandbox hardening
  4. Add dataset profiling + smarter chart selection
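The allowlisted-imports idea from item 3 can be checked statically before any code runs, e.g. by walking the AST. A sketch — the allowlist contents are an assumption:

```python
import ast

ALLOWED_MODULES = {"pandas", "numpy", "matplotlib"}  # illustrative allowlist

def imports_are_allowed(code: str) -> bool:
    """Reject generated code whose imports fall outside the allowlist."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            continue
        if any(root not in ALLOWED_MODULES for root in roots):
            return False
    return True

allowed = imports_are_allowed("import pandas as pd\nfrom matplotlib import pyplot")
rejected = imports_are_allowed("import os\nos.remove('data.csv')")
```

Static checking complements the restricted-builtins sandbox: it rejects dangerous code before execution rather than failing mid-run.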

Built With

  • Python – core language for agent logic and execution
  • Streamlit – interactive web UI for uploading datasets and visualizing results
  • Google Gemini API – large language model for analysis planning and code generation
  • pandas – data loading, cleaning, and analysis
  • matplotlib – charts and visualizations
  • numpy – numerical and statistical computations
  • regex (re) – extracting executable code blocks from model output
  • venv – virtual environments for dependency isolation