Bad Scientist: Your AI-Powered Data Science Accelerator

Have impactful AI/ML business ideas? Show them—don't just tell them!

Inspiration

As a Data Scientist and hackathon participant, I often spend more time learning new tech stacks and writing boilerplate code than innovating. Project managers, business owners, and innovators face the same problem. What if we could build proofs of concept (PoCs) that draw on our internal context and knowledge simply by describing an idea in plain text, and instantly get a working Streamlit app to pitch? That question is how Bad Scientist was born.

What it does

Bad Scientist transforms text descriptions into working Streamlit apps. Its multi-agent RAG system is designed to keep generated apps functional and aligned with the latest Streamlit features, AI/ML models, tools, and technical documentation. With Snowflake Cortex Search for retrieval and Mistral AI models served through Snowflake Cortex LLM Functions, it connects each idea to internal organizational context and use cases, so application development stays secure and relevant.
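For readers new to Cortex LLM Functions: Mistral models are invoked from SQL with `SNOWFLAKE.CORTEX.COMPLETE(model, prompt)`. The minimal sketch below shows one way an agent could call `mistral-large2` through any DB-API cursor, for example one from `snowflake-connector-python`. It assumes the connector's default `pyformat` (`%s`) parameter style, and the helper name `cortex_complete` is hypothetical, not taken from Bad Scientist's code.

```python
from typing import Any

# SNOWFLAKE.CORTEX.COMPLETE(model, prompt) is a Snowflake SQL function. Both
# arguments are passed as bound parameters, not concatenated into the SQL text,
# so quotes inside a user's idea cannot break the statement.
COMPLETE_SQL = "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)"


def cortex_complete(cursor: Any, prompt: str, model: str = "mistral-large2") -> str:
    """Send one prompt to a Cortex-hosted model and return its text reply.

    `cursor` is any DB-API cursor using the "pyformat" (%s) parameter style,
    e.g. snowflake.connector.connect(...).cursor().
    """
    cursor.execute(COMPLETE_SQL, (model, prompt))
    row = cursor.fetchone()
    if row is None:
        raise RuntimeError("COMPLETE returned no rows")
    return row[0]
```

With a live connection, `cortex_complete(conn.cursor(), prompt)` returns the model's answer as a plain string that the next agent can consume.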

Key Features:

  • Text-to-SQL: Automatically converts natural language to optimized SQL queries for your Snowflake tables
  • Text-to-Visualization: Creates interactive dashboards and data visualizations
  • Text-to-ML: Generates scikit-learn code for common machine learning tasks
  • Context-Aware: Understands your company's data schema through PRD/RFC uploads
  • Interactive Development: Built-in code editor for customizing generated applications
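The Text-to-SQL feature can be pictured as two small steps: build a prompt that pairs the user's question with the table schema, then pull the SQL out of the model's reply. The sketch below illustrates that idea under stated assumptions; the prompt wording and the helper names `build_text_to_sql_prompt` and `extract_sql` are hypothetical.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
_SQL_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def build_text_to_sql_prompt(question: str, schema: str) -> str:
    """Pair a natural-language question with the schema the model may use."""
    return (
        "You are a Snowflake SQL expert. Using only the tables and columns below, "
        "write one Snowflake SQL query that answers the question. "
        "Return only the SQL in a sql code block.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
    )


def extract_sql(response: str) -> str:
    """Take the first fenced block from the reply (or the whole reply) and drop a trailing ';'."""
    match = _SQL_BLOCK.search(response)
    sql = match.group(1) if match else response
    return sql.strip().rstrip(";").strip()
```

Restricting the model to the supplied schema is what makes the generated query match real tables; the extraction step tolerates replies with or without a code fence.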

Requirement Agent

  • Leverages Snowflake Cortex Search to understand and retrieve relevant organizational context
  • Indexes and searches through internal documentation, code repositories, and data schemas
  • Ensures generated apps align with existing business processes and technical standards
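Cortex Search services can be queried from SQL with `SNOWFLAKE.CORTEX.SEARCH_PREVIEW(service_name, query_json)`, which returns a JSON string whose `results` key lists the matching rows. A minimal sketch of the Requirement Agent's retrieval step follows. It assumes the connector's default client-side `pyformat` binding (which inlines values as escaped string literals), and the helpers `search_context` and `format_context` are hypothetical. `SEARCH_PREVIEW` is mainly intended for testing a service, so a production app may well use the Cortex Search Python or REST API instead.

```python
import json
from typing import Any

# SEARCH_PREVIEW(service_name, query_json) returns a JSON string whose "results"
# key holds the matching rows.
SEARCH_SQL = "SELECT SNOWFLAKE.CORTEX.SEARCH_PREVIEW(%s, %s)"


def search_context(cursor: Any, service: str, query: str,
                   columns: list[str], limit: int = 5) -> list[dict]:
    """Query a Cortex Search service and return the matching result rows."""
    payload = json.dumps({"query": query, "columns": columns, "limit": limit})
    cursor.execute(SEARCH_SQL, (service, payload))
    raw = cursor.fetchone()[0]
    return json.loads(raw).get("results", [])


def format_context(results: list[dict], column: str) -> str:
    """Join the retrieved text chunks into one context block for an agent prompt."""
    return "\n---\n".join(str(r[column]) for r in results if column in r)
```

For example, with a hypothetical service `DOCS_DB.PUBLIC.DOCS_SEARCH` that indexes PRD/RFC chunks in a `CHUNK` column, `format_context(search_context(cur, "DOCS_DB.PUBLIC.DOCS_SEARCH", idea, ["CHUNK"]), "CHUNK")` yields text that can be prepended to the agent's prompt.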

Data Analyst Agent

  • Connects to the Snowflake warehouse and database to understand and retrieve relevant organizational data
  • Searches through internal tables and data schemas
  • Ensures generated apps align with existing business processes and data availability
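One way the Data Analyst Agent could learn what data is available is to read Snowflake's `INFORMATION_SCHEMA.COLUMNS` view and compress it into a compact schema string for the prompt. This is a sketch under assumptions: the helper names `describe_schema` and `fetch_schema` are hypothetical, and the query assumes the connection's current database contains the target schema.

```python
from collections import defaultdict
from typing import Any, Iterable

# Column metadata for one schema in the current database.
SCHEMA_SQL = (
    "SELECT table_name, column_name, data_type "
    "FROM information_schema.columns "
    "WHERE table_schema = %s "
    "ORDER BY table_name, ordinal_position"
)


def describe_schema(rows: Iterable[tuple[str, str, str]]) -> str:
    """Turn (table, column, type) rows into lines like 'ORDERS(ID NUMBER, AMOUNT FLOAT)'."""
    tables: dict[str, list[str]] = defaultdict(list)
    for table, column, data_type in rows:
        tables[table].append(f"{column} {data_type}")
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in tables.items())


def fetch_schema(cursor: Any, schema: str) -> str:
    """Read column metadata for one schema and summarize it for an agent prompt."""
    cursor.execute(SCHEMA_SQL, (schema,))
    return describe_schema(cursor.fetchall())
```

One line per table keeps the schema short enough to fit alongside the user's idea in a single prompt.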

Technical Expert Agent

  • Specializes in the latest features and best practices of Streamlit and the supported ML framework (currently scikit-learn only)
  • Validates technical feasibility of proposed solutions
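Part of a feasibility check can be done without an LLM: parse the generated code and flag imports outside the supported stack. The allowlist `ALLOWED_MODULES` and the function `check_generated_code` below are hypothetical, a sketch of the idea rather than the project's actual validator.

```python
import ast

# Hypothetical allowlist of top-level packages the generated apps may import.
ALLOWED_MODULES = {"streamlit", "pandas", "numpy", "sklearn", "altair", "snowflake"}


def check_generated_code(source: str, allowed: set[str] = ALLOWED_MODULES) -> list[str]:
    """Return a list of problems; an empty list means the code parsed and uses only allowed imports."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            if name.split(".")[0] not in allowed:
                problems.append(f"unsupported import: {name}")
    return problems
```

Returning a list of problems, rather than a boolean, lets the messages be fed back to the Code Generation Agent as concrete fixes to make.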

Code Generation Agent

  • Translates business requirements into working Streamlit code
  • Implements proper error handling and user input validation
  • Ensures code follows best practices and organizational standards
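A common way to get working code out of an LLM is a generate, check, and retry loop: ask for an app, extract the code block, compile it, and feed any syntax error back into the next prompt. This sketch assumes a hypothetical `llm(prompt) -> str` callable (for instance, a wrapper around Cortex `COMPLETE`); `generate_app` and its helpers are illustrative names. Note that `compile()` only catches syntax errors, so runtime errors would still need separate handling.

```python
import re
from typing import Callable, Optional

FENCE = "`" * 3  # Markdown code fence
_CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the first fenced code block in an LLM reply, or the whole reply."""
    match = _CODE_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def syntax_error(source: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code compiles."""
    try:
        compile(source, "<generated_app>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def generate_app(llm: Callable[[str], str], spec: str, max_attempts: int = 3) -> str:
    """Ask for a Streamlit app, retrying with the compiler error until the code compiles."""
    prompt = f"Write a complete Streamlit app in Python for this specification:\n{spec}"
    for _ in range(max_attempts):
        code = extract_code(llm(prompt))
        error = syntax_error(code)
        if error is None:
            return code
        prompt += f"\n\nYour previous code did not compile ({error}). Return the corrected full app."
    raise RuntimeError(f"no compilable code after {max_attempts} attempts")
```

Appending the error to the prompt, instead of starting over, gives the model the specific failure to correct.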

How we built it

Bad Scientist uses a multi-agent architecture customized for Snowflake's environment, with each agent handling a different aspect of app generation.

Multi-Agent Framework:

  • Custom multi-agent implementation built to use Snowflake Cortex Search and Cortex LLM Functions with models such as mistral-large2
  • Seamless integration with Snowflake data warehouses
  • Streamlit for interactive web applications
  • Automated ML pipeline for scikit-learn integration
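The four agents described earlier can be chained as a simple sequential pipeline in which each agent reads everything produced so far and writes its own section. This is a minimal sketch, not the project's framework: `Agent`, `run_pipeline`, the role prompts, and the `llm` callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


@dataclass
class Agent:
    name: str
    role: str        # instructions describing this agent's job
    output_key: str  # where the agent's answer is stored in the shared state

    def run(self, llm: LLM, state: dict[str, str]) -> None:
        # Show the agent every earlier result, each under its own heading.
        context = "\n\n".join(f"## {k}\n{v}" for k, v in state.items())
        state[self.output_key] = llm(f"{self.role}\n\n{context}")


def run_pipeline(agents: list[Agent], llm: LLM, idea: str) -> dict[str, str]:
    """Run the agents in order, threading one shared state dictionary through them."""
    state = {"idea": idea}
    for agent in agents:
        agent.run(llm, state)
    return state


PIPELINE = [
    Agent("Requirement Agent", "Turn the idea into concrete requirements using the retrieved documentation.", "requirements"),
    Agent("Data Analyst Agent", "List the Snowflake tables and columns these requirements need.", "data_plan"),
    Agent("Technical Expert Agent", "Check feasibility with Streamlit and scikit-learn and suggest an approach.", "tech_plan"),
    Agent("Code Generation Agent", "Write the complete Streamlit app.", "code"),
]
```

A sequential hand-off keeps the flow easy to debug; a framework such as CrewAI offers richer delegation if the agents need to call each other.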

Challenges we ran into

  • Optimizing SQL query generation for complex data relationships
  • Maintaining context awareness across different data science tasks
  • Balancing automation with flexibility for customization
  • Handling diverse schema variations and data types
  • Ensuring generated code follows best practices and security standards
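For the security challenge, one inexpensive safeguard is to run only generated SQL that looks read-only. The heuristic below (hypothetical helper `is_read_only_query`) strips string literals and comments, requires a single `SELECT` or `WITH` statement, and rejects common write or DDL keywords. It is not a SQL parser and can reject harmless queries, so it complements, rather than replaces, running generated queries under a Snowflake role with read-only privileges.

```python
import re

# Keywords that indicate a write, DDL, or procedure call; matched as whole words.
_FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE|COPY|PUT|CALL)\b",
    re.IGNORECASE,
)


def is_read_only_query(sql: str) -> bool:
    """Heuristically decide whether generated SQL is a single read-only query."""
    # Blank out string literals ('' escapes included) and comments so that
    # keywords or semicolons inside them are ignored.
    cleaned = re.sub(r"'(?:[^']|'')*'", "''", sql)
    cleaned = re.sub(r"--[^\n]*|/\*.*?\*/", " ", cleaned, flags=re.DOTALL)
    statements = [s for s in cleaned.split(";") if s.strip()]
    if len(statements) != 1:
        return False
    head = statements[0].lstrip().upper()
    if not (head.startswith("SELECT") or head.startswith("WITH")):
        return False
    return _FORBIDDEN.search(statements[0]) is None
```

The check fails closed: anything it cannot classify as a single read-only statement is rejected.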

Accomplishments that we're proud of

  • Developed a robust text-to-application pipeline that delivers working solutions
  • Successfully integrated with Snowflake's ecosystem
  • Created an intuitive interface that streamlines the data science workflow
  • Reduced MVP development time from days to minutes
  • Implemented secure handling of sensitive data and credentials
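Keeping credentials out of generated code usually means reading them from the environment at runtime. A minimal sketch, assuming hypothetical variable names such as `SNOWFLAKE_ACCOUNT` and `SNOWFLAKE_PASSWORD`, builds a keyword dictionary that could be passed to `snowflake.connector.connect(**params)`; the actual app may instead rely on Streamlit's `st.secrets`.

```python
import os
from typing import Mapping

REQUIRED = ("account", "user", "password")
OPTIONAL = ("warehouse", "database", "schema", "role")


def snowflake_params_from_env(env: Mapping[str, str] = os.environ,
                              prefix: str = "SNOWFLAKE_") -> dict[str, str]:
    """Collect connection settings from (hypothetically named) environment variables."""
    params = {}
    for key in REQUIRED + OPTIONAL:
        value = env.get(prefix + key.upper())
        if value:
            params[key] = value
    missing = [k for k in REQUIRED if k not in params]
    if missing:
        # Fail fast with the variable names, never echoing any secret values.
        raise KeyError("missing environment variables: "
                       + ", ".join(prefix + k.upper() for k in missing))
    return params
```

Because the generated app only references variable names, the same code can be shared or pitched without exposing secrets.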

What we learned

  • The power of combining multiple AI agents for complex task automation
  • Best practices for enterprise data warehouse integration
  • Techniques for maintaining code quality in generated applications
  • The importance of context in automated code generation
  • Strategies for effective error handling in AI-generated code

What's next for Bad Scientist

  • Expand ML framework support beyond scikit-learn
  • Enhance visualization capabilities with more advanced chart types
  • Add support for automated testing and CI/CD integration
  • Implement collaborative features for team projects
  • Develop more sophisticated data transformation capabilities
  • Add support for more complex ETL processes

Built With

  • cortex
  • crewai
  • langchain
  • mistral
  • snowflake
  • streamlit