Inspiration
- While it is widely agreed that "Data is the New Oil," most of us lack the tools to get insights out of data. When we try to analyze data we have collected ourselves, we often end up battling spreadsheets full of formatting mishaps (e.g., commas, % symbols, and currency signs). We realized that collecting data is relatively simple for small business owners, students, and non-technical researchers, but extracting insights from that data remains a barrier until they learn to program in Python or SQL.
- We therefore built AI Data Analyst Pro to bridge this gap and make working with data feel like a conversation. The launch of Gemini 3 Pro let us offer a "Vibe Coding" experience: users simply describe what they want in plain language, and the application does all the work needed to prepare the data for visualization as quickly as possible.
What it does
- AI Data Analyst Pro is an on-premise BI platform that bridges the gap between you and your data, turning raw CSV files into business intelligence insights through intelligent ingestion and cleaning, auto-dashboarding, a conversational agent, and deep analysis.
- Intelligent Ingestion & Cleaning: When a user uploads a file with dirty data (e.g., $6,80,985 or 45.6%), the application's engine automatically parses and cleans the values and assigns data types, so no manual formatting is necessary.
- Auto Dashboarding: The system instantly builds a health-check dashboard from the uploaded CSV file(s), with KPIs (e.g., total revenue, average growth) and the distribution charts the user specifies.
- Conversational Data Agent: Users can query or interact with the data in plain English (e.g., "list the top 5 tech companies in California" or "plot revenue vs employees"). The application interprets the user's intent, filters the data accordingly, and generates an appropriate interactive chart.
- Deep Analysis: The application's statistics engine computes Pearson correlation coefficients and uses z-scores to detect anomalies and outliers.
- Professional Reporting: The application generates a multi-page PDF report containing an executive summary, the visualizations created in the app, a summary of key discoveries and trends, and links to the source documents behind each insight.
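The two statistics named under Deep Analysis can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not the app's actual statistics engine: the function names `pearson` and `zScoreOutliers` and the default threshold of 3 are hypothetical choices.

```javascript
// Pearson correlation coefficient r between two equal-length numeric arrays.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("pearson: need two arrays of equal length >= 2");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  // r is undefined when either column is constant.
  if (varX === 0 || varY === 0) return NaN;
  return cov / Math.sqrt(varX * varY);
}

// Indices of values whose |z-score| exceeds the threshold
// (z computed with the population standard deviation).
function zScoreOutliers(values, threshold = 3) {
  const n = values.length;
  if (n === 0) return [];
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const sd = Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / n);
  if (sd === 0) return [];
  return values
    .map((v, i) => ({ i, z: (v - mean) / sd }))
    .filter(({ z }) => Math.abs(z) > threshold)
    .map(({ i }) => i);
}
```

For example, `zScoreOutliers([10, 11, 9, 10, 12, 10, 50], 2)` flags index 6 (the 50), while the default threshold of 3 flags nothing. That is expected: with the population standard deviation, no value in a sample of n points can have |z| above sqrt(n - 1), so a threshold of 3 can only fire on tables with at least 11 rows, and small uploads need a lower cut-off.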
How we built it
- We followed "Vibe Coding" principles throughout: all application logic was developed inside Google AI Studio with Gemini 3 Pro.
- Core Brain: The core "brain," which converts natural-language requests into machine-readable JSON, is powered by Gemini 3 Pro.
- Front-End: The user interface is built with React (using Vite) and TailwindCSS, with a split-screen layout in a "Cyberpunk" aesthetic.
- Data Visualization: We chose Recharts to display data as responsive, interactive, animated charts.
- Data Processing: We used PapaParse for fast CSV parsing and, with guidance from Gemini, developed our own algorithms to handle the many messy string-to-number conversions and turn CSV data into JSON for the charts.
- Export Engine: We built an Export Engine with jsPDF and html2canvas that generates structured PDF exports of the DOM elements and data states.
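To illustrate how JSON from the Core Brain could drive the filtered charts, here is a hedged sketch. The JSON schema (`filters`, `sortBy`, `order`, `limit`, `chart`) and the helpers `parseQuerySpec` and `applyQuery` are hypothetical, since the write-up does not describe the real format; the sketch also assumes the rows have already been cleaned so numeric columns hold numbers.

```javascript
// Hypothetical intent format the model could be prompted to return, e.g. for
// "list the top 5 tech companies in california":
// {"filters": [{"column": "State", "op": "eq", "value": "California"},
//              {"column": "Industry", "op": "eq", "value": "Tech"}],
//  "sortBy": "Revenue", "order": "desc", "limit": 5,
//  "chart": {"type": "bar", "x": "Company", "y": "Revenue"}}

// Whitelisted comparison operators; any other "op" is rejected.
const OPS = {
  eq: (a, b) => String(a).toLowerCase() === String(b).toLowerCase(),
  gt: (a, b) => Number(a) > Number(b),
  lt: (a, b) => Number(a) < Number(b),
};

// Parse and minimally validate the model's reply before trusting it.
function parseQuerySpec(text) {
  let spec;
  try {
    spec = JSON.parse(text);
  } catch {
    return null; // not valid JSON
  }
  if (typeof spec !== "object" || spec === null || Array.isArray(spec)) return null;
  const filters = Array.isArray(spec.filters) ? spec.filters : [];
  const validOp = (f) => f && Object.keys(OPS).includes(f.op);
  if (!filters.every(validOp)) return null;
  return { ...spec, filters };
}

// Apply filters, then sort and limit, to already-cleaned rows.
function applyQuery(rows, spec) {
  // filter() returns a new array, so sorting it below does not mutate `rows`.
  const out = rows.filter((row) =>
    spec.filters.every((f) => OPS[f.op](row[f.column], f.value))
  );
  if (spec.sortBy) {
    const dir = spec.order === "asc" ? 1 : -1;
    out.sort((a, b) => dir * (a[spec.sortBy] - b[spec.sortBy]));
  }
  if (Number.isInteger(spec.limit) && spec.limit >= 0) return out.slice(0, spec.limit);
  return out;
}
```

Validating the reply before applying it guards against an LLM returning malformed JSON or an unexpected operator; returning `null` lets the UI ask again instead of rendering a wrong chart. In this sketch the `chart` field is simply passed through for a Recharts component to consume.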
Challenges we ran into
- The "Dirty Data" Nightmare: Our test datasets of United States and Indian companies contain numbers with comma separators, mixed currency symbols, and percent signs. When we first tried to render charts from these datasets, we saw frequent crashes and sorting errors caused by how the values were stored (e.g., "900" was treated as larger than "1,000" because strings are compared character by character). We therefore built a data-cleaning layer that recursively traverses the data and validates the type of each column at upload time.
- Lost Cursor Focus: While building the chat in React, we hit a problem where the chat input lost focus on every keystroke, which became a major issue for the chat feature. We fixed it by moving the input's state into its own component, separate from the heavy dashboard state.
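A minimal sketch of the kind of cleaning layer described under "Dirty Data": parse each cell with a tolerant number parser, then treat a column as numeric only if most of its non-empty cells parse. The names `parseMessyNumber`, `inferColumnTypes` and `cleanRows`, the set of currency symbols, and the 90% threshold are assumptions for illustration, not the project's code; this version also loops over columns rather than recursing.

```javascript
// Tolerant parser for cells such as "$6,80,985", "45.6%", "(1,200)" or "₹ 12,000".
// Returns a number, or null if the cell is not numeric after cleaning.
// Assumes "," is a digit-group separator (not a decimal comma) and keeps
// percentages on the 0-100 scale ("45.6%" -> 45.6).
function parseMessyNumber(raw) {
  if (typeof raw === "number") return Number.isFinite(raw) ? raw : null;
  if (typeof raw !== "string") return null;
  let s = raw.trim();
  // Accounting-style negatives: "(1,200)" means -1200.
  const negative = /^\(.*\)$/.test(s);
  if (negative) s = s.slice(1, -1);
  // Strip currency symbols, group separators (any grouping), spaces and "%".
  s = s.replace(/[$€£₹¥,%\s]/g, "");
  // After cleaning, only a plain decimal number is accepted.
  if (!/^[-+]?\d*\.?\d+$/.test(s)) return null;
  const n = Number(s);
  return negative ? -n : n;
}

// Treat a column as numeric if at least `minShare` of its non-empty cells parse.
function inferColumnTypes(rows, minShare = 0.9) {
  const types = {};
  for (const col of rows.length ? Object.keys(rows[0]) : []) {
    const cells = rows
      .map((r) => r[col])
      .filter((v) => v != null && String(v).trim() !== "");
    const numeric = cells.filter((v) => parseMessyNumber(v) !== null).length;
    types[col] = cells.length > 0 && numeric / cells.length >= minShare ? "number" : "string";
  }
  return types;
}

// Return new rows whose numeric columns hold real numbers (null where a cell fails to parse).
function cleanRows(rows) {
  const types = inferColumnTypes(rows);
  return rows.map((row) => {
    const out = { ...row };
    for (const [col, type] of Object.entries(types)) {
      if (type === "number") out[col] = parseMessyNumber(row[col]);
    }
    return out;
  });
}
```

This reproduces the sorting bug and its fix: as strings, `"900" > "1,000"` is `true`, but after `cleanRows` the Revenue values are the numbers 900 and 1000, and a numeric comparator such as `(a, b) => a - b` orders them correctly.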
Accomplishments that we're proud of
- Successfully Realizing "Vibe Coding": We built an enterprise-class logic engine primarily through natural-language prompts in Google AI Studio, which suggests that Gemini 3 Pro can work at the level of a senior architect.
- Enhanced Usability and Visual Collaboration Model: We replaced the vertical-scroll UI with a professional-feeling split view (chat on one side, live data on the other), which makes interaction feel as natural as in many advanced development environments.
- Client First, Privacy by Design: Because all processing happens on the client side, the application is both fast and privacy-focused. After the initial load, all user data stays in the user's browser session.
- Robust Data Compatibility: The application was designed for real-world files, not just perfect demo data. It is built to handle raw, messy records that mix several formats in the same column.
What we learned
- Large language models such as Gemini do more than process text: they can translate a user's needs into structured JSON commands that drive a user interface.
- What matters most for time-to-insight is that users want results, not an explanation of how the question was answered. By removing the need to set up a project on a local machine, we learned how effectively visual reports can support immediate, data-driven decisions.
- Modern JavaScript libraries can replicate many features that traditionally required server-side tools such as Python's Pandas. As small and medium-sized datasets become more common, we expect the need for heavy back-end infrastructure to handle them to keep decreasing.
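As a small illustration of that point, a pandas-style group-and-aggregate takes only a few lines of plain JavaScript running in the browser. `groupByAgg` is a hypothetical helper, not a library API; it roughly mirrors `df.groupby(key)[value].agg(["sum", "mean", "count"])`, skipping non-numeric values much as pandas skips NaN.

```javascript
// Group rows by `key` and aggregate the numeric column `value`.
// Groups are returned in first-seen order (Map preserves insertion order).
function groupByAgg(rows, key, value) {
  const groups = new Map();
  for (const row of rows) {
    const v = row[value];
    if (typeof v !== "number" || Number.isNaN(v)) continue; // skip missing values
    const g = groups.get(row[key]) ?? { sum: 0, count: 0 };
    g.sum += v;
    g.count += 1;
    groups.set(row[key], g);
  }
  return [...groups.entries()].map(([k, { sum, count }]) => ({
    [key]: k,
    sum,
    mean: sum / count,
    count,
  }));
}
```

For example, grouping `[{ State: "CA", Revenue: 10 }, { State: "TX", Revenue: 5 }, { State: "CA", Revenue: 30 }]` by `"State"` yields a CA group with sum 40, mean 20 and count 2, followed by a TX group.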
What's next for AI Data Analyst Pro: The Conversational BI Engine
- Voice Integration: Allowing users to literally "talk" to their data using the Web Speech API.
- Multi-File Support: Enabling users to upload two CSVs (e.g., "Sales" and "Customers") and ask the AI to join them on common columns.
- Google Drive Integration: Directly pulling sheets and CSVs from a user's Google Drive.
- Smart Forecasting: Implementing a lightweight linear regression model (using TensorFlow.js) to allow users to ask "Predict next month's revenue."
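For a single-variable trend, the linear regression behind Smart Forecasting has a closed-form least-squares solution, so a first version could work without TensorFlow.js. This sketch swaps in that closed-form fit rather than TensorFlow.js; it assumes one value per month, evenly spaced and ordered oldest to newest, and `fitLinearTrend` and `forecastNext` are hypothetical names.

```javascript
// Fit y = slope * x + intercept by ordinary least squares, with x = 0, 1, 2, ...
// (the month index).
function fitLinearTrend(ys) {
  const n = ys.length;
  if (n < 2) throw new Error("need at least two data points");
  const meanX = (n - 1) / 2; // mean of 0..n-1
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0, sxx = 0;
  for (let x = 0; x < n; x++) {
    sxy += (x - meanX) * (ys[x] - meanY);
    sxx += (x - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Predict the value for the next period (x = n), e.g. "next month's revenue".
function forecastNext(ys) {
  const { slope, intercept } = fitLinearTrend(ys);
  return slope * ys.length + intercept;
}
```

For monthly revenue of `[100, 110, 120, 130]`, the fit gives a slope of 10 and an intercept of 100, so `forecastNext` returns 140. A straight-line trend ignores seasonality, so a heavier model (for example one built with TensorFlow.js) would only pay off once the data shows patterns a line cannot capture.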
Built With
- ai
- data
- gemini
- javascript
- jspdf
- papaparse
- pro
- react
- recharts
- studio
- tailwindcss
- vite