Inspiration Large CSV and XLSX files are hard to analyze without code. Most chat tools cannot take a 50–100 MB dataset and produce a chart. SiftAI generates code for analysis and runs it locally on the full file.

What it does Drag and drop a file. SiftAI infers the schema and basic quality stats. Ask a question in plain language. SiftAI generates an algorithm, executes it on the dataset, and returns tables and PNG or HTML charts.

How we built it FastAPI handles upload, schema inference, and column profiling. A Node CLI (chatbot.js) uses Gemini for planning and only receives the schema and the question. The model outputs Python that runs in a Pyodide sandbox with NumPy, Pandas, and Matplotlib. A Tailwind and Plotly frontend displays results and interactive visuals.
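The schema inference and column profiling step can be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the project's actual FastAPI code; the function names `infer_type` and `profile_csv` and the shape of the returned profile are assumptions made for the example.

```python
import csv
import io


def infer_type(values):
    """Guess 'int', 'float', or 'string' from a list of non-empty strings."""
    if not values:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text):
    """Build a small schema/quality profile (hypothetical shape) for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames or []}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Short rows yield None for missing cells; treat them as empty.
            columns[name].append((row.get(name) or "").strip())

    profile = {"rows": rows, "columns": {}}
    for name, values in columns.items():
        present = [v for v in values if v != ""]
        profile["columns"][name] = {
            "type": infer_type(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return profile


sample = "city,population,area\nOslo,700000,454.0\nBergen,,465.3\n"
profile = profile_csv(sample)
```

A profile like this, together with the user's question, is the kind of compact summary that could be sent to the planning model in place of the data.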

Challenges we ran into Ensuring that generated code runs in Pyodide without unsupported packages. Keeping the UI responsive and memory use predictable with large files. Creating and cleaning up per-upload workspaces.
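One way to catch unsupported packages before execution is to inspect the generated code's imports statically. The sketch below is an assumption about how such a check could look, not the project's implementation; the `ALLOWED_MODULES` set and the function name `unsupported_imports` are hypothetical, and the allowlist would need to match whatever is actually loaded into the sandbox.

```python
import ast

# Hypothetical allowlist: the three libraries named above plus a few
# standard-library modules. Adjust to the packages loaded in the sandbox.
ALLOWED_MODULES = {"numpy", "pandas", "matplotlib", "json", "math", "io"}


def unsupported_imports(source):
    """Return the sorted top-level modules imported by `source` that are not allowed."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0), which have no external package.
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return sorted(found - ALLOWED_MODULES)


generated = (
    "import pandas as pd\n"
    "import matplotlib.pyplot as plt\n"
    "from sklearn.cluster import KMeans\n"
    "import seaborn\n"
)
rejected = unsupported_imports(generated)
```

A static check like this does not see dynamic imports (for example via `importlib`), so it complements the sandbox rather than replacing it.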

Accomplishments An end-to-end pipeline in which the model plans the code and execution happens locally on the dataset. Fixed output artifacts: data_result.json, data_result.png, and an optional data_result.html. Error handling that retries once or returns a short explanation.
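The retry-once behavior might look something like the following sketch. The callables `generate_code` and `execute_code`, the function name, and the returned dictionary shape are all hypothetical stand-ins; the real pipeline calls Gemini and Pyodide and may pass the error back to the model differently.

```python
def run_with_single_retry(generate_code, execute_code, question):
    """Try generated code at most twice, feeding the first error back to the planner.

    generate_code(question, last_error) -> str  (hypothetical planner call)
    execute_code(code) -> result                (hypothetical sandbox call)
    """
    last_error = None
    for attempt in range(2):
        code = generate_code(question, last_error)
        try:
            result = execute_code(code)
            return {"ok": True, "attempts": attempt + 1, "result": result}
        except Exception as exc:
            # Keep the message short so it can be shown to the user or the model.
            last_error = f"{type(exc).__name__}: {exc}"[:200]
    return {"ok": False, "attempts": 2, "error": last_error}


# Stand-in planner and sandbox used only to exercise the control flow.
def fake_planner(question, last_error):
    return "bad" if last_error is None else "good"


def fake_sandbox(code):
    if code == "bad":
        raise ValueError("boom")
    return 42


outcome = run_with_single_retry(fake_planner, fake_sandbox, "total sales by region")
```

Returning the same dictionary shape on success and failure keeps the caller simple, in the same spirit as the fixed output artifacts.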

What we learned Separate planning from execution to scale beyond model context limits. Enforce explicit backends and a fixed return schema for stability. Typical grouping and aggregation costs fit within O(n log n).
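To illustrate where an O(n log n) bound for grouping can come from, here is a sort-based group-and-sum in plain Python: the sort costs O(n log n) and the single pass over the sorted rows costs O(n). This is only an illustration of the bound; Pandas may use hash-based grouping internally, and the function name `group_sum` and the sample rows are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def group_sum(rows, key, value):
    """Sum `value` per distinct `key` using sort-then-scan grouping."""
    ordered = sorted(rows, key=itemgetter(key))  # O(n log n)
    return {
        k: sum(r[value] for r in group)          # O(n) across all groups
        for k, group in groupby(ordered, key=itemgetter(key))
    }


rows = [
    {"region": "N", "sales": 3},
    {"region": "S", "sales": 5},
    {"region": "N", "sales": 4},
]
totals = group_sum(rows, "region", "sales")
```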

What's next for SiftAI Chunked and streaming processing for 1,000,000+ rows, with optional GPU paths. Additional built-ins such as correlation matrices, anomaly scoring, and simple forecasting. One-click export of the generated algorithm as a Python module or notebook.
