Inspiration

Students in STEM do not just struggle with solving problems; they struggle with submitting them. They work things out in PDFs or on paper, then waste time translating everything into LaTeX or code. Professors and researchers face the same friction from the other direction: turning rough STEM work into clean, structured documents.

What it does

StemFlow is a STEM workspace that combines voice transcription, PDF context extraction, AI-powered reasoning, and LaTeX editing into one seamless flow. It acts as an agentic workspace engineered for how STEM work actually happens, reducing the friction between analytical thought and structured digital documentation.

How we built it

StemFlow pairs a Next.js (TypeScript) frontend with a FastAPI backend. Google Gemini (via Vertex AI) powers the core agentic reasoning loops, and a multimodal pipeline parses text, images, and speech (via Google Cloud Speech-to-Text). The backend uses Python's SymPy library as an isolated deterministic solver, so the AI can definitively evaluate complex mathematics before generating the final LaTeX structures.
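The deterministic-solver step can be sketched roughly as follows. This is an illustrative example, not the project's actual API: `solve_and_render` is a hypothetical helper name, and we assume SymPy both checks the math and emits the LaTeX.

```python
import sympy as sp

def solve_and_render(expression: str, variable: str = "x") -> dict:
    """Evaluate a math expression deterministically with SymPy,
    then emit LaTeX that the agent can splice into the final document."""
    x = sp.symbols(variable)
    expr = sp.sympify(expression)     # parse; raises on invalid syntax
    solutions = sp.solve(expr, x)     # deterministic solve, no LLM involved
    return {
        "input_latex": sp.latex(expr),
        "solutions_latex": [sp.latex(s) for s in solutions],
    }

result = solve_and_render("x**2 - 5*x + 6")
# exact roots, e.g. x = 2 and x = 3, rendered as LaTeX strings
```

Because SymPy's answers are exact and reproducible, the LLM only has to narrate and format results it can no longer get wrong.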

Challenges we ran into

Integrating multiple modalities presented several hurdles:

  • Structuring consistent and correct LaTeX formatting from unpredictable AI outputs.
  • Calibrating Vertex AI prompts to maintain structural integrity without hallucinating syntax.
  • Handling real-time voice transcription accurately in the context of advanced math terminology.
  • Building a robust, fully featured LaTeX editor capable of live differential updates.
  • Normalizing garbled text extraction from embedded PDF preview layers.

Accomplishments that we're proud of

StemFlow was fully conceptualized, engineered, and deployed in 24 hours during a hackathon at WashU. We successfully delivered a functional tool that tangibly solves a widespread academic pain point for students and researchers alike.

What we learned

We gained critical experience orchestrating complex AI agent pipelines and connecting deterministic logic (SymPy) with non-deterministic LLMs. On the infrastructure side, we learned the importance of tight resource management and having backup cloud credits, as scaling multimodal AI operations is highly resource-intensive.
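The deterministic-meets-non-deterministic orchestration described above can be sketched as a verify-and-retry loop. The `llm` callable here is a stub standing in for the real Gemini call, and the function name is hypothetical:

```python
import sympy as sp

def checked_answer(question: str, llm, max_attempts: int = 3) -> str:
    """Ask the (non-deterministic) LLM for an expression, then use SymPy
    as a deterministic referee: reject anything that fails to parse."""
    last_error = None
    for _ in range(max_attempts):
        candidate = llm(question)
        try:
            expr = sp.sympify(candidate)   # deterministic validation step
            return sp.latex(expr)          # only verified math reaches the doc
        except (sp.SympifyError, TypeError) as err:
            last_error = err               # retry: output was not valid math
    raise RuntimeError(
        f"no parseable answer after {max_attempts} tries: {last_error}"
    )

# stub for the LLM: first reply is malformed, second is valid
replies = iter(["oops not math", "x**2 + 1"])
answer = checked_answer("simplify x^2 + 1", lambda q: next(replies))
```

The pattern is simple but was the core lesson: let the LLM propose, let the deterministic layer dispose.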

What's next for StemFlow

  • Broader integration of underlying structural models.
  • Dedicated cloud deployment for accessible remote hosting.
  • Direct integrations into popular Learning Management Systems (LMS).
