Project Story
Inspiration
This project was born at the intersection of education, family, and technology. One of our co-creators is a Portuguese and English teacher, constantly observing how many students struggle with writing assessments and have limited access to qualified feedback. At the same time, in our own home, we see our teenage daughter frequently relying on that same expertise to review her handwritten texts and confirm whether she is on the right track. This contrast, between students who lack support and a student who depends on continuous, expert guidance, highlighted a clear gap. From there came the impulse to design a solution that offers accessible, high-quality writing evaluation to learners who lack a teacher within reach.
At the same time, we observed that many learners struggle to use Large Language Models effectively. Crafting good prompts, understanding model constraints, and extracting actionable feedback are non-trivial skills. Rather than expecting every student to become an AI specialist, we set out to build a solution that abstracts this complexity.
By leveraging Google’s AI ecosystem, our project aims to transform handwritten text into structured, meaningful feedback: clear, objective, and pedagogically grounded. In a world where interacting with AI systems will increasingly require clarity, argumentation, and critical thinking, tools that help students write better are not optional; they are infrastructure.
What We Learned
We started with a single prompt in Google AI Studio. That first experiment was decisive: in one iteration, we could see both the potential of the idea and its technical feasibility in terms of performance and cost.
From there, the learning curve accelerated. Although our team has broad experience with software development and AI, every project has its own ecosystem of nuances. In this case, we:
- Validated end-to-end feasibility early using AI Studio.
- Explored new tools such as Vite, introduced through the generated project; its build process proved fast and productive.
- Refined our understanding of how to orchestrate multiple Google tools cohesively from prototyping to deployment.
This journey reinforced a core insight: high-quality AI tooling not only powers the solution but also accelerates innovation itself.
How We Built the Project
The implementation followed a fast, iterative, and production-focused path:
Proof of Concept in Google AI Studio
We began with a prompt-generated, minimalistic proof of concept. This initial version performed basic text analysis, helping us validate the concept, identify constraints, and understand the primary technical challenges.
Deployment to Cloud Run
After validating the POC in AI Studio, we deployed it to Cloud Run to ensure scalability, reliability, and alignment with real-world usage scenarios.
Front-End Refinement with Gemini CLI
We downloaded the generated source code and used it as the foundation for the visual interface. Using the Gemini CLI (with community prompts from GitHub), we refactored the project into a dedicated front-end application, decoupling it from image processing and direct LLM communication. This separation of concerns improved maintainability and flexibility.
Backend with the Google Agent Development Kit (ADK)
In parallel, we used the ADK with its local Web UI to design and validate the agent workflow:
- Experimenting with different prompts and models.
- Aligning outputs with our pedagogical and linguistic criteria.
- Ensuring robustness in handling diverse handwriting inputs and languages.
FastAPI-Based Orchestration Layer
After stabilizing the agent behavior, we used the Gemini CLI to generate and adapt a FastAPI backend. This service:
- Orchestrates the analysis workflow.
- Manages communication with the agent.
- Streams real-time processing status and final feedback to the front-end in a clear, user-friendly way.
The result is an architecture that is modular, scalable, and ready to evolve beyond the scope of the challenge.
Challenges We Faced
Building this solution surfaced meaningful technical and conceptual challenges:
1. Reliable Text Extraction from Images
Extracting handwritten text consistently is inherently a complex task. Selecting a model that was:
- fast enough for an interactive experience,
- cost-efficient for real-world scale, and
- accurate for diverse handwriting styles,
required extensive experimentation.
For this challenge context, Gemini 2.5 Flash delivered excellent performance in our tests, providing both speed and quality for handwritten text extraction. For large-scale, production-grade deployments, we recognize that a dedicated OCR/handwriting solution could complement or further specialize this step.
2. Defining Fair and Useful Evaluation Criteria
Evaluating text is not just a technical task; it is also philosophical and pedagogical. Writing quality involves nuance and subjectivity.
To avoid superficial scoring, we grounded our approach in:
- Linguistic aspects (clarity, cohesion, structure, syntax),
- Argumentative aspects (thesis, development, consistency, conclusion).
Our initial framework was inspired by the Brazilian ENEM examination, taken annually by over 5 million students. However, we deliberately structured the criteria to be universal, making the system relevant for:
- school essays in different countries,
- entrance exams,
- language proficiency tests such as TOEFL and IELTS.
The system:
- Automatically detects the language of the text using LLM-based analysis.
- Provides feedback in that same language.
- Adapts its evaluation logic to remain coherent and useful across different educational contexts.
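These three behaviors boil down to instructions in the evaluation prompt plus a structured response shape. A minimal sketch, assuming a JSON contract of our own invention (the wording and keys below are illustrative, not the production schema):

```python
# Hypothetical instruction sketch; wording and output schema are
# illustrative, not the project's actual prompt.
EVALUATION_INSTRUCTION = """\
1. Identify the language of the essay.
2. Write all feedback in that same language.
3. Score linguistic quality: clarity, cohesion, structure, syntax.
4. Score argumentative quality: thesis, development, consistency, conclusion.
Return JSON with keys: language, linguistic, argumentative, comments.
"""

# Example of the structured shape the front-end would render.
example_response = {
    "language": "pt-BR",
    "linguistic": 8,
    "argumentative": 7,
    "comments": "A tese é clara, mas a conclusão poderia retomar os argumentos.",
}
```

Keeping the criteria in the instruction (rather than hard-coding a rubric per exam) is what lets the same system serve ENEM-style essays, TOEFL practice, and classroom assignments alike.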
3. Balancing Flexibility and Accessibility
A key challenge was ensuring that students do not need “prompt engineering skills” to benefit from advanced AI. All complexity (model choice, prompt design, workflow orchestration) is encapsulated behind an interface designed for learners, not technicians.