Inspiration

I'm a low-vision student. My co-founder Gabriel is totally blind.

When we got to campus, I wanted to study Geography. I was told no — not because I lacked the grades, but because the department couldn't make the visual content accessible. Charts. Maps. Diagrams. The institution could offer large print or Braille for exams, but the actual learning materials? Nobody had a plan for those.

That's the pattern across African universities. Visually impaired students get routed away from visual-heavy disciplines — Geography, Biology, Engineering, Architecture — because the content is either inaccessible or too expensive to adapt. The barrier isn't intelligence. It's format.

AuraLearn started as us refusing to accept that.

What It Does

AuraLearn converts visual academic content into structured audio that a screen reader can't produce on its own.

When a student uploads a lecture PDF, slide deck, or scanned document, the platform extracts every visual element — diagrams, charts, figures, maps — and runs each one through a vision-language pipeline. The output isn't a caption. It's a structured description that identifies what type of content it is, what it's communicating, how its components relate to each other, and what a student needs to understand it for their specific subject and level.
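To make the difference between a caption and a structured description concrete, here is a minimal sketch of what one such record could look like. The class and field names (`VisualDescription`, `Relation`, `study_note`) are hypothetical illustrations, not AuraLearn's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    source: str
    target: str
    kind: str  # hypothetical relation name, e.g. "flows_to", "contains"


@dataclass
class VisualDescription:
    content_type: str   # e.g. "bar_chart", "flowchart", "map"
    message: str        # what the visual is communicating
    components: list[str] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)
    study_note: str = ""  # what matters for the student's subject and level

    def to_narration(self) -> list[str]:
        """Flatten the structure into ordered lines ready for narration."""
        kind = self.content_type.replace("_", " ").capitalize()
        lines = [f"{kind}. {self.message}"]
        if self.components:
            lines.append("Components: " + ", ".join(self.components) + ".")
        for r in self.relations:
            lines.append(f"{r.source} {r.kind.replace('_', ' ')} {r.target}.")
        if self.study_note:
            lines.append("Study note: " + self.study_note)
        return lines
```

Keeping type, message, relationships, and study guidance as separate fields is what would let a later stage narrate them differently or let a student skip between them.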

We call this AudioSpace: the spatial audio layer that separates primary content, labels, and annotations so students can navigate material rather than sit through a wall of narration. A bar chart sounds different from a flowchart. A labeled anatomy diagram is described differently from a geography map. The system adapts to the discipline.
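One way to picture the layering is as tagged segments that a student can filter or that a renderer can place differently in the stereo field. The sketch below is an assumption about how such a layer could be modelled; the `Layer` names follow the three content types above, while the pan values and function names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    PRIMARY = "primary"
    LABEL = "label"
    ANNOTATION = "annotation"


@dataclass
class Segment:
    layer: Layer
    text: str


# Hypothetical stereo pan per layer: -1.0 is full left, 1.0 is full right.
PAN = {Layer.PRIMARY: 0.0, Layer.LABEL: -0.6, Layer.ANNOTATION: 0.6}


def select(segments, layers):
    """Keep only the layers the student wants to hear, in original order."""
    wanted = set(layers)
    return [s for s in segments if s.layer in wanted]


def render_plan(segments):
    """Pair each segment with the pan position a TTS stage could apply."""
    return [(PAN[s.layer], s.text) for s in segments]
```

A student who only wants the labels of a diagram could call `select(segments, [Layer.LABEL])` instead of listening to the full narration.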

Students access their converted materials through the AuraLearn dashboard at www.auralearn.co.ke. Lecturers can upload directly or link existing course materials. Assessment-linked content goes through a human review queue before reaching students — the AI drafts, a person confirms.

How We Built It

The core pipeline runs on GPT-4 Vision for visual interpretation, with a prompt architecture we built to extract structured meaning rather than surface description. We layered in academic context injection — the model is told the student's subject and level before it describes anything, so it calibrates terminology and depth appropriately.
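Academic context injection can be sketched as a prompt builder that states the subject and level before the description task. The wording and the `build_prompt` function are hypothetical; the actual prompts are not shown in this write-up, and the sketch is deliberately independent of any particular model API.

```python
def build_prompt(subject: str, level: str) -> str:
    """Return a prompt that gives academic context first, then the task."""
    context = (
        f"The student is studying {subject} at {level} level. "
        "Use terminology and depth suited to that course."
    )
    task = (
        "Describe the attached visual for a student who cannot see it. "
        "State: (1) the type of visual, (2) what it communicates, "
        "(3) how its components relate, (4) what the student must understand."
    )
    # Context comes first so the model calibrates before it describes.
    return context + "\n\n" + task
```

For example, `build_prompt("Geography", "second-year undergraduate")` yields a prompt whose first paragraph sets the course context and whose second paragraph asks for the four-part structured description.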

Text extraction runs in parallel using PDF parsing libraries, then merges with the visual descriptions into a unified transcript. The audio output uses spatial formatting to distinguish content types, built on top of a text-to-speech layer we tuned for academic language density.
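The merge step can be illustrated as ordering both streams by their position in the document. This is a sketch under the assumption that each extracted block carries a page number and a vertical position; the `Block` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    page: int
    top: float    # distance from the top of the page (assumed layout coordinate)
    kind: str     # "text" or "visual"
    content: str


def merge_transcript(text_blocks, visual_blocks):
    """Interleave extracted text and visual descriptions in reading order."""
    ordered = sorted(
        [*text_blocks, *visual_blocks],
        key=lambda b: (b.page, b.top),
    )
    return [f"[{b.kind}] {b.content}" for b in ordered]
```

Tagging each line with its kind keeps the distinction available to the audio stage, so a description of a figure lands where the figure sat on the page rather than at the end of the transcript.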

The front end is built with accessibility as the primary constraint, not an afterthought: every interaction has worked with a screen reader since day one.

Challenges We Ran Into

The hardest problem wasn't technical. It was that visually impaired students have spent years building workarounds — human describers, helpful classmates, recorded lectures — and a new tool has to be meaningfully better than those workarounds before anyone switches.

On the technical side: the model struggles with low-resolution scans and non-standard diagram styles. Early testing caught a scanned biology diagram where two labeled structures were transposed. The description was fluent, detailed, and wrong. A sighted student would catch it instantly. Our users wouldn't. That's what pushed us to build the confidence scoring layer and the human review queue for assessment-linked content.
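The routing logic behind that safeguard might look like the sketch below. The rule that assessment-linked content always goes to review comes from the description above; sending low-confidence drafts to review as well, the `0.8` threshold, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off, not a published value


@dataclass
class Draft:
    description: str
    confidence: float        # assumed score in [0, 1] from the scoring layer
    assessment_linked: bool


def route(draft: Draft, threshold: float = REVIEW_THRESHOLD) -> str:
    """Decide whether an AI draft is published or held for a human reviewer."""
    # Assessment-linked content is always confirmed by a person.
    if draft.assessment_linked:
        return "human_review"
    # Assumption: low-confidence drafts are also held back.
    if draft.confidence < threshold:
        return "human_review"
    return "publish"
```

A fluent description with a high score would still be held if it feeds an exam, which is the case where a transposed label does the most damage.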

We also built this while being the users. There's no separation between "does this work" and "does this work for us."

Accomplishments That We're Proud Of

Winning the AT4D Accessibility Hackathon told us the problem framing was right.

Getting accepted into the Innovate Now incubator gave us the infrastructure to build seriously.

The moment a student used AuraLearn to access a Geography diagram independently — in the discipline I was told I couldn't study — is the one that matters most.

What We Learned

Nothing for us, without us.

Every design decision that matters came from being visually impaired students building for visually impaired students. The tools that get built about this community without this community are the ones that make screen readers say "image" and call it done.

We also learned that the problem isn't the AI. The problem is who gets to define what "accessible" means. We're changing that definition from the inside.

What's Next for AuraLearn

Right now we're running on general-purpose vision-language models. They're good. They're not surgical.

A Geography diagram and a circuit schematic and an anatomical cross-section are not the same problem. They need models trained on domain-specific visual vocabularies — models that understand what matters in a cadastral map versus what matters in a cell division diagram.

That's where we're going. We're securing funding to move from general-purpose models to domain-specific computer vision pipelines, built discipline by discipline. Not a tool that gives a reasonable description of anything. A tool that gives the right description of the specific thing a student needs to learn today.

Built With

  • pitch assets
  • Google Gemini / Vision API: visual interpretation of diagrams and scanned academic documents
  • Google Text-to-Speech API: audio output generation for converted content
  • Supabase: backend database and authentication (PostgreSQL + row-level security)
  • JavaScript / React: front-end interface
  • business-plan
  • charts
  • denovo
  • geminiapi
  • javascript
  • react
  • supabase