Inspiration

I love reading, and even in a digital age, I strongly prefer paper books. There is something pure about holding a physical book, turning pages, and writing notes by hand.

But while reading, I constantly find myself switching between the book and my phone. I look up words, lose track of highlighted passages, and struggle to revisit handwritten notes later. The joy of reading is often interrupted by the friction of managing information.

I soon realized this wasn’t just my personal pain point. Friends and family who enjoy paper reading shared the same frustration: paper books feel better to read, but make it hard to organize notes or ask questions in the moment.

This gap between pure paper reading and digital support inspired me to build Calie.


What it does

Calie is a reading pal designed for people who love paper books.

It helps readers identify what they are reading by scanning the book cover or ISBN, so all notes and conversations stay connected to the right book.

Calie offers two main ways to support reading:

  • A live audio and camera mode that allows readers to ask questions and think aloud while reading
  • An image-based note feature that turns handwritten circles and underlines into searchable digital notes

After each reading session, Calie creates a lightweight daily journal, capturing what was read and discussed, so readers can revisit their thoughts over time.


How we built it

I built Calie entirely on my own, covering ideation, design, development, testing, filming, and editing.

Before building, I talked with friends and family to validate that this was a shared need, not just my own habit.

Technically, Calie combines book identification, real-time voice interaction, camera-based page understanding, and digital note generation. I built and iterated on the app in Google AI Studio (see the last section for a detailed description of how the Gemini APIs are integrated), and eventually found a more effective workflow: developing directly on my phone and using voice instructions to refine the experience in real time.
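
To make the book-identification step concrete, here is a minimal sketch of how a cover (or ISBN barcode) photo could be sent to Gemini through the @google/genai SDK and mapped to a title, author, and ISBN. The model id, prompt, result schema, and helper names are assumptions for illustration, not Calie's actual code.

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical shape for the identification result; the real app may differ.
interface BookIdentity {
  title: string;
  author: string;
  isbn?: string;
}

// Identify a book from a base64-encoded cover or barcode photo.
async function identifyBook(coverJpegBase64: string): Promise<BookIdentity> {
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash", // placeholder model id
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: coverJpegBase64 } },
      {
        text:
          "Identify this book from its cover or ISBN barcode. " +
          'Reply with JSON only: {"title": "...", "author": "...", "isbn": "..."}',
      },
    ],
  });

  // response.text holds the model's text output in the @google/genai SDK.
  return JSON.parse(response.text ?? "{}") as BookIdentity;
}
```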


Challenges we ran into

One of the biggest challenges was choosing the right models.

I experimented with Gemini live video models, image models, and voice models, and had to balance responsiveness, accuracy, and user experience.

Another challenge was interaction design. Initially, I planned to use hand gestures like double-taps or swipes on the page. However, current models were not reliable enough for precise gesture recognition.

I ultimately shifted to pen-based interactions — circles and underlines — which are natural reading behaviors and significantly more robust, while leaving gesture-based interaction as a future direction.
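
As a hedged sketch of what the pen-based interaction could look like in code: a snapped page photo is analyzed for circles and underlines and turned into structured notes. The prompt, model id, and note schema below are illustrative assumptions rather than the app's actual implementation.

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical note schema: one entry per circled or underlined passage.
interface PageNote {
  mark: "circle" | "underline";
  text: string;     // the passage the mark refers to
  comment?: string; // any handwriting next to the mark
}

async function extractNotesFromPage(pageJpegBase64: string): Promise<PageNote[]> {
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash", // placeholder model id
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: pageJpegBase64 } },
      {
        text:
          "This is a photo of a paper book page with pen marks. " +
          "Find every circled or underlined passage and any handwritten comment next to it. " +
          'Reply with a JSON array of {"mark": "circle"|"underline", "text": "...", "comment": "..."}.',
      },
    ],
  });

  return JSON.parse(response.text ?? "[]") as PageNote[];
}
```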

Developing and testing a camera-based mobile experience inside AI Studio also required multiple iterations before finding a smooth workflow.


Accomplishments that we're proud of

  • Building a complete, end-to-end product as a solo developer
  • Creating a hybrid live audio and camera reading experience that feels natural and non-intrusive
  • Successfully turning handwritten paper notes into structured digital memory
  • Designing an AI companion that supports reading without interrupting it
  • Maintaining a strong focus on human-centered, quiet interaction design

What we learned

Through this project, my understanding of AI and reading evolved.

I learned that AI doesn’t always need to provide faster answers or more information. Its value can also come from offering different perspectives and sparking new ideas.

Most importantly, I learned that restraint matters. An AI that knows when not to act can be just as powerful as one that is always active.


What's next for Calie, a reading pal

Looking ahead, I imagine Calie becoming a long-term reading companion.

Rather than living only on a phone, Calie could one day exist in more ambient forms, such as Google smart glasses, quietly present and hands-free.

The goal remains the same: to bridge the intimacy of paper reading with the memory and dialogue of the digital world, without asking readers to choose between the two.


How Gemini APIs are integrated

The app ("Calie") leverages the cutting-edge multimodal capabilities of the Gemini-3 and Gemini-2.5-flash-live. At the core of the "Page Snapping" feature is Gemini-3-flash. Its high-speed vision processing acts as the application's eyes, performing sophisticated OCR and spatial analysis to detect physical pen marks like circles and underlines. For the "Reading Pal" experience, the app integrates the Gemini-2.5-flash-live, which enables a low-latency, two-way voice conversation that processes synchronized audio and video frames. Calie doesn't just transcribe, she "sees" the page being discussed in real-time. The integration uses System Instructions to define Calie’s personality as a gentle, peer-like companion, while the Live API's native audio output ensures a natural, warm dialogue. Finally, Gemini-3-flash is used post-session to synthesize conversation history into "Reading Traces"—reflective summaries that populate the user’s long-term reading journal.
