About the Project

Inspiration

PictureTale was created to explore how images can be transformed into short stories. The project uses the Japanese narrative style ki-sho-ten-ketsu (introduction, development, twist, conclusion) as a guiding structure, which feels well suited for picture book–like storytelling.

What I Learned

  • Building a small AI-driven web application using FastAPI and Svelte.
  • Designing an iterative story workflow, where drafts are critiqued and refined.
  • Using different AI models for different roles: Claude-3-Haiku for image captioning, and gpt-oss-120b for generation, critique, and revision.
  • Incorporating traditional narrative techniques into AI outputs.

Language Support

The application supports both Japanese and English, automatically adapting based on the user's browser language settings.
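One way such language selection could work on the server side is to parse the browser's `Accept-Language` header. The sketch below is only an illustration under that assumption: the project may instead read the browser language in the frontend, and the function name `pick_language` and the fallback to English are hypothetical.

```python
def pick_language(accept_language: str, supported=("ja", "en"), default="en") -> str:
    """Return the best supported language for an Accept-Language header value.

    Hypothetical helper: the supported tuple and the default are assumptions.
    """
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        piece = part.strip()
        if not piece:
            continue
        # Split "en-US;q=0.9" into the language tag and its optional weight.
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            # q=0 marks a language the browser explicitly rejects.
            continue
        primary = tag.strip().lower().split("-")[0]
        # Sort by descending weight, then by original header order.
        candidates.append((-quality, position, primary))
    for _, _, primary in sorted(candidates):
        if primary in supported:
            return primary
    return default
```

For example, `pick_language("ja,en-US;q=0.9")` selects Japanese, while a header listing only unsupported languages falls back to the default.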

How It Works

  1. User uploads images.
  2. Claude-3-Haiku creates captions (since gpt-oss does not accept images).
  3. gpt-oss-120b generates a story draft.
  4. gpt-oss-120b performs self-critique and scoring.
  5. A revision step produces the final version in ki-sho-ten-ketsu style.

Formally:

$$ S_{n+1} = R(C(G(I))) $$

where \(I\) denotes the captions, \(G\) generation, \(C\) critique, \(R\) revision, and \(S_{n+1}\) the story version produced by one critique-and-revision pass.
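The composition above can be sketched as a plain function pipeline. This is a minimal illustration, not the project's actual code: the model calls are passed in as callables so no particular API client is assumed, and all names (`run_pipeline`, `Critique`, `StoryResult`) are hypothetical. The sketch also assumes the revision step receives both the draft and its critique, which the compact formula does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Critique:
    score: float  # assumed numeric score from the self-critique step
    notes: str    # free-text feedback on the draft


@dataclass
class StoryResult:
    captions: List[str]
    draft: str
    critique: Critique
    final: str


def run_pipeline(
    images: List[bytes],
    caption: Callable[[bytes], str],
    generate: Callable[[List[str]], str],
    critique: Callable[[str], Critique],
    revise: Callable[[str, Critique], str],
) -> StoryResult:
    """Run captioning, generation, critique, and revision in order."""
    captions = [caption(image) for image in images]  # I
    draft = generate(captions)                       # G(I)
    review = critique(draft)                         # C(G(I))
    final = revise(draft, review)                    # R(...)
    return StoryResult(captions, draft, review, final)
```

Keeping the intermediate draft and critique in the result makes it possible to show earlier versions next to the final story.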

Challenges

A central challenge was how to evaluate and improve the generated stories. To address this, the project introduced a self-critique mechanism where the model reviews its own drafts. A comparison UI allows side-by-side viewing of different story versions, making it easier to judge whether revisions are actually improvements.
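As a rough illustration of how a side-by-side view might be fed, the sketch below pairs the paragraphs of two story versions and flags the ones that differ. It assumes versions are plain text with blank lines between paragraphs; the function names and the row format are hypothetical, not taken from the project.

```python
from itertools import zip_longest
from typing import Dict, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def side_by_side(draft: str, revised: str) -> List[Dict[str, object]]:
    """Pair paragraphs by position; pad the shorter version with ''."""
    rows = []
    pairs = zip_longest(
        split_paragraphs(draft), split_paragraphs(revised), fillvalue=""
    )
    for left, right in pairs:
        rows.append({"draft": left, "revised": right, "changed": left != right})
    return rows
```

Pairing by position is the simplest choice; it misaligns when a revision inserts a paragraph in the middle, in which case a sequence-matching approach would be a better fit.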
