About Project
Inspiration
PictureTale was created to explore how images can be transformed into short stories. The project uses the Japanese narrative style ki-sho-ten-ketsu (introduction, development, twist, conclusion) as a guiding structure, which feels well suited for picture book–like storytelling.
What I Learned
- Building a small AI-driven web application using FastAPI and Svelte.
- Designing an iterative story workflow, where drafts are critiqued and refined.
- Using different AI models for different roles: Claude 3 Haiku for image captioning, and gpt-oss-120b for generation, critique, and revision.
- Incorporating traditional narrative techniques into AI outputs.
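The role split above can be captured as a small routing table. This is a minimal sketch, not the project's code. `ModelSpec`, `MODELS`, and `model_for` are hypothetical names. The model names are taken from this write-up and are not exact provider API identifiers. The `accepts_images` flag records why captioning is routed to Claude 3 Haiku.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str             # display name from the write-up; real API IDs are an assumption
    accepts_images: bool  # whether the model can take image input


# Hypothetical role -> model routing table.
MODELS = {
    "caption": ModelSpec("claude-3-haiku", accepts_images=True),
    "generate": ModelSpec("gpt-oss-120b", accepts_images=False),
    "critique": ModelSpec("gpt-oss-120b", accepts_images=False),
    "revise": ModelSpec("gpt-oss-120b", accepts_images=False),
}


def model_for(role: str, has_images: bool = False) -> ModelSpec:
    """Look up the model for a pipeline role, rejecting image input for text-only models."""
    spec = MODELS[role]
    if has_images and not spec.accepts_images:
        raise ValueError(f"{spec.name} cannot take image input for role {role!r}")
    return spec
```

Keeping the mapping in one place means a model can be swapped for a single role without touching the rest of the pipeline.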
Language Support
The application supports both Japanese and English, automatically adapting based on the user's browser language settings.
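One common way to follow the browser's language settings is to read the HTTP `Accept-Language` header. The write-up does not say where detection happens. A Svelte front end could read `navigator.language` instead. This sketch assumes server-side detection. `pick_language` is a hypothetical helper. It ranks language tags by their `q` weight and falls back to English.

```python
from typing import Optional, Tuple


def pick_language(
    accept_language: Optional[str],
    supported: Tuple[str, ...] = ("en", "ja"),
    default: str = "en",
) -> str:
    """Choose 'en' or 'ja' from an HTTP Accept-Language header value."""
    if not accept_language:
        return default
    candidates = []
    for index, part in enumerate(accept_language.split(",")):
        tag, *params = [p.strip() for p in part.split(";")]
        if not tag:
            continue  # skip empty entries such as a trailing comma
        quality = 1.0  # default weight when no q= parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat malformed weights as "not acceptable"
        # Sort key: higher quality first, then original header order.
        candidates.append((quality, -index, tag.lower()))
    for quality, _, tag in sorted(candidates, reverse=True):
        primary = tag.split("-")[0]  # "ja-jp" -> "ja"
        if quality > 0 and primary in supported:
            return primary
    return default
```

In a FastAPI route, this could be called as `pick_language(request.headers.get("accept-language"))`. The result would then select the Japanese or English UI strings and prompts.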
How It Works
- User uploads images.
- Claude 3 Haiku creates captions for the images (since gpt-oss-120b does not accept image input).
- gpt-oss-120b generates a story draft.
- gpt-oss-120b performs self-critique and scoring.
- A revision step produces the final version in ki-sho-ten-ketsu style.
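The steps above can be sketched as a single pass, with the model calls injected as plain functions. This is an illustration under assumptions. `run_pipeline`, `StoryResult`, and the prompt wording are hypothetical. `caption` stands in for a Claude 3 Haiku client and `llm` for a gpt-oss-120b client, and neither is a real SDK call.

```python
from dataclasses import dataclass
from typing import Callable, List

CaptionFn = Callable[[bytes], str]  # image bytes -> caption (Claude 3 Haiku's role)
TextFn = Callable[[str], str]       # prompt -> completion (gpt-oss-120b's role)

KISHOTENKETSU = ("ki (introduction)", "sho (development)", "ten (twist)", "ketsu (conclusion)")


@dataclass
class StoryResult:
    captions: List[str]
    draft: str
    critique: str
    final: str


def run_pipeline(images: List[bytes], caption: CaptionFn, llm: TextFn) -> StoryResult:
    """Caption -> draft -> critique -> revision, mirroring the steps listed above."""
    captions = [caption(img) for img in images]
    scenes = "\n".join(f"- {c}" for c in captions)
    draft = llm(f"Write a short picture-book story based on these scenes:\n{scenes}")
    critique = llm(f"Critique this story and give a score from 1 to 10:\n{draft}")
    structure = ", ".join(KISHOTENKETSU)
    final = llm(
        f"Revise the story using the critique, structured as {structure}.\n"
        f"Story:\n{draft}\nCritique:\n{critique}"
    )
    return StoryResult(captions, draft, critique, final)
```

Because the model clients are parameters, the flow can be exercised with stub functions before any API keys are involved.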
Formally:
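$$ S_0 = G(I), \qquad S_{n+1} = R\big(S_n,\ C(S_n)\big) $$
where $I$ denotes the captions, $G$ generation, $C$ critique, $R$ revision, and $S_n$ the $n$-th version of the story. A single critique-and-revise pass gives $S_1 = R\big(G(I),\ C(G(I))\big)$.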
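Read as a recurrence, $S_0 = G(I)$ and $S_{n+1} = R(S_n, C(S_n))$, the formula becomes a loop. This is a minimal sketch under assumptions. `refine` is hypothetical. The critique is assumed to return a numeric score plus feedback text. `max_rounds` and `target_score` are illustrative stopping parameters that the write-up does not specify.

```python
from typing import Callable, List, Tuple

Critique = Tuple[float, str]  # assumed (score, feedback) shape of C(S_n)


def refine(
    captions: str,
    generate: Callable[[str], str],
    critique: Callable[[str], Critique],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
    target_score: float = 8.0,
) -> List[Tuple[str, float]]:
    """Return every story version with its score, oldest first."""
    story = generate(captions)              # S_0 = G(I)
    score, feedback = critique(story)       # C(S_0)
    history = [(story, score)]
    for _ in range(max_rounds):             # at most max_rounds revisions
        if score >= target_score:
            break                           # good enough: stop revising
        story = revise(story, feedback)     # S_{n+1} = R(S_n, C(S_n))
        score, feedback = critique(story)
        history.append((story, score))
    return history
```

Keeping the full history, rather than only the last version, is what makes a later side-by-side comparison of versions possible.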
Challenges
A central challenge was how to evaluate and improve the generated stories. To address this, the project introduced a self-critique mechanism where the model reviews its own drafts. A comparison UI allows side-by-side viewing of different story versions, making it easier to judge whether revisions are actually improvements.
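The write-up describes the comparison UI but not how versions are compared. One plausible approach is sentence-level diffing with Python's standard `difflib`. This is an assumption, not the project's confirmed method. `split_sentences` and `compare_versions` are hypothetical. The sentence splitter is deliberately naive and only handles English and Japanese end punctuation.

```python
import difflib
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive split on English and Japanese sentence-ending punctuation (an assumption).
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    return [p.strip() for p in parts if p.strip()]


def compare_versions(old: str, new: str) -> dict:
    """Sentence-level diff of two story versions."""
    a, b = split_sentences(old), split_sentences(new)
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only inserted, deleted, or replaced sentences
    ]
    # HtmlDiff.make_table renders a side-by-side HTML table of the two versions.
    table = difflib.HtmlDiff().make_table(a, b, "draft", "revision")
    return {"similarity": matcher.ratio(), "changes": changes, "html": table}
```

The `similarity` ratio gives a rough signal of how much a revision changed. The HTML table could be embedded in a front-end view to show the two versions side by side.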
Built With
- google-cloud
- python
- svelte