Reading can be a great form of meditation to help one relax after a long day of work, school, or other activities that take up the day. To enhance the peace of mind and calm that one experiences by reading, we are creating an XR application that brings stories to life. Our app will give users three-dimensional visuals of what is happening in their book, enabling them to see their favorite characters and feel immersed in their book's environment.
InnerScape is an XR application intended to promote calmness and peace of mind through advancements in literature. Targeting young adults who already own XR technologies (AR and VR), this program is an immersive experience, using AI to generate encompassing 3D worlds that change and adapt to a selected audiobook. Young adults constitute the majority of current audiobook listeners, as well as owners of XR technologies, making them the perfect user base. Using state-of-the-art models, along with stable databases and user research, we aim to change the future of technology in literature, bringing calmness as we transport you to a multiplex of worlds.
Tech Stack
Data Storage
An essential component of creating a book and audio database is navigating copyrights. To minimize expenses, we will license only the essential rights rather than e-book reading/display rights, facilitating legal text-to-audio conversion and AI-generated 3D scenes without paying for unnecessary privileges. The essential rights include audio production, audio streaming and distribution, derivative rights for visualizations, and limited text ingestion rights for technical processing.
To power the backend infrastructure of the 3D audiobook platform, we will use cloud tools that are designed for scalability, reliability, and cost-efficiency. AWS S3 serves as our cloud storage solution, providing a secure and globally distributed system for hosting audio chapters, cover art, rendered 3D environments, and HDRI files. Its ability to automatically scale with user demand ensures seamless performance even as the library of immersive content grows. S3 also supports time-limited presigned URLs, allowing both image and audio assets, such as /audiobooks/hanselgretel/chapter3.mp3, to be delivered securely and efficiently to users across the world.
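As a minimal sketch of how asset paths like the one above could be built consistently, the helper below derives a path from a book title and chapter number. The function names, the slug rule, and the scene HDRI layout are our own assumptions for illustration; only the chapter path format comes from the example above.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and drop everything except letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def chapter_audio_path(title: str, chapter: int) -> str:
    """Build a path in the style of /audiobooks/hanselgretel/chapter3.mp3."""
    if chapter < 1:
        raise ValueError("chapter numbers start at 1")
    return f"/audiobooks/{slugify(title)}/chapter{chapter}.mp3"


def scene_hdri_path(title: str, scene: int) -> str:
    """Hypothetical layout for per-scene HDRI files; not a finalized convention."""
    if scene < 1:
        raise ValueError("scene numbers start at 1")
    return f"/audiobooks/{slugify(title)}/scenes/scene{scene}.hdr"


print(chapter_audio_path("Hansel & Gretel", 3))
```

Keeping path construction in one place means the storage layer, the CDN, and the app all agree on where a given chapter or scene lives.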
All user-related data is managed by Supabase (PostgreSQL), which combines a powerful SQL database with built-in authentication and real-time capabilities. This allows us to securely store user accounts, listening history, licensing information, and book metadata in one unified system. For example, it tracks exactly which book a user has access to and the precise point where they stopped listening, enabling a personalized experience across devices. Application logic is executed via AWS Lambda, a serverless compute platform that runs backend functions only when needed, such as generating a signed streaming URL or handling a login event. This eliminates the cost of always-running servers while automatically scaling in response to user traffic. Content is then delivered through Cloudflare CDN, a global edge network that caches audio and visual assets closer to users geographically, drastically reducing load time and bandwidth costs while improving streaming performance. To support monetization, the platform integrates the Stripe Billing API for global subscription and payment management. Stripe handles recurring billing, tax compliance, invoicing, and currency conversion, while also integrating directly with Supabase to automatically adjust user access based on payment status. This ensures a secure, seamless subscription experience across regions and devices. (See Figure 8)
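The request flow described above (check access, then hand back a short-lived streaming URL and the resume point) could look roughly like the Lambda-style handler below. This is a sketch under stated assumptions: the in-memory `ENTITLEMENTS` dictionary stands in for the Supabase tables, the event fields and `cdn.example.com` host are hypothetical, and the HMAC signature is a simplified stand-in rather than S3's real presigning scheme (in practice a presigned URL would come from the AWS SDK).

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-secret"  # hypothetical; a real key would come from a secrets store
URL_LIFETIME_SECONDS = 300    # assumed lifetime for a streaming link

# Stand-in for the Supabase tables holding access rights and listening position.
ENTITLEMENTS = {
    ("user-1", "hanselgretel"): {"active": True, "position_seconds": 754},
}


def sign_path(path: str, expires_at: int) -> str:
    """Simplified HMAC signature over the path and expiry (not S3's SigV4)."""
    message = f"{path}:{expires_at}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def handler(event, context=None, now=None):
    """Return a signed URL and resume point if the user may stream the book."""
    now = int(time.time()) if now is None else now
    record = ENTITLEMENTS.get((event.get("user_id"), event.get("book")))
    if record is None or not record["active"]:
        return {"statusCode": 403, "body": json.dumps({"error": "no access"})}

    path = f"/audiobooks/{event['book']}/chapter{event['chapter']}.mp3"
    expires_at = now + URL_LIFETIME_SECONDS
    url = (
        f"https://cdn.example.com{path}"
        f"?expires={expires_at}&sig={sign_path(path, expires_at)}"
    )
    body = {"url": url, "resume_at": record["position_seconds"]}
    return {"statusCode": 200, "body": json.dumps(body)}
```

Because the handler only runs per request and holds no state of its own, it fits the pay-per-invocation model described above; a subscription change would only need to flip the `active` flag in the user record.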
HDRI Image Generation
The InnerScape HDRI Image Generation Pipeline starts with a large language model (LLM), like GPT-5, which analyzes each uploaded story to detect shifts in mood, setting, and plot. It segments the text into scenes and generates concise visual prompts describing lighting, emotion, and environment. A secondary AI evaluates these prompts for semantic and emotional accuracy, regenerating any that fail, then stores approved prompts in structured JSON files with scene metadata and style attributes. These prompts are sent to SkyBox AI, which renders realistic 8K HDRI images with natural lighting and spatial depth, automatically generating metadata such as size, timestamp, and rendering parameters; failed renders trigger retries and logging. Each HDRI is validated by a pre-trained Vision Transformer (ViT) to ensure HDR format, lighting realism, and visual quality, and neighboring scenes are compared to maintain smooth transitions. Final HDRIs are stored in a cloud database indexed by story and scene. The InnerScape app dynamically loads matching environments in real time, fading between scenes for an immersive experience. Future versions will allow re-rendering in alternative art styles or adjusting tone and lighting, with user feedback continually refining the LLM’s prompt generation for improved visual accuracy.
Audio Generation
Our Audio Generation Pipeline will utilize commercially available neural network–based text-to-speech (TTS) systems, such as those offered by ElevenLabs, to generate high-quality narration and character dialogue. The pipeline begins by using an LLM to analyze and segment the story content into narrative scenes. Then, each segment is analyzed for tone, pacing, and emotional intensity. The system will select a pre-made voice profile for the narrator, ensuring consistent narration across scenes, generate suitable voice profiles for each character, and annotate the story with tags indicating emotion and tone in specific places.
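The prompt-approval step of the HDRI pipeline above (draft a visual prompt, have a second model evaluate it, regenerate on failure, then store the approved prompt as JSON) could be sketched as follows. The `ScenePrompt` fields, the function names, and the attempt limit are assumptions for illustration, and `fake_draft` and `fake_evaluate` are stand-ins for the real LLM calls.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ScenePrompt:
    """Approved prompt plus scene metadata (field names are assumed)."""
    story_id: str
    scene_index: int
    prompt: str
    mood: str
    style: str = "realistic"


def approve_prompt(
    story_id: str,
    scene_index: int,
    scene_text: str,
    draft: Callable[[str, int], Tuple[str, str]],
    evaluate: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[ScenePrompt]:
    """Draft a prompt, check it, and regenerate until it passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        prompt, mood = draft(scene_text, attempt)
        if evaluate(scene_text, prompt):
            return ScenePrompt(story_id, scene_index, prompt, mood)
    return None  # caller can log the failure and flag the scene for review


# Stand-ins for the drafting LLM and the secondary evaluator.
def fake_draft(scene_text: str, attempt: int) -> Tuple[str, str]:
    if attempt == 1:
        return "a sunny beach", "calm"
    return "a dark forest at dusk, cold moonlight", "tense"


def fake_evaluate(scene_text: str, prompt: str) -> bool:
    return "forest" in prompt


approved = approve_prompt(
    "hanselgretel", 3, "The children walked deeper into the forest.",
    fake_draft, fake_evaluate,
)
if approved is not None:
    print(json.dumps(asdict(approved), indent=2))
```

The same bounded-retry shape would apply to failed renders: cap the attempts, log what failed, and surface scenes that never pass instead of looping indefinitely.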
Next, the LLM will analyze the tone and content of each scene to select soundscapes and dynamic background scores that mirror the story’s rhythm without overpowering narration, drawing on a library of royalty-free or appropriately licensed audio files. Our pipeline will automatically layer the relevant tracks into a final audio file, encoded in a lossless format (such as FLAC) or a high-bitrate lossy format (such as AAC) for adaptive streaming, and make appropriate adjustments for spatial audio and interactive playback systems within the InnerScape platform.
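To illustrate the layering step, here is a minimal sketch of mixing a narration track with a quieter background track so the score does not overpower the voice. It works on plain lists of samples in the range -1.0 to 1.0; the function name and the default gain are assumptions, and a real pipeline would use an audio library and proper loudness handling.

```python
from typing import List


def mix_tracks(
    narration: List[float],
    background: List[float],
    background_gain: float = 0.25,
) -> List[float]:
    """Sum narration with an attenuated background and clip to [-1.0, 1.0]."""
    length = max(len(narration), len(background))
    mixed = []
    for i in range(length):
        voice = narration[i] if i < len(narration) else 0.0  # pad the shorter track
        bed = background[i] if i < len(background) else 0.0
        sample = voice + background_gain * bed
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip to the valid range
    return mixed


print(mix_tracks([0.5, -0.5], [1.0, 1.0, 1.0]))
```

Lowering `background_gain` for scenes tagged as dialogue-heavy is one simple way the emotion and tone annotations could drive the mix.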
One challenge we ran into was copyright licensing: to start out, we can only transform books that are in the public domain. Fortunately, many classic books are in the public domain and can be used in InnerScape as we launch. Moving forward, we hope to work with a large company, such as Audible, to gain permission to use copyrighted books in our app.
We’re proud to have developed a truly unique solution in the emotional and mental well-being space. By combining features from multiple industry-leading products, we created an app designed to help users relax and escape the stresses of daily life. In addition, as first-time WashU hackathon participants, we are extremely proud that we were able to turn a completely new idea into a prototype in such a short amount of time.
We did extensive research into many different aspects of application prototype development. Working with modern tools such as ElevenLabs and an assortment of HDRI generation technologies, and within the limits of 3D software, we had to adapt to the project and learn new skills. As an all-McKelvey team, we also had to learn about marketing and business strategies, talking with peers from different concentrations and spending hours researching online sources.
Phase 1: Complete the minimum viable product needed to run InnerScape and deploy it locally. We will target the WashU community using a small library of public domain books in order to collect user data and experiences. Pricing will be per book rather than a subscription fee. Our current estimate is a fee of $3.99 per book experience.
Phase 2: Once we have collected enough user feedback and made the resulting changes, we will expand our library to include as many books in the public domain as possible. We will launch our app on the most popular XR devices (including Meta Headsets, Apple Vision Pros, XReal Technologies, and others) to ensure we are not limited by hardware and to build a large customer base. We plan to add a subscription platform in this phase to help maintain a consistent user base. Additionally, at this phase, we will raise the price to $5.99 per book experience for non-subscription users.
Phase 3: After obtaining a consistent user base, we plan to work with a company like Audible in order to gain access to most books without working through copyright issues. Pricing will be the price to obtain the book from this source, plus a small upcharge for project maintenance and future development.
Built With
- amazon-web-services
- blender
- cloudflare
- elevenlabs
- skybox
- supabase