The Imagination Machine

Inspiration

Everyone has a story to tell, but not everyone has the tools to tell their stories. The ability to tell stories effectively not only highlights and enhances the richness of life experiences, it also connects people with each other across geographic and societal boundaries, people who would otherwise be strangers. Stories form the fabric of the human experience, serve as the foundation for communities, and fuel human connection. VR/AR technologies enable immersive storytelling experiences like never before, where we can create and share experiences that feel almost like reality.

Yet the technical and financial barriers to entry for creating XR experiences prevent most people from expressing their ideas, putting the future of the metaverse and humanity at risk of being dictated by the corporate interests of a few large companies. The VR/AR space is characterized by rapid advancements in technology and fine lines between utopian and dystopian futures. Like all technology, XR is a double-edged sword: it has massive potential to empower and connect people across the world, and equal potential to control, manipulate, and commodify human life. In order to build a more united and beautiful world, we must prioritize human expression and experiences within the development of XR.

The exigence of democratizing storytelling in XR inspired us to create The Imagination Machine, an AI storytelling partner that helps anyone create immersive stories.

What it does

The Imagination Machine allows anyone to create immersive stories from scratch. Any story you want to tell, whether it’s dramatic nonfiction about your life or far-fetched science fiction or abstract poetry, The Imagination Machine helps you create a meaningful immersive experience effortlessly.

Imagine you want to tell a story about the future of the metaverse. You enter the Imagination Machine and start your story by simply speaking… “The metaverse began in scattered waves, emerging from little nooks of the Internet, where explorers, pioneers, futurists were experimenting with virtual worlds.”

As you say this, a visual illustration of your story starts to form in front of you. Then, the AI continues your story… “It was a playground for the curious and the imaginative. But as the years went on, the metaverse became more than that. It became a place where people went to work, to learn, to play. It became a place where people socialized, where they formed communities and bonds. It became a place where people escaped the harsh realities of the real world.” (This is actually GPT3-generated.) And then you continue, and together, human and AI go back and forth in a collaborative dance of storytelling.

The magic of the Imagination Machine is its flexibility in application. Our current GPT3 prompts are designed for storytelling, but a very simple change in prompting can turn our tool into a general immersive ideation tool for any kind of creative work. For example, users can converse with GPT3 to brainstorm ideas on designing a product, and the ideas would synchronously materialize in the space around them.

How we built it

Our generative art implementation uses three different technologies, and our VR application was built in Unity for the Oculus Quest.

The storytelling AI takes input from the user, who speaks their sections of the story aloud in the scene. The transcribed text goes to GPT3, which we've primed with some text in advance so that it continues the story. The response is then sent back to the user, all with a latency under 1 second!
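A minimal sketch of how a primed prompt for turn-by-turn story continuation might be assembled. The primer text, the speaker labels, and the `complete` callable (a stand-in for whatever function sends a prompt to the language model) are all assumptions for illustration; the actual priming text is not given here.

```python
from typing import Callable, List, Tuple

# Hypothetical primer; the real priming text is not part of this write-up.
PRIMER = "The following is a story written in turns by a human and an AI.\n"


def build_story_prompt(primer: str, turns: List[Tuple[str, str]]) -> str:
    """Join the primer and every turn so far, ending with an open AI turn."""
    lines = [primer.strip(), ""]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text.strip()}")
    lines.append("AI:")  # the model is expected to fill in this turn
    return "\n".join(lines)


def continue_story(
    turns: List[Tuple[str, str]],
    complete: Callable[[str], str],
    primer: str = PRIMER,
) -> List[Tuple[str, str]]:
    """Ask the model for the next passage and return a new, longer turn list."""
    prompt = build_story_prompt(primer, turns)
    reply = complete(prompt).strip()
    return turns + [("AI", reply)]
```

Passing the model call in as a plain function keeps the prompt logic testable without network access; in a real deployment `complete` would wrap the request to the hosted model.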

GPT3 is then used a second time to generate the image request. Here we use a few-shot prompt containing examples of the types of inputs that work well with our text-to-image architecture.
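As an illustration of the few-shot idea, the sketch below builds a prompt from a handful of (story passage, image request) pairs and leaves the last request open for the model to complete. The example pairs and the `Story:` / `Image request:` labels are hypothetical; the examples actually used are not listed here.

```python
from typing import List, Tuple

# Hypothetical example pairs, invented for illustration only.
EXAMPLES: List[Tuple[str, str]] = [
    ("A lone lighthouse keeper watched the storm roll in.",
     "a lighthouse in a storm at night, oil painting"),
    ("Children chased fireflies across the summer field.",
     "glowing fireflies over a meadow at dusk, watercolor"),
]


def build_image_request_prompt(
    examples: List[Tuple[str, str]], passage: str
) -> str:
    """Format the examples, then append the new passage with an open request."""
    blocks = [f"Story: {story}\nImage request: {request}"
              for story, request in examples]
    blocks.append(f"Story: {passage.strip()}\nImage request:")
    return "\n\n".join(blocks)
```

The completed text after the final `Image request:` would then be passed on to the image model as its prompt.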

Once we have the generated image request, we send it to our generative image art model. This uses CLIP (Contrastive Language-Image Pre-training) to process the image request and then generates an image that matches the request. Our server uses a diffusion-based model and generates several pictures as it progresses from its initial image to the target one. We then show this image in our Unity scene, next to the text it corresponds to.
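CLIP-style models place text and images in a shared embedding space and judge how well they match by cosine similarity. The sketch below shows only that scoring step, using small hand-written vectors in place of real embeddings. No model is loaded, the function names are hypothetical, and this is not the server code.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    text_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Return the name of the image whose embedding is closest to the text."""
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    )
```

In a guided generation loop, a score like this is what tells the generator whether each intermediate picture is moving toward or away from the requested description.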

The Unity application handles the networking calls to all of these AI models, displays both the text and image data, and converts the user's speech into input text.
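The order of calls for one story turn can be sketched as follows. The real client is a Unity application, so this Python version only illustrates the flow; the three callables stand in for the network requests, and the `StoryPanel` type and all function names are hypothetical. It also assumes the image request is derived from the user's own passage, which is not confirmed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoryPanel:
    """Everything the scene needs to display for one turn."""
    user_text: str
    ai_text: str
    image_request: str
    image: bytes


def run_story_turn(
    user_text: str,
    continue_story: Callable[[str], str],
    make_image_request: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> StoryPanel:
    """Run one turn: story continuation, image request, then image generation."""
    ai_text = continue_story(user_text)
    # Assumption: the illustration is based on the user's passage.
    image_request = make_image_request(user_text)
    image = generate_image(image_request)
    return StoryPanel(user_text, ai_text, image_request, image)
```

Because the story continuation and the image generation do not depend on each other, a client could also issue them concurrently so the text appears while the image is still being generated.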

What's next for The Imagination Machine

From this initial prototype, we will build out a full experience with integrated music, improved UI/UX, and sharing / collaboration functionality. Among other improvements, we will add generated music reflective of the story mood, use multiple screens to display generated illustrations in parallel, develop a seeding technique that uses generated images as initial images for the next section of the story to build continuity, allow users more control over generative parameters, and allow users to share and collaborate on their generative storytelling experiences.

Beyond UI/UX improvements, there are many exciting directions we'd like to explore with our prototype:

Hyper-personalized AI-Assisted AR Storytelling - AR version of The Imagination Machine that uses real-life text, recordings, and photos to create personalized storylines
General Ideation Machine - more generalized ideation experience that allows users to brainstorm and visualize ideas in free-form with AI
Sensor-integrated Generative Experiences - generative ideation with EEG feedback to improve output quality iteratively and adapt generated content to user taste

If you have any ideas or would like to collaborate with us (on this or other projects), please reach out!

Built With

amazon-web-services
clip
gpt3
nltk
s3
unity
vast.ai
vm
vqgan

Submitted to

MIT Reality Hack 2022
- Winner Cultural and Artistic Expression in the Metaverse

Created by

Worked on the Flask Generative Image from Text Input API

https://github.com/mathyouf/flask-dl

Matthew Fisher
Generative AI & WebXR
Alice Cai
Writer, developer, researcher. Human Augmentation & Generative AI. Hmu at acai@college.harvard.edu if we have shared interests!
Jack Lewis