Salon

A player joins a salon of AI agents, each with its own personality
The AI agents interact with each other, and the player can join in at any time
Classic view support
Multi-language support

Inspiration

It started with a question I couldn't stop asking the air, late at night: what would James Baldwin actually say about AI if he were alive to see it? I wanted to put him in a room with Vincent van Gogh, a working therapist, an engineer who builds the things people are afraid of, and a Stoic philosopher who would tell them all to calm down. I wanted to listen.

The closest tool we have for that is reading. The next closest is asking a chatbot to pretend, which always feels thin. I wanted something between a research paper, an oil painting, and a video game. A real evening where minds I admire could argue in front of me, and I could lean in close to one and ask a question, and the others could overhear and answer back.

What it does

Salon convenes 3 to 5 minds of your choosing (historical figures, fictional characters, archetypes, or people you knew) around a candle-lit virtual table to debate any question you bring. Each participant gets an oil-painting portrait generated to look like them, a distinct TTS voice chosen to fit their archetype, and a persona conditioned on real quotes pulled from across the web. The debate runs in real time with audio. You can interrupt by typing or speaking, and when it ends Salon writes the central tension and the central synthesis as paragraphs you can read.

There are two ways to experience a salon. The classic view shows the octagonal table with portraits around it and exchange cards sliding in as they speak. The Walk the Salon view is a top-down RPG room: a candle-lit ballroom you can walk around as a custom avatar. You can approach any participant, read the inner thoughts hovering above them, and eavesdrop when two of them are whispering to each other near the fire. Click on the central candle and it speaks back, a cryptic line about what the room is becoming.

When the evening is done, you get a memorable-moments carousel, a relationship web that shows who came to feel what about whom, and a share URL anyone can replay. You can also watch the entire salon in Mandarin, French, or Spanish.

How I built it

I built Salon entirely inside MeDo through multi-turn chat. The very first prompt was a paragraph asking for a web app called Salon and what it should feel like. MeDo generated the database schema, the React frontend, every edge function, the orchestration loop, and the UI components.

The architecture grew in layers. First the basic plumbing: salons table, participants, exchanges, the orchestrator loop that picks a speaker every six seconds. Then plugin integration: ERNIE for persona generation and exchanges, AI Search and Google Scholar to ground each participant in real quotes, Kling Image Generation for portraits, LemonFox TTS for distinct voices, Whisper for speech-to-text on the mic button, Google Translation for the Mandarin, French, and Spanish UI. Then the things you actually see: the octagonal table, the parchment chronicle scroll on the right edge, the candelabra at the room's center.

The hardest layer was the agent architecture, modeled on Stanford's Generative Agents paper. Each NPC accumulates memories of what was said and how they felt about it. Every 8 exchanges or 90 seconds, whichever comes first, a reflection cycle fires: each participant reads their twenty most important memories and updates their view of the topic and of every other participant. The relationship valence drives their movement in the explore view, so an NPC who has been challenged will walk over to the challenger, and one who has grown alienated will quietly step away from the table.
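To make the reflection cycle concrete, here is a minimal sketch of the trigger and the memory selection. It is not the code MeDo generated. The `Memory` shape, the numeric `importance` score, the tie-break on recency, and the function names are all assumptions made for illustration; only the thresholds (8 exchanges, 90 seconds, twenty memories) come from the description above.

```typescript
// Hypothetical memory record; the real schema may differ.
interface Memory {
  text: string;
  importance: number; // assumed score, higher means more important
  createdAt: number; // assumed timestamp in milliseconds
}

// Thresholds taken from the write-up; constant names are illustrative.
const REFLECT_EVERY_EXCHANGES = 8;
const REFLECT_EVERY_MS = 90000;
const MEMORIES_PER_REFLECTION = 20;

// A reflection fires when either threshold has been reached since the last one.
function shouldReflect(exchangesSinceLast: number, msSinceLast: number): boolean {
  return (
    exchangesSinceLast >= REFLECT_EVERY_EXCHANGES ||
    msSinceLast >= REFLECT_EVERY_MS
  );
}

// Pick the most important memories, breaking ties by recency (an assumption).
// The input array is copied so the participant's memory stream is not reordered.
function topMemories(
  memories: readonly Memory[],
  limit: number = MEMORIES_PER_REFLECTION
): Memory[] {
  return [...memories]
    .sort((a, b) => b.importance - a.importance || b.createdAt - a.createdAt)
    .slice(0, limit);
}
```

In a loop like the one described, the orchestrator would call `shouldReflect` after each exchange and, when it returns true, pass each participant's `topMemories` result to the LLM as context for updating their views.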

I worked with MeDo the way you would work with a quick-witted but distracted collaborator. I learned to write prompts that named the invariant (what must always be true), the entity (what is being made), and the boundary (what must not change). The best results came from being specific to a fault: exact hex codes, exact font weights, exact word counts.

Challenges I ran into

The Image Generation pipeline failed three turns in a row. MeDo kept reporting success while shipping initial-circle avatars instead of real portraits. The fix was switching from the Lite version of Kling to the Omni endpoint and adding a manual admin route that lets me regenerate portraits one by one when something goes wrong.

Audio pacing collided with the exchange cadence on the first build. The orchestrator was firing the next speaker every six seconds regardless of whether the previous person had finished talking, so audio piled up and cards faded mid-sentence. I had to amend the cadence to wait for the current audio plus a one-second pause, with the six-second timer as a floor.
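The amended cadence reduces to a single rule, sketched below. The function and constant names are hypothetical, and the sketch assumes the duration of the current audio clip is known in milliseconds; the actual orchestrator may track playback differently.

```typescript
// Values from the write-up; names are illustrative.
const CADENCE_FLOOR_MS = 6000; // never advance faster than the six-second timer
const POST_AUDIO_PAUSE_MS = 1000; // one-second pause after the speaker finishes

// Delay before the next speaker: the current audio plus the pause,
// but never less than the six-second floor.
function nextSpeakerDelayMs(currentAudioMs: number): number {
  const audioMs = Math.max(0, currentAudioMs); // guard against negative durations
  return Math.max(CADENCE_FLOOR_MS, audioMs + POST_AUDIO_PAUSE_MS);
}
```

With this rule a 3-second line still waits the full 6 seconds, while a 9.5-second line holds the floor for 10.5 seconds, so cards no longer fade mid-sentence.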

The chroma-key filter to drop the white background on the player sprite needed three separate turns to actually land. MeDo would acknowledge it, claim success, and ship the same opaque rectangle. In the end it required regenerating the source PNG with a pure black background and applying a CSS blend-mode on top, not a filter.

The biggest mid-project pivot was Baidu Qianfan authentication. MeDo's first ERNIE integration assumed BYO API keys via Qianfan, but real-name verification was unworkable from my region. I switched the LLM helper to use Baidu AI Studio's Access Token in a follow-up turn, and the ERNIE calls have run cleanly since.

Accomplishments that I'm proud of

Salon turns a research idea into a working consumer product. Stanford's Generative Agents paper showed that small groups of LLM agents can form an emergent social world. I wanted to put that in a browser tab where five characters debate a real question and you can stand in the room with them. It works. After a few minutes of simulation, the NPCs are not in a chat log, they are at a table, with views of one another that have shifted over time.

The Flame Speaks moment came together in a single turn. Click the candle in the room's center, and ERNIE reads the current state of the debate and whispers one cryptic line back. It feels like the room has been watching.

The whole frontend is generated by MeDo, but the integrations are non-trivial. Across the build I wired ERNIE, Gemini, Kling Image Generation, LemonFox TTS, Whisper, Google Translation, AI Search, Google Scholar, and Webpage Content Extract. The submission is multilingual end to end, including the chronicle entries and the synthesis paragraphs.

I am also proud that the demo salon is a real salon. The Watch a Demo Salon button on the landing page plays a fully realized session with five generated portraits, twelve real ERNIE exchanges, audio for each, and a tension and synthesis paragraph already written, all pre-seeded at deploy time. Judges do not have to wait.

What I learned

Working with MeDo taught me that natural language scales with precision, not length. Vague prompts produce defaults; specific prompts produce surprise. The best results came when I named the invariant, the entity, and the boundary in the same prompt.

I also learned that multi-agent systems are a visibility problem before they are an algorithm problem. The first version of Salon ran the orchestrator correctly but looked to a user like a single chatbot taking turns. The relationship web on the replay page and the candle's whisper exist because the underlying agents needed a surface that made them legible.

And I learned to ship the wedge. Halfway through the build I had a list of fifteen elaborate features I wanted, including time-of-evening mood mechanics, mid-salon participant invitation, and a guided demo URL. I cut several of them, twice. The version that shipped is smaller than my original sketch and stronger for it. The features that survived (memory streams, reflection cycles, relationship valence, the candle, the multilingual rendering) are the ones I would point to in a paper.

What's next for Salon

I want to push toward what AI Town and Project Sid have been doing: real persistence across sessions, so a salon you convene tonight remembers what happened last week. I want voice cloning on the participant side so people can speak in the actual voice of someone they loved. I want a third view where the conversation is rendered as a generated short film. And I want a public hall, a way for two strangers to share their salons and see how the same minds reasoned differently when given different questions.

Beyond that, I think there is a serious educational version of Salon for the classroom. A high school student asking five economists to debate inflation is one assignment. A medical student asking five clinicians to debate a case is another. The architecture supports both.

The Stanford paper that inspired the project: https://arxiv.org/abs/2304.03442

Built With

ai-search
cormorant-garamond
ernie
framer-motion
gemini
google-scholar
google-translation
jetbrains-mono
kling-image-generation
lemonfox-tts
medo
playfair-display
react
supabase
tailwindcss
typescript
vite
webpage-content-extract
whisper

Updates

Jonathan SolvesProblems started this project — May 18, 2026 05:05 PM EDT
