Inspiration

Before arriving at the hacker house, we explored it on SOON's website through a 3D scan they had made, which planted a 3D itch in the back of our minds. Upon arrival, we realized that while the house was beautiful, there was no furniture!! We are no designers, but we wanted to see what the place could look like in a rustic '80s vibe (and get a sense of the pricing). So we got building...

What it does

Scan a home with your iPhone (a Pro model, for LiDAR) to produce a 3D representation of the place. Once uploaded, we build a semantic tree: a normalized, room-segmented view of the scan that converts raw RoomScan measurements (4×4 transforms, precise coordinates) into named properties the LLM can reason about. Walls are broken into free spans (the gaps between doors, windows, and existing furniture). Doors and windows are attached to their parent walls. Detected objects are tagged with per-side clearances in their own local frame, so "left of the table" means the table's left, not world-left. A flood fill over a 5 cm occupancy grid splits a multi-room scan into a Building -> Room -> (walls, objects, placements) hierarchy. The agent only sees this tree through discovery tools (LIST_ROOMS, INSPECT_ROOM, FIND_NODES), pulling down detail on demand instead of swallowing the full scan every turn. This keeps a room's representation around 2k tokens, a size well suited to current LLM reasoning.
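Concretely, a node in the tree and a discovery-tool response look something like this (an illustrative Python sketch with hypothetical names like FreeSpan and inspect_room, not our actual schema):

```python
# Illustrative sketch: named, LLM-readable properties instead of raw 4x4 poses.
from dataclasses import dataclass, field

@dataclass
class FreeSpan:
    start_cm: int    # offset along the wall
    length_cm: int   # contiguous gap between doors/windows/furniture

@dataclass
class Wall:
    id: str          # e.g. "wall_G"
    length_cm: int
    free_spans: list[FreeSpan] = field(default_factory=list)

@dataclass
class PlacedObject:
    id: str
    kind: str
    # clearances measured in the object's own local frame, so "left of the
    # table" always means the table's left, not world-left
    clearance_cm: dict[str, int] = field(default_factory=dict)

@dataclass
class Room:
    id: str
    walls: list[Wall] = field(default_factory=list)
    objects: list[PlacedObject] = field(default_factory=list)

def inspect_room(room: Room) -> dict:
    """What an INSPECT_ROOM-style tool might return: a compact, token-cheap
    summary instead of the full scan."""
    return {
        "room": room.id,
        "walls": [{"id": w.id, "len_cm": w.length_cm,
                   "free": [(s.start_cm, s.length_cm) for s in w.free_spans]}
                  for w in room.walls],
        "objects": [{"id": o.id, "kind": o.kind,
                     "clearance_cm": o.clearance_cm} for o in room.objects],
    }
```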

On top of the tree sits the placement engine. The agent doesn't drop furniture coordinate-by-coordinate; it builds up a design -- a list of assignments like "sofa on wall_G" or "armchair next to the coffee table, left side, 20 cm gap" -- and a single SOLVE_LAYOUT call realizes the whole intent at once. The solver runs greedy + repair: it topologically sorts assignments by dependency, auto-distributes multi-same-side groups (three chairs along the front of a table get spaced at 1/4, 2/4, 3/4 automatically), and scores each candidate position with a cost function. Hard infeasibilities (collisions, out-of-bounds, "wall too short") prune the search; soft costs (off-center placement, yaw deviation, door-swing intrusion, front-edge alignment between siblings, left/right symmetry around a shared anchor) are tiebreakers. When an item can't fit, a repair pass nudges the blocker -- the placed item closest to the failure -- by ±5 to ±30 cm to see if a small shift opens up the slot. Whatever still won't fit comes back with a structured reason (wall_too_short, clearance_too_shallow, side_blocked) so the agent can swap to a smaller item, pick a different wall, or hand the conflict back to the user.
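Roughly, the loop looks like this. The toy below is a self-contained 1D sketch of the greedy + repair idea on a single wall; the real engine works in 2D with yaw, clearances, and the full soft-cost function, and every name here is illustrative:

```python
# Toy greedy + repair: place items of known width along one wall, honoring
# "next to X, left/right, gap" anchors. Hard feasibility prunes candidates;
# a repair pass nudges an already-placed blocker by +/-5..30 cm.
from graphlib import TopologicalSorter

WALL_CM = 400  # one straight wall; the real engine searches free spans per wall

def overlaps(a, b):
    # a, b are (start_cm, width_cm) intervals along the wall
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

def feasible(start, width, placed):
    return (0 <= start and start + width <= WALL_CM
            and all(not overlaps((start, width), p) for p in placed.values()))

def candidates(spec, placed):
    if spec["anchor"]:  # position derived from the anchor's current slot
        ax, aw = placed[spec["anchor"]]
        return [ax - spec["gap"] - spec["width"]] if spec["side"] == "left" \
               else [ax + aw + spec["gap"]]
    # free item: scan in 5 cm steps, preferring positions near the wall center
    return sorted(range(0, WALL_CM - spec["width"] + 1, 5),
                  key=lambda x: abs(x + spec["width"] / 2 - WALL_CM / 2))

def solve(specs):
    deps = {i: ({s["anchor"]} if s["anchor"] else set()) for i, s in specs.items()}
    placed, failed = {}, {}
    for item in TopologicalSorter(deps).static_order():  # anchors placed first
        if item not in specs:
            continue
        spec = specs[item]
        if spec["anchor"] and spec["anchor"] not in placed:
            failed[item] = "anchor_missing"
            continue
        pos = next((c for c in candidates(spec, placed)
                    if feasible(c, spec["width"], placed)), None)
        if pos is None:  # repair pass: try nudging each blocker and retry
            for bid in list(placed):
                bx, bw = placed[bid]
                for d in (5, -5, 10, -10, 20, -20, 30, -30):
                    others = {k: v for k, v in placed.items() if k != bid}
                    if not feasible(bx + d, bw, others):
                        continue
                    trial = {**others, bid: (bx + d, bw)}
                    pos = next((c for c in candidates(spec, trial)
                                if feasible(c, spec["width"], trial)), None)
                    if pos is not None:
                        placed = trial
                        break
                if pos is not None:
                    break
        if pos is None:  # structured failure reason for the agent
            failed[item] = "side_blocked" if spec["anchor"] else "wall_too_short"
        else:
            placed[item] = (pos, spec["width"])
    return placed, failed
```

The production solver layers the soft-cost tiebreakers (off-center, yaw deviation, door-swing intrusion, alignment, symmetry) on top of this skeleton; the toy keeps only the hard feasibility check and the nudge loop.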

The catalogue displays all the furniture options with live prices. Fetching data from the Amazon Berkeley Objects (ABO) dataset, we were able to get the exact name, size, image, 3D model, and purchase link for each product, giving users access to all the details they need to plan out their future home, both visually and financially.
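A sketch of how catalogue entries can be pulled out of the ABO listings metadata (gzipped JSON-lines). The field names below reflect our reading of the dataset and should be treated as assumptions to check against the ABO docs:

```python
# Parse ABO listings metadata into catalogue records. Field names
# (item_name, item_dimensions, main_image_id, item_id) are assumptions
# based on the ABO listings files; verify against the dataset docs.
import gzip
import json

def load_catalogue(path="listings_0.json.gz"):
    items = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # item_name is a list of language-tagged values; pick an English one
            name = next((n["value"] for n in rec.get("item_name", [])
                         if n.get("language_tag", "").startswith("en")), None)
            item_id = rec.get("item_id")
            if not name or not item_id:
                continue
            items.append({
                "name": name,
                "dimensions": rec.get("item_dimensions"),  # per-axis size + units
                "image_id": rec.get("main_image_id"),      # resolved via the images index
                # item_id is ASIN-like, so a purchase link can be derived from it
                "url": f"https://www.amazon.com/dp/{item_id}",
            })
    return items
```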

Challenges we ran into

Our final goal was to fuse the LiDAR scan with the RGB camera to snap a Gaussian splat to the walls (while ignoring "dirty" features like people, random specks, etc.). We ran into challenges rendering it in the browser (it would render fine in Gaussian splat engines elsewhere) and ultimately ran out of time.

Accomplishments that we're proud of

We were pleased with the semantic tree and the placement engine.

What we learned

We learned a lot about how to expose semantic information from non-text-based domains in:

  • a token-efficient way
  • a queryable structure
  • an LLM-optimized fashion (we pivoted away from exact-coordinate representations of all building features, as they were too dense and sit outside what current leading LLMs' RL training handles well -- see the sketch below)
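To make that pivot concrete, here is the same wall in both forms (illustrative values, not real scan output): the raw RoomScan-style pose burns tokens on a 4×4 matrix, while the semantic form is what the agent actually sees.

```python
# Before: raw scan output -- a world-frame 4x4 pose plus extents.
raw = {
    "transform": [[ 0.99, 0.0, 0.08,  1.73],
                  [ 0.00, 1.0, 0.00,  0.00],
                  [-0.08, 0.0, 0.99, -2.41],
                  [ 0.00, 0.0, 0.00,  1.00]],
    "extent": [3.42, 2.51, 0.1],
}

# After: the semantic-tree view -- named properties an LLM can reason over.
semantic = {
    "id": "wall_G",
    "length_cm": 342,
    "free_spans": [{"start_cm": 0, "length_cm": 120},
                   {"start_cm": 210, "length_cm": 132}],
    "attached": ["door_2"],
}
```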

What's next for Dr. Room

While it was an interesting exercise to build an "AI API" for home design, some RL or proprietary-model work would likely be needed to make it truly competitive and production-ready.
