## Inspiration
Buying your first home in Victoria means juggling 7+ government websites — DELWP for planning zones, Crime Statistics Agency for safety data, the State Revenue Office for stamp duty and grants, the Valuer-General for market prices, and more. Most first-home buyers spend weeks piecing this together, often making decisions on incomplete information or paying a buyer's agent $3,000+ just to get basic answers.
I wanted to build something that felt like having a knowledgeable friend who had already looked everything up — someone you could just talk to, show a listing photo to, and get a full picture in seconds.
## What it does
Vesta is a conversational AI property research tool built specifically for Victorian home buyers. Every feature is designed around one idea: the buyer should never have to leave the conversation.
Upload a real estate listing photo
Nova Pro reads the address from the image, extracts the asking price, assesses the property condition, lists visible issues (water stains, cracks, outdated fittings), and identifies which rooms are visible. Nova 2 Multimodal Embeddings simultaneously classifies the architectural style against 8 Melbourne archetypes in a shared 256-dimensional space. If the style is Victorian Terrace, Federation, or Interwar Bungalow, Vesta automatically checks Heritage Overlay restrictions before the user even asks, surfacing a potentially costly constraint at the exact moment it matters.
Tell Vesta about yourself
"I'm a single mum earning $85k with $60k saved" → Vesta silently extracts these facts and remembers them for the entire session. Your profile appears as an animated badge strip in the chat header and as a full card in the sidebar. Every subsequent answer (grants, stamp duty, LMI thresholds, mortgage repayments) is personalised to your situation. Nova Pro is instructed never to re-ask for facts it already knows.
Ask anything in plain English
- "Is this suburb safe for kids?" → crime percentile + nearby schools from live government data
- "Can I build a granny flat here?" → planning zone + overlays from Vicmap in real-time
- "What government help can I get?" → semantic search across FHOG, FHBG, VHF, FHSS, Help to Buy schemes
- "What would my repayments be?" → Monthly repayment $M$ is calculated using the standard amortisation formula with Victorian stamp duty, LMI, and FHOG offsets applied to the principal:
$$ M = P \cdot \frac{r(1+r)^n}{(1+r)^n - 1} $$
where $P$ is the net loan principal after grants, $r = \text{annual rate} / 12$ is the monthly interest rate, and $n$ is the total number of monthly payments.
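A minimal Python sketch of the repayment formula above. The function name `monthly_repayment` and its arguments are illustrative assumptions rather than Vesta's actual code, and the `principal` passed in is assumed to already be the net figure after stamp duty, LMI, and grants have been applied.

```python
def monthly_repayment(principal: float, annual_rate: float, years: int) -> float:
    """Standard amortisation formula M = P * r(1+r)^n / ((1+r)^n - 1).

    Hypothetical helper: `principal` is the net loan principal after grants,
    `annual_rate` is a decimal (0.06 for 6% p.a.).
    """
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of monthly payments
    if n <= 0:
        raise ValueError("loan term must be positive")
    if r == 0:
        return principal / n      # zero-interest edge case: straight division
    growth = (1 + r) ** n         # (1 + r)^n, reused in numerator and denominator
    return principal * r * growth / (growth - 1)


if __name__ == "__main__":
    # Example: $500,000 net principal over 30 years at 6% p.a.
    print(f"${monthly_repayment(500_000, 0.06, 30):,.2f} per month")
```

The zero-rate branch avoids a division by zero, since the closed-form expression is undefined at $r = 0$.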
Compare two properties side-by-side
Load Property A and Property B. Vesta compares planning zone, overlays, bushfire risk, crime percentile, median price, 1-year growth, and rental yield with automatic winner highlighting. One button sends both to Nova Pro for a plain-English synthesis of which is the stronger buy and why, factoring in your saved profile.
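Per-metric winner highlighting could work along these lines. This is a hypothetical sketch: the metric keys and whether a higher or lower value wins are assumptions (for example, treating a lower crime percentile and a lower median price as better for a first-home buyer), and categorical fields such as planning zone, overlays, and bushfire risk are left to Nova Pro's synthesis rather than scored.

```python
from typing import Literal, Optional

Winner = Optional[Literal["A", "B"]]

# Hypothetical metric directions: True means a higher value is better.
HIGHER_IS_BETTER = {
    "crime_percentile": False,   # assumed: lower percentile = less crime
    "median_price": False,       # assumed: cheaper is better for a first-home buyer
    "growth_1y": True,
    "rental_yield": True,
}


def pick_winners(a: dict, b: dict) -> dict[str, Winner]:
    """Return 'A', 'B', or None (tie or missing data) for each numeric metric."""
    winners: dict[str, Winner] = {}
    for metric, higher_better in HIGHER_IS_BETTER.items():
        va, vb = a.get(metric), b.get(metric)
        if va is None or vb is None or va == vb:
            winners[metric] = None   # nothing to highlight
            continue
        a_wins = va > vb if higher_better else va < vb
        winners[metric] = "A" if a_wins else "B"
    return winners
```

Returning `None` for ties and missing data lets the UI skip highlighting instead of guessing.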
Upload a Section 32 / Vendor's Statement
Drop in a PDF (up to 200 pages). Nova Pro reads the full legal text and returns only the red flags: easements, caveats, body corporate special levies above $3k/year, building defects, unresolved disputes, and council notices. Flags are grouped by severity (High / Medium), each with an exact page reference and a plain-English explanation.
Every report closes with a reminder to have a conveyancer review the statement before signing.
Voice — speak and listen
Ask by voice via the browser's Web Speech API. Vesta's answer is compressed by Nova 2 Lite to ≤2 spoken sentences, enriched with SSML breath pauses and emphasis markers, then synthesised as Australian-accented MP3 via Amazon Polly Neural (Olivia voice). The whole pipeline, from full agent response to audio, takes under two seconds.
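A minimal sketch of the synthesis step with boto3, assuming the digest text already exists. `synthesize_speech` and its `Engine`, `VoiceId`, `TextType`, and `OutputFormat` parameters are Polly's real API; the `to_ssml` helper, the 300 ms pause, and the `ap-southeast-2` default region are illustrative assumptions. The sketch uses only `<break>` pauses because Polly's neural engine supports a narrower subset of SSML tags than the standard engine.

```python
import html
import re


def to_ssml(digest: str, pause_ms: int = 300) -> str:
    """Wrap a short digest in SSML, inserting a pause between sentences.

    Hypothetical helper: the pause length is an illustrative choice.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", digest) if s.strip()]
    # Escape &, <, > so user-facing text cannot break the SSML markup.
    body = f'<break time="{pause_ms}ms"/>'.join(html.escape(s) for s in sentences)
    return f"<speak>{body}</speak>"


def synthesize_mp3(digest: str, region: str = "ap-southeast-2") -> bytes:
    """Send the SSML to Amazon Polly Neural (Olivia, en-AU) and return MP3 bytes."""
    import boto3  # imported lazily so to_ssml() works without AWS dependencies

    polly = boto3.client("polly", region_name=region)
    response = polly.synthesize_speech(
        Text=to_ssml(digest),
        TextType="ssml",
        VoiceId="Olivia",
        Engine="neural",
        OutputFormat="mp3",
    )
    return response["AudioStream"].read()  # streaming body -> raw MP3 bytes
```

Calling `synthesize_mp3` needs AWS credentials with Polly access; `to_ssml` can be exercised on its own.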
## How we built it
Nova Pro — ReAct Agent (LangGraph)
Every property or policy question goes through a Reason → Act → Observe → Repeat loop. Nova Pro autonomously decides which of 9 live data tools to call and in what order, then synthesises a plain-English answer streamed back token-by-token via SSE. The frontend renders tool activity in real-time as an animated timeline so users can watch Nova Pro think.
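The Reason → Act → Observe loop can be illustrated without LangGraph. In this hypothetical sketch the model is any callable that returns either a tool call or a final answer; the `Step` shape, the `max_steps` cap, and the stub tool in the usage are assumptions, not Vesta's code or LangGraph's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: either a tool call or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def react_loop(
    model: Callable[[list], Step],
    tools: dict[str, Callable[..., str]],
    question: str,
    max_steps: int = 8,
) -> str:
    """Reason -> Act -> Observe -> Repeat until the model returns an answer."""
    transcript: list = [("user", question)]
    for _ in range(max_steps):
        step = model(transcript)                          # Reason over everything so far
        if step.answer is not None:
            return step.answer
        if step.tool not in tools:
            transcript.append(("observation", f"unknown tool: {step.tool}"))
            continue
        observation = tools[step.tool](**step.args)       # Act
        transcript.append(("observation", observation))   # Observe, then repeat
    return "Sorry, I couldn't finish that research within the step limit."
```

The step cap is the important design choice: an open-ended agent needs a hard stop so a confused model cannot loop forever on tool calls.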
Nova 2 Lite — Intent Router
Every message is first classified by Nova 2 Lite (~200ms) as chat, property, or policy. Casual conversation ("Hi", "Thanks", "What can you do?") is answered directly by Lite with no tool overhead. Complex queries are escalated to the full Nova Pro agent. This two-tier design makes a 9-tool reasoning agent feel snappy.
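A sketch of the routing step, assuming `classify` wraps a Nova 2 Lite call prompted to reply with a single label, and `lite_reply` / `run_agent` stand in for the two answer paths (all three callables are hypothetical). Defaulting to `property` on errors or unexpected output trades a little latency for never answering a real question without tools.

```python
from typing import Callable, Literal

Intent = Literal["chat", "property", "policy"]
VALID_INTENTS: tuple[Intent, ...] = ("chat", "property", "policy")


def route(message: str, classify: Callable[[str], str]) -> Intent:
    """Map a raw classifier reply to an intent, escalating when unsure."""
    try:
        reply = classify(message).strip().lower()
    except Exception:
        return "property"              # on model errors, fall through to the full agent
    for intent in VALID_INTENTS:
        if reply.startswith(intent):
            return intent
    return "property"                  # unexpected output: escalate rather than guess


def handle(
    message: str,
    classify: Callable[[str], str],
    lite_reply: Callable[[str], str],
    run_agent: Callable[[str, Intent], str],
) -> str:
    intent = route(message, classify)
    if intent == "chat":
        return lite_reply(message)     # cheap path: answered by Lite, no tools
    return run_agent(message, intent)  # escalate to the Nova Pro ReAct agent
```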
Nova 2 Multimodal Embeddings — Cross-Modal Style Classification
At startup, we embed 8 Melbourne architectural archetype descriptions as 256-dim text vectors and cache them. When a user uploads a listing photo, we embed the image into the same shared space and compute cosine similarity against all 8 archetypes:
$$ \text{similarity}(I, A_k) = \frac{\vec{v}_I \cdot \vec{v}_{A_k}}{\lVert \vec{v}_I \rVert \, \lVert \vec{v}_{A_k} \rVert} $$
where $\vec{v}_I \in \mathbb{R}^{256}$ is the image embedding and $\vec{v}_{A_k}$ is the pre-cached text embedding for archetype $k$. The predicted style is:
$$ \hat{k} = \arg\max_{k \in \{1,\ldots,8\}} \text{similarity}(I, A_k) $$
The best match is returned as a StyleBadge together with its score $\text{similarity}(I, A_{\hat{k}})$; as a cosine similarity, this score lies in $[-1, 1]$. Heritage archetypes (Victorian Terrace, Federation, Interwar Bungalow) automatically prepend a get_overlays() call to the agent queue before the user asks a single question.
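In code, the classification and the heritage trigger reduce to a cosine-similarity argmax. A minimal pure-Python sketch, assuming the archetype embeddings have already been fetched and cached as plain lists; only the three heritage styles named above are hard-coded, and representing the agent queue as a list of tool names is a hypothetical simplification.

```python
import math

HERITAGE_STYLES = {"Victorian Terrace", "Federation", "Interwar Bungalow"}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (result in [-1, 1])."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_style(image_vec: list[float], archetypes: dict[str, list[float]]) -> tuple[str, float]:
    """Return (best archetype, similarity) via argmax over cached text embeddings."""
    scores = {name: cosine(image_vec, vec) for name, vec in archetypes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def queue_heritage_check(style: str, tool_queue: list[str]) -> list[str]:
    """Prepend a get_overlays() call when the predicted style is a heritage archetype."""
    if style in HERITAGE_STYLES and "get_overlays" not in tool_queue:
        return ["get_overlays", *tool_queue]
    return tool_queue
```

With 256-dimensional vectors and only 8 archetypes, a brute-force loop is effectively free; no vector index is needed.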
Nova 2 Lite — Voice Digest
Long agent responses are distilled to ≤2 sentences and under 80 words by Nova 2 Lite before being passed to Amazon Polly. This keeps voice output natural and focused rather than reading out a wall of text.
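Because an LLM can occasionally overshoot a length instruction, a cheap deterministic check on Nova 2 Lite's output is a reasonable safety net. This hypothetical `enforce_digest_limits` helper keeps at most two sentences and fewer than 80 words; the sentence splitter is a simple regex, not a full tokenizer.

```python
import re


def enforce_digest_limits(text: str, max_sentences: int = 2, max_words: int = 79) -> str:
    """Trim a voice digest to at most `max_sentences` sentences and `max_words` words.

    Hypothetical post-check on the Nova 2 Lite digest; "under 80 words" is
    modelled as a 79-word cap.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    clipped = " ".join(sentences[:max_sentences])
    words = clipped.split()
    if len(words) > max_words:
        # Hard cut at the word limit, dropping dangling punctuation before the full stop.
        clipped = " ".join(words[:max_words]).rstrip(",;:") + "."
    return clipped
```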
Nova Pro — Section 32 Red-Flag Extraction
PDF text is extracted page-by-page (up to 200 pages, 200K characters) and passed to Nova Pro with a strict mandate: identify red flags only, not a full summary. Output is structured JSON with flag title, page number, risk level, and a plain-English explanation, then reformatted into a chat message grouped by severity.
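A sketch of the surrounding plumbing, assuming Nova Pro is prompted to return a JSON array of objects with `title`, `page`, `risk`, and `explanation` keys (hypothetical field names matching the four fields above). `PdfReader`, `reader.pages`, and `page.extract_text()` are pypdf's real API; the `[Page N]` marker format and the chat layout are illustrative choices.

```python
import json

MAX_PAGES = 200
MAX_CHARS = 200_000


def extract_pages(path: str) -> str:
    """Extract text page-by-page, tagging each page so the model can cite it."""
    from pypdf import PdfReader  # lazy import: only needed when a PDF is uploaded

    reader = PdfReader(path)
    parts = []
    for number, page in enumerate(reader.pages, start=1):
        if number > MAX_PAGES:
            break
        parts.append(f"[Page {number}]\n{page.extract_text() or ''}")
    return "\n\n".join(parts)[:MAX_CHARS]


def format_red_flags(model_output: str) -> str:
    """Parse the model's JSON red flags and group them by severity for the chat."""
    flags = json.loads(model_output)
    lines = []
    for level in ("High", "Medium"):
        group = [f for f in flags if str(f.get("risk", "")).capitalize() == level]
        if not group:
            continue
        lines.append(f"{level} risk")
        for f in sorted(group, key=lambda f: f.get("page") or 0):
            lines.append(f"- {f['title']} (p. {f.get('page', '?')}): {f['explanation']}")
    lines.append("Have a conveyancer review the Section 32 before signing.")
    return "\n".join(lines)
```

Tagging every page in the extracted text is what makes exact page references possible: the model can only cite pages it can see labelled.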
Data tools (9 total), drawn from these sources:
- Vicmap ESRI ArcGIS for planning zones, overlays (Heritage, Flood, Bushfire, DDO), and bushfire prone areas
- Crime Statistics Agency VIC (Dec 2024) for suburb crime rates and state percentiles
- Victorian Valuer-General for median price, rental yield, clearance rate, and days on market
- OpenStreetMap Overpass for schools, transport, supermarkets, and hospitals via real-time radius search
- Amazon Titan Embed Text v2 with RAG for government policy semantic search via cosine similarity over a pre-indexed vector store. The top-$k$ retrieved chunks ranked by $\text{similarity}(q, d_i)$ are injected into Nova Pro's context window before it generates a policy answer.
- Built-in VIC rules calculator for stamp duty, LMI, FHOG, repayment schedules, and FHB concessions
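The policy retrieval step in the Titan Embed Text v2 bullet above can be sketched as a top-$k$ cosine ranking followed by prompt assembly. This assumes chunk embeddings were computed ahead of time and held in memory; the `Chunk` type, `k=3`, and the prompt wording are hypothetical, and a production vector store would typically do the ranking itself.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str            # scheme label, e.g. "FHOG" (hypothetical)
    text: str
    vector: list[float]    # pre-computed embedding for this chunk


def _cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks with the highest similarity(q, d_i), best first."""
    return heapq.nlargest(k, chunks, key=lambda c: _cosine(query_vec, c.vector))


def build_policy_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Inject retrieved chunks into the context ahead of the user's question."""
    context = "\n\n".join(f"[{c.source}] {c.text}" for c in retrieved)
    return (
        "Answer using only the policy excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Prefixing each chunk with its scheme label lets the model attribute each point in its answer to FHOG, FHBG, and so on.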
Stack: Python 3.12 · FastAPI · LangGraph · React 18 · TypeScript · Vite · Tailwind CSS · AWS Bedrock · Amazon Polly Neural · pypdf · PostgreSQL · Google Maps Geocoding API · OpenStreetMap Overpass API · Vicmap ESRI ArcGIS
## Challenges we ran into
Nova 2 Multimodal Embeddings API format
This model does not use the standard messages schema. It requires a custom taskType / singleEmbeddingParams request body that is poorly documented. Cross-modal retrieval (querying with an image against text-embedded archetypes) also required careful prompt engineering of the archetype descriptions to make the shared vector space meaningful. Our first attempts produced near-random similarity scores; the breakthrough was grounding each description in concrete visual details (brick colour, roofline shape, window style) rather than historical facts.
SSE streaming with interleaved event types
Producing tool_start, tool_end, profile_update, and text_delta events through a single async SSE stream while the LangGraph ReAct loop is still running required careful coordination. The agent's astream_events API emits events at different granularities, and mapping these to our typed SSE schema without dropping or duplicating events took significant iteration.
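One way to keep that mapping in a single place is a pure function from LangChain's `astream_events` (v2) records to SSE frames. The event names `on_tool_start`, `on_tool_end`, and `on_chat_model_stream` are real v2 event types; the payload fields in our schema are assumptions, chunk content that arrives as a list of blocks is skipped for brevity, and `profile_update` frames would come from the backend's own extractor rather than from the agent stream.

```python
import json
from typing import Any, Optional


def sse_frame(event_type: str, payload: dict[str, Any]) -> str:
    """Serialise one Server-Sent Event: an `event:` line, a `data:` line, a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


def to_sse(event: dict[str, Any]) -> Optional[str]:
    """Map one astream_events record to our typed SSE schema, or None to skip it."""
    kind = event.get("event")
    data = event.get("data", {})
    if kind == "on_tool_start":
        return sse_frame("tool_start", {"tool": event.get("name")})
    if kind == "on_tool_end":
        return sse_frame("tool_end", {"tool": event.get("name")})
    if kind == "on_chat_model_stream":
        chunk = data.get("chunk")
        text = chunk if isinstance(chunk, str) else getattr(chunk, "content", "")
        if isinstance(text, str) and text:   # tool-call chunks have empty text content
            return sse_frame("text_delta", {"text": text})
    return None  # ignore chain/lifecycle events so nothing is emitted twice
```

In FastAPI, an async generator yielding these frames can be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.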
Heritage auto-trigger UX
Deciding when to silently prepend tool calls and when to surface the StyleBadge required many iterations. Too aggressive and it felt like the app was doing things without explanation. Too subtle and the feature was invisible. We settled on always showing the StyleBadge, and if heritage is triggered, generating a brief Nova 2 Lite message ("I spotted a Federation-era home — automatically checking Heritage Overlay") before the agent's full response arrives.
Real-time Victorian government data
Many Victorian government APIs are undocumented ESRI REST endpoints. Reverse-engineering the correct layer IDs, spatial reference parameters, and query geometry formats from DELWP's public map viewer took significant time, and several layers return subtly different schemas between endpoints.
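For reference, an ArcGIS REST point-in-polygon query generally takes the shape below. `geometry`, `geometryType`, `inSR`, `spatialRel`, `outFields`, `returnGeometry`, and `f` are standard parameters of the ArcGIS REST `query` operation; the service URL is a placeholder rather than a real Vicmap endpoint, and the layer ID still has to be discovered per layer as described above.

```python
import json
from urllib.parse import urlencode

# Placeholder only: substitute the real Vicmap MapServer/FeatureServer URL.
SERVICE_URL = "https://example.invalid/arcgis/rest/services/PLACEHOLDER/MapServer"


def layer_query_url(layer_id: int, lon: float, lat: float, out_fields: str = "*") -> str:
    """Build a query URL returning features from one layer that contain a WGS84 point."""
    point = {"x": lon, "y": lat, "spatialReference": {"wkid": 4326}}
    params = {
        "geometry": json.dumps(point),
        "geometryType": "esriGeometryPoint",
        "inSR": 4326,                        # input coordinates are WGS84 lon/lat
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": out_fields,
        "returnGeometry": "false",           # attributes only, e.g. a zone code
        "f": "json",
    }
    return f"{SERVICE_URL}/{layer_id}/query?{urlencode(params)}"
```

Passing `inSR` explicitly matters: without it, the server interprets the point in the layer's native spatial reference, which is a common source of silently empty results.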
Session profile synchronisation
The backend extracts profile facts from every message using regex pattern matching and emits SSE profile_update events. The frontend must compare incoming profile state against previous state, trigger fresh-field animations for exactly 1.2 seconds, then clear them, all while other SSE events are still arriving. Getting this state machine right without race conditions required careful use of Zustand's setState diffing.
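A hypothetical sketch of both halves in Python: regex extraction on the backend, and the field diff that decides which badges get the 1.2-second fresh-field animation. The patterns cover only a few phrasings and the field names (`income`, `deposit`, `household`) are assumptions; in Vesta itself the diff happens on the frontend in Zustand, not in Python.

```python
import re
from typing import Any, Optional

# Dollar amount such as "$85k", "60,000" or "1.2m" (hypothetical, deliberately narrow).
_AMOUNT = r"\$?(\d[\d,]*(?:\.\d+)?)\s*([km])?\b"
INCOME_RE = re.compile(r"\b(?:earn(?:ing|s)?|income of|salary of)\s+" + _AMOUNT, re.I)
DEPOSIT_RE = re.compile(_AMOUNT + r"\s+(?:saved|deposit|in savings)\b", re.I)
SINGLE_PARENT_RE = re.compile(r"\bsingle (?:mum|mom|dad|parent)\b", re.I)


def _to_dollars(number: str, suffix: Optional[str]) -> int:
    multiplier = {"k": 1_000, "m": 1_000_000}.get((suffix or "").lower(), 1)
    return int(round(float(number.replace(",", "")) * multiplier))


def extract_profile(message: str) -> dict[str, Any]:
    """Pull buyer facts out of free text."""
    facts: dict[str, Any] = {}
    if m := INCOME_RE.search(message):
        facts["income"] = _to_dollars(m.group(1), m.group(2))
    if m := DEPOSIT_RE.search(message):
        facts["deposit"] = _to_dollars(m.group(1), m.group(2))
    if SINGLE_PARENT_RE.search(message):
        facts["household"] = "single parent"
    return facts


def fresh_fields(previous: dict[str, Any], current: dict[str, Any]) -> set[str]:
    """Fields that are new or changed, i.e. the badges to animate."""
    return {key for key, value in current.items() if previous.get(key) != value}
```

Computing the diff as a pure function of previous and current state keeps it independent of event arrival order; only the 1.2-second timer needs to live in the UI.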
## Accomplishments that we're proud of
- Zero-shot cross-modal style classification with no fine-tuning. Eight plain-English archetype descriptions are the entire "training set", yet the cosine similarity scores are meaningfully discriminative.
- Autonomous heritage detection. Vesta surfaces Heritage Overlay restrictions the moment a photo is uploaded, before the user knows to ask.
- Full document analysis pipeline. A 200-page legal PDF reduced to actionable red flags with page references, in seconds.
- Roughly 200ms intent routing that makes a 9-tool reasoning agent feel fast for everyday conversation.
- Persistent buyer profile. Vesta remembers income, deposit, family situation, and citizenship from natural conversation and applies them to every subsequent answer, silently.
- Full voice loop. Speak a question, hear a 2-sentence Australian-accented answer, without ever touching the keyboard.
- A complete full-stack app covering multimodal vision, cross-modal embeddings, ReAct agent, voice pipeline, A/B comparison, and PDF analysis, built end-to-end during the hackathon.
## What we learned
- Amazon Nova's multimodal embedding space is genuinely cross-modal. Matching property photos to text-described style archetypes via cosine similarity works, and the key is grounding text descriptions in visual specifics rather than abstract historical context.
- Nova 2 Lite is fast enough to use as a pre-pass on every single message without users perceiving the latency. The two-tier routing pattern is worth the architectural complexity.
- The ReAct pattern with Nova Pro handles genuinely open-ended, multi-step research tasks without needing a rigid workflow graph. Tool selection order emerges from reasoning, not from hardcoded sequences.
- Structured JSON output from Nova Pro (for image analysis and PDF red flags) is reliable enough to build a production UI on, as long as you include explicit format examples in the prompt.
## What's next for Vesta
- Nova 2 Sonic integration. Replace the current Polly pipeline with full bidirectional speech-to-speech once the boto3 streaming API becomes available, enabling a truly conversational voice mode.
- Inspection checklist generator. Nova Pro analyses listing photos room-by-room and generates a tailored checklist of things to verify at the inspection (e.g. "check the age of that hot water system").
- Expand beyond Victoria. NSW and QLD planning data, national grant schemes, cross-state stamp duty comparison.
- Comparable sales map. Interactive map with Nova Pro narrating why similar properties sold for what they did, grounded in Valuer-General transaction records.
- Saved sessions. Persist buyer profile and property shortlist across sessions with user accounts.
## Built With
- amazon-polly-neural
- aws-bedrock
- fastapi
- google-maps-geocoding-api
- langgraph
- openstreetmap-overpass-api
- postgresql
- pypdf
- python
- react
- tailwind
- typescript
- vicmap-esri-arcgis

