Inspiration
Advertising has always been a point of tension between streaming platforms and their users. Show too many ads, and viewers become frustrated; show too few, and platforms lose revenue. Mid-show ads are especially disruptive — they break immersion, interrupt dramatic moments, and feel abrupt. Pre-roll ads don't fare much better; just like moviegoers who arrive 15–20 minutes late to skip trailers, viewers have learned to tune them out.
This made us ask: what if ads didn't interrupt the show at all? What if they blended naturally into the viewing experience — personalized, unobtrusive, and embedded right into the background of the scene? What if a coffee mug on a table or a product on a shelf could quietly become a shoppable ad?
What it does
BackstageCommercials is a framework for embedding personalized, shoppable product placements directly into the backgrounds of movies and TV shows. Instead of traditional ad breaks, it inserts AI-generated products into scenes — on tables, shelves, countertops — so they look like they naturally belong there.
Each inserted product comes with a subtle on-screen pop-up linking to its Amazon product page. With one click or a voice command, a NovaAct agent adds the item to the viewer's Amazon cart. Viewers can also ask the agent to find any item they spot on screen — say, "Find this tablecloth on Amazon" — and Nova will locate a visually similar product and add it to their wishlist. No more screenshotting, reverse-searching with Google Lens, and manually browsing results.
How we built it
Our pipeline follows four stages: scene selection → product placement → frame propagation → shopping agents.
Scene selection: We use OpenCV to detect scene cuts, then rank candidate frames by background motion, preferring the most static ones. An Amazon Nova model then picks the best starting frame for product insertion.
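The cut detection and frame ranking can be sketched as below. This is a minimal illustration using mean absolute pixel difference between consecutive frames as the motion signal; the actual OpenCV-based implementation (and the threshold value) may differ, and `detect_cuts` / `rank_static_frames` are hypothetical names.

```python
import numpy as np

def detect_cuts(frames, threshold=30.0):
    """Flag a scene cut wherever the mean absolute pixel difference
    between consecutive frames jumps past a threshold."""
    cuts = []
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.float64)
                      - frames[i - 1].astype(np.float64)).mean()
        if diff > threshold:
            cuts.append(i)
    return cuts

def rank_static_frames(frames, cut_start, cut_end):
    """Within one cut, rank frames by how little they differ from their
    predecessor; the most static frames make the best insertion candidates."""
    scores = []
    for i in range(cut_start + 1, cut_end):
        motion = np.abs(frames[i].astype(np.float64)
                        - frames[i - 1].astype(np.float64)).mean()
        scores.append((motion, i))
    return [i for _, i in sorted(scores)]
```

The top-ranked frames from `rank_static_frames` would then be handed to the Nova model for the final pick.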
Product placement: An adjuster Nova model predicts the bounding box — XY coordinates, width, and height — for placing the product on the selected frame. A second critic Nova model evaluates whether the placement is physically plausible. If the product appears to float or collide unnaturally, the critic rejects it and the adjuster tries again.
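The adjuster-critic loop reduces to a simple propose-and-verify pattern. Here is a minimal sketch with the two Nova model calls stubbed out as callables; the interface (`propose_box`, `critique_box`, the dict-shaped bounding box, and the retry cap) is our assumption, not the project's actual API.

```python
def adjuster_critic_loop(propose_box, critique_box, max_attempts=5):
    """Adjuster proposes a bounding box; critic accepts or rejects it.

    propose_box(attempt) -> {"x": ..., "y": ..., "w": ..., "h": ...}
    critique_box(box)    -> True if the placement is physically plausible
    Both callables stand in for the Nova model calls.
    """
    for attempt in range(max_attempts):
        box = propose_box(attempt)
        if critique_box(box):
            return box  # plausible placement found
    return None  # give up on this frame after max_attempts rejections
```

The retry cap keeps a stubborn critic from stalling the pipeline; a frame with no accepted placement can simply be skipped.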
Once a valid bounding box is confirmed, we send the frame, bounding box, and reference product image to the FLUX Kontext model with Finegrain's product-placement LoRA adapter, which blends the product photorealistically into the scene.
Frame propagation: The bounding box is carried across subsequent frames in the scene cut. We use YOLOv8 segmentation to mask moving foreground objects (actors, props) and composite them back over the inserted product — creating the illusion that the product truly exists behind the actors in the original footage.
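The compositing step can be sketched as a paste-then-restore operation. In the real pipeline the foreground mask comes from YOLOv8 segmentation; here it is just a binary array passed in, and the function name and signature are illustrative.

```python
import numpy as np

def composite_product(frame, product, box, fg_mask):
    """Paste `product` into `box` on `frame`, then restore the original
    pixels wherever `fg_mask` marks foreground (actors, props), so the
    product appears to sit behind them.

    box: (x, y, w, h); fg_mask: same height/width as frame, nonzero = foreground.
    """
    out = frame.copy()
    x, y, w, h = box
    original = frame[y:y + h, x:x + w].copy()
    out[y:y + h, x:x + w] = product          # insert the product patch
    region_mask = fg_mask[y:y + h, x:x + w].astype(bool)
    out[y:y + h, x:x + w][region_mask] = original[region_mask]  # occlusion
    return out
```

Because only masking and compositing run per frame, this is what makes propagating a single expensive generation across an entire cut cheap.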
Shopping agents: Two Nova agents work together via Flask API endpoints. Amazon Nova 2 Lite acts as the reasoning agent — it identifies objects, extracts descriptions, and searches the web for matching products. NovaAct is the UI agent — it opens a browser and adds the selected item to the user's Amazon cart or wishlist.
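The division of labor between the two agents can be sketched as a small dispatcher. In the real project this logic sits behind Flask endpoints and the two callables are Nova 2 Lite and NovaAct; the function name, action strings, and payload shape below are our assumptions.

```python
def handle_request(action, payload, reasoning_agent, ui_agent):
    """Route a shopping request to the right agent.

    reasoning_agent(description) -> matched product (stands in for Nova 2 Lite)
    ui_agent(task, product)      -> result of browser action (stands in for NovaAct)
    """
    if action == "find":
        # viewer described something on screen: reason first, then act
        product = reasoning_agent(payload["description"])
        return ui_agent("add_to_wishlist", product)
    if action == "buy":
        # known placement: the product is already identified, act directly
        return ui_agent("add_to_cart", payload["product"])
    raise ValueError(f"unknown action: {action}")
```

Splitting "figure out what the viewer means" from "drive the browser" keeps each agent's prompt narrow and lets the UI agent be reused for both cart and wishlist flows.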
Challenges we ran into
Compute limits: FLUX at full precision requires ~24 GB of VRAM, which was too costly for our setup. We solved this by quantizing the model to 4-bit precision, making inference practical on limited hardware.
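The arithmetic is what makes this work: FLUX.1 has roughly 12 billion parameters, so 16-bit weights alone are about 24 GB, while 4 bits per weight drops that to roughly 6 GB (plus scales and activations). Below is an illustrative symmetric 4-bit quantizer showing the core idea; libraries like bitsandbytes use a more sophisticated scheme (NF4, per-block scales), so this is a teaching sketch, not what we ran.

```python
import numpy as np

def quantize_4bit(weights):
    """Symmetric 4-bit quantization: map floats onto 15 integer levels
    (-7..7) with one per-tensor scale. Illustrative only."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale
```

The round-trip error is bounded by half the scale, which is why 4-bit weights often preserve generation quality surprisingly well.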
Generation speed: Product insertion on a single frame took roughly 3 minutes, making per-frame generation across a full cut (30–60 FPS) unrealistic. Our workaround was to generate the product only on the first frame and propagate the placement across the cut, using YOLOv8 segmentation to keep foreground objects layered correctly.
Scene quality: Product placement only looks convincing when the background is stable and there's a believable surface to place the object on. Fast motion, camera movement, and difficult angles made realistic insertion challenging in many clips.
Accomplishments that we're proud of
We built a working end-to-end pipeline — from raw video input to a shoppable, photorealistic product placement — in a single hackathon. The adjuster-critic loop for placement validation, the foreground masking for frame propagation, and the seamless one-click shopping flow through Nova agents all came together into a cohesive demo. We're especially proud of how natural the inserted products look and how frictionless the shopping experience feels.
What we learned
We learned how to chain multiple AI models into a cooperative pipeline — using one model to propose, another to critique, and a third to generate. We gained hands-on experience with FLUX Kontext and LoRA adapters for controlled image generation, YOLOv8 for real-time segmentation, and the Amazon Nova agent ecosystem for building browser-based automation. We also learned that 4-bit quantization can make large generative models surprisingly practical without major quality loss.
What's next for BackstageCommercials
2D to 3D: Our current pipeline works on flat surfaces in mostly static scenes. The next step is upgrading to 3D-aware product insertion that can handle camera motion, perspective shifts, and dynamic environments — keeping placements physically believable even as the scene moves. This would unlock product placement in a far wider range of content and bring us closer to a production-ready system.
Built With
- amazon-nova-2-lite
- amazon-novaact
- amazon-web-services
- finegrain-product-placement-lora
- flask
- flux-kontext
- javascript
- lucide-react
- npm
- opencv
- python
- react
- ultralytics
- vite
- yolov8