Inspiration

One year ago, we released our first View in Room 3D feature, which lets customers preview a product before making a purchase. It is a great experience when the customer already knows what to look for. In some cases, however, the customer may not know exactly which product would be a good fit for their existing room, and finding that best-fit item can be time consuming. We therefore thought it would be really helpful if we could extend the current AR feature to intelligently suggest complementary products.

What it does

Simply put, our app allows customers to find, place, and visually compare complementary products for a piece of furniture they already own. Say our customer John has a dining table and wants to purchase a set of dining chairs that go well with it. The workflow goes like this: John first opens the app and scans the environment; this warms up the AR engine by detecting planes and the floor, just like our current View in Room 3D does. Once that's done, he turns on the screen capture mask with the button on the bottom left and snaps a picture with the button on the bottom right, making sure the table fits inside the capture frame so the object detection gets a good result. The snapshot of the table is sent to our Visual Search endpoint, which returns a list of SKUs that are frequently purchased along with the detected dining table. These SKUs are presented on screen so that John can view their details and place them in the environment. He can adjust the position and orientation of each SKU right in AR and walk around freely to visually assess which one is the best fit.
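
As a rough illustration, here is a minimal sketch of that warm-up step, assuming a standard ARSCNView-based setup (the class and outlet names are placeholders, not our actual code):

```swift
import ARKit
import SceneKit

// Warm-up step: run a world-tracking session with horizontal plane
// detection so the floor is known before any product is placed.
class ARRoomViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        sceneView.session.run(configuration)
    }

    // Called whenever ARKit detects a new plane anchor (e.g. the floor).
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor is ARPlaneAnchor else { return }
        // Once a plane is found, the session is "warmed up" and ready
        // for the capture and placement steps described below.
    }
}
```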

How I built it

Using SceneKit with ARKit, we built an augmented reality application for iOS 11 that leverages our current 3D model infrastructure. We also wrote our own server to receive incoming HTTP requests and serve up recommendations based on the SKUs returned by visual search, as well as to let a user load up a product by SKU (if it has a glTF model). We adopted the glTF model format, which is superior to the format currently used by the View in Room feature. For the recommendations, we take a frame and apply a bitmask over it to trim down the field of view (and focus on where the products are). The masked frame is sent to the server as a base64 string in a POST parameter. The server then returns SKUs similar to what it sees, and the app dynamically creates UI elements representing the top three SKUs.
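
A sketch of that capture path is below; the endpoint URL, the JSON body, and the "image" parameter name are placeholders for illustration, not the real service contract:

```swift
import UIKit
import ARKit

// Crop the ARSCNView snapshot to the on-screen capture mask, then POST
// it to the visual-search endpoint as a base64 string.
func sendSnapshot(from sceneView: ARSCNView, maskFrame: CGRect) {
    let snapshot = sceneView.snapshot()

    // Convert the mask frame from view points to image pixels.
    let scale = snapshot.scale
    let cropRect = CGRect(x: maskFrame.origin.x * scale,
                          y: maskFrame.origin.y * scale,
                          width: maskFrame.width * scale,
                          height: maskFrame.height * scale)
    guard let cropped = snapshot.cgImage?.cropping(to: cropRect),
          let jpegData = UIImage(cgImage: cropped).jpegData(compressionQuality: 0.8)
    else { return }

    let base64Image = jpegData.base64EncodedString()

    // Placeholder URL and parameter name.
    var request = URLRequest(url: URL(string: "https://example.com/visual-search")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONSerialization.data(withJSONObject: ["image": base64Image])

    URLSession.shared.dataTask(with: request) { data, _, error in
        guard let data = data, error == nil else { return }
        // Parse the recommended SKUs from the response here.
    }.resume()
}
```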

In short, diving into the codebase and seeing what works and what doesn't were the main driving factors for this challenge day. Starting with the Visual_Search_Helper, I wrote a sandbox script that procured image data from a web link and returned SKUs resembling that item. Using a SKU from that result set, I loaded up a Product_Model and called a function to load its related items collection. From there, we can call the 3D model API with one or more of the related SKUs to retrieve a 3D model, which we then send to the iOS app along with product details (name, price, etc.).
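
To illustrate the hand-off on the iOS side, here is a hedged sketch: the response fields are hypothetical, and placement uses a plain ARKit hit test against the detected planes rather than our actual placement code.

```swift
import SceneKit
import ARKit

// Hypothetical shape of the recommendation response; field names are
// assumptions, not the actual API contract.
struct Recommendation: Decodable {
    let sku: String
    let name: String
    let price: Double
    let modelURL: URL   // URL of the glTF model for this SKU
}

// Place a loaded product node on the nearest detected plane under the
// given screen point (e.g. where the user tapped).
func place(productNode: SCNNode, at point: CGPoint, in sceneView: ARSCNView) {
    guard let hit = sceneView.hitTest(point, types: .existingPlaneUsingExtent).first else { return }
    let t = hit.worldTransform.columns.3
    productNode.position = SCNVector3(x: t.x, y: t.y, z: t.z)
    sceneView.scene.rootNode.addChildNode(productNode)
}
```

Decoding the response is then just JSONDecoder().decode([Recommendation].self, from: data), after which each model can be fetched from its modelURL and converted to an SCNNode (since we use glTF, a loader library such as GLTFSceneKit would be needed at that step).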

Challenges I ran into

Since we didn’t finish the endpoints before leaving on Friday, the backend had to become a skeleton; however, it does receive the base64 string of the image, which can also be saved to the filesystem to show the masking working. Another issue we faced was getting some animations we had baked out to play correctly (though those can be demonstrated as well). We also had issues loading models from the web, but that was resolved very nicely!

Finding usable SKU data in the dev SF tables was another challenge. Finding SKUs that are related to other SKUs and that are also associated with 3D models proved difficult. In addition, it was easy to fall into a scenario where an HTTP request was being made inside another GET request, specifically when procuring SKUs via visual search. With more time, this is definitely something we could have overcome. It would have been sufficient to call visual search from iOS directly and then hit a separate endpoint within our codebase that returned related SKU information, as sketched below.
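
A rough sketch of that flattened flow, with placeholder URLs and a hypothetical response shape:

```swift
import Foundation

// The app calls visual search first, then a separate related-SKU
// endpoint, instead of the server nesting one HTTP request inside
// another. Both URLs and the response formats are placeholders.
func fetchRelatedSKUs(forImage base64Image: String,
                      completion: @escaping ([String]) -> Void) {
    var searchRequest = URLRequest(url: URL(string: "https://example.com/visual-search")!)
    searchRequest.httpMethod = "POST"
    searchRequest.setValue("application/json", forHTTPHeaderField: "Content-Type")
    searchRequest.httpBody = try? JSONSerialization.data(withJSONObject: ["image": base64Image])

    URLSession.shared.dataTask(with: searchRequest) { data, _, _ in
        guard let data = data,
              let skus = try? JSONDecoder().decode([String].self, from: data),
              let firstSKU = skus.first else { return }

        // Second call: ask a separate endpoint for SKUs related to the match.
        let relatedURL = URL(string: "https://example.com/related-skus/\(firstSKU)")!
        URLSession.shared.dataTask(with: relatedURL) { data, _, _ in
            guard let data = data,
                  let related = try? JSONDecoder().decode([String].self, from: data) else { return }
            completion(related)
        }.resume()
    }.resume()
}
```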

Accomplishments that I'm proud of

I would say we are proud of everything we have achieved so far! From first conceptualizing the idea to actually sitting down and getting the app to work, every step and contribution was vital to the project. It is hard to nail down the specific aspects we are proudest of, but if I must, I would say that seeing all the parts of development come together, and how well they work together in the final product, is our proudest accomplishment.

What I learned

All our team members came with different skill sets, and we all found it fulfilling to expand our comfort zones into cool new areas. I personally learned and practiced the basics of iOS development and was able to adapt our app's UI to the artist's drawings. I also made a toggle button to turn the screen capture mask on and off. Our other team members picked up skills such as integrating a mask image into the system so that only the image content within the mask frame is captured. We also used the newest glTF 3D model format to make sure our products look good in the AR environment.

What's next for Context-based Recommendation in AR

We built this app as a proof of concept, so there are certainly many aspects to improve. First, we want to integrate the back-end code into Wayfair’s systems so that the whole pipeline can run properly. Second, we want to improve the user experience by adding more relevant SKU information such as price and product description. Third, we plan to write code for real-time object detection so that the app can intelligently suggest appropriate positions to place the recommended products.
