What it does

Our project adds an extra layer of search, built on OpenAI's CLIP model, to increase the relevance of the results returned by the search bar. We implemented an algorithm that picks 10 evenly spaced frames from every uploaded video and stores those frames as vectors in Pinecone. With CLIP, we set up semantic search so that, in theory, a user can type a prompt into the search bar and get results based on the content of the videos themselves, not just the captions and hashtags attached at upload time. Because this feature is tied to TikTok's search bar component, we feel it is adaptable to both the TikTok website and the app. Depending on how TikTok is structured, it can either be merged with the current Search API or become its own GraphQL component to stay as modular as possible.
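To make the frame selection concrete, here is a minimal sketch of how 10 evenly spaced frames could be chosen. The function name `evenly_spaced_frame_indices` and the choice of segment midpoints are our assumptions for illustration, not necessarily the exact logic in the project. It only computes indices; decoding the frames and embedding them is left out.

```python
def evenly_spaced_frame_indices(total_frames: int, count: int = 10) -> list[int]:
    """Return up to `count` frame indices spread evenly across a video.

    Hypothetical helper for illustration. The video is split into `count`
    equal segments and the midpoint frame of each segment is picked.
    """
    if total_frames <= 0 or count <= 0:
        return []
    if total_frames <= count:
        # Short clip: there are not enough frames to sample, so use them all.
        return list(range(total_frames))
    # Integer arithmetic keeps the result free of floating-point rounding.
    return [((2 * i + 1) * total_frames) // (2 * count) for i in range(count)]


if __name__ == "__main__":
    # Example: a 20 second clip at 30 fps has 600 frames.
    print(evenly_spaced_frame_indices(600))
```

Picking midpoints rather than segment starts is a design choice that avoids always sampling the very first frame of a video.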

How we built it

We built LockedIn with FastAPI. We dockerized the app and hosted the image in an ECR registry on AWS, then used that image as the template for a Lambda function exposed as a REST API through AWS API Gateway. This API talks to OpenAI CLIP, our Pinecone vector database, and AWS S3. S3 is the object storage layer and stands in for the storage that TikTok actually uses to store and distribute content. Pinecone stores the frame vectors that are compared against the CLIP embedding of a search query when a user searches through TikTok. We also implemented a basic frontend in HTML and CSS to bring the system design close to TikTok's and to show the feasibility of semantic search.
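The matching step can be sketched in plain Python. This is an in-memory stand-in for the vector lookup, assuming cosine similarity and a "best frame wins" score per video; it does not call Pinecone or CLIP, and `rank_videos` and `frame_index` are hypothetical names used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_videos(query_vector, frame_index, top_k=3):
    """Rank videos by their best-matching frame.

    `frame_index` is a list of (video_id, frame_vector) pairs, standing in
    for the frame vectors that would live in the vector database.
    """
    best = {}
    for video_id, vector in frame_index:
        score = cosine_similarity(query_vector, vector)
        # Keep only the highest-scoring frame for each video.
        if video_id not in best or score > best[video_id]:
            best[video_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real CLIP embeddings are much larger.
    frames = [("cat_video", [1.0, 0.0]), ("cat_video", [0.8, 0.6]),
              ("dog_video", [0.0, 1.0])]
    print(rank_videos([1.0, 0.0], frames))
```

In a deployed setup the vector database would do this nearest-neighbor search itself; the sketch only shows why storing several frames per video lets a text prompt match what appears on screen.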

Challenges we ran into

One of the major challenges we ran into was navigating AWS, as it was the first time any of us had used its services. Setting up a separate user to stand up the S3 bucket was tough, and connecting it to FastAPI took us quite a bit of time. Working through the AWS IAM console and testing policies that allow FastAPI to upload and download files was a huge challenge for us.
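For reference, a policy that lets an IAM user upload and download objects in one bucket can be quite small. The sketch below builds such a policy document in Python; the bucket name `lockedin-videos-example` and the helper name are placeholders we made up, and the policy we actually ended up with may have differed.

```python
import json


def build_upload_download_policy(bucket_name: str) -> str:
    """Build an identity-based IAM policy as a JSON string.

    Hypothetical helper: grants read and write access to the objects in
    a single bucket and nothing else.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # GetObject covers downloads, PutObject covers uploads.
                "Action": ["s3:GetObject", "s3:PutObject"],
                # The trailing /* targets objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder bucket name, for illustration only.
    print(build_upload_download_policy("lockedin-videos-example"))
```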

What we learned

Working with FastAPI, deploying the API to API Gateway, standing up the S3 bucket, and working with AWS IAM was a huge learning experience for all of us. This project let everyone dive deep into backend development, and seeing it all come alive was a rewarding experience! It was also the first time most of us had worked with a vector database, and it was interesting to see how it fit into our solution.

What's next for LockedIn

A likely next step is setting up a CDN with AWS CloudFront in front of the S3 bucket, which would make our semantic search setup more representative of TikTok's CDN setup and better show its feasibility. As AI advances, using a better tokenizer could steadily improve search accuracy and, in turn, TikTok's UX!
