Inspiration
Watch a live MLB game in the stadium or on TV and talk to an AI agent to understand more about what is happening. The agent interacts in a fun way to drive fan engagement. It answers from data stored as vector embeddings in Qdrant, which helps reduce hallucinations. It supports various languages, including English, Hindi, Vietnamese, and Spanish.
What it does
Turn on the camera in the web app, record a 10 to 30 second clip, and ask the application questions using your mic. Behind the scenes the video is processed, and the application answers the questions using stored Yankees data for the 2024 season. Overall, it is a fun engagement with video, audio, and AI.
How we built it
The front end is built in React. The backend uses FastAPI, Python, LlamaIndex, Qdrant, and the Google API to call Gemini 1.5 Flash. The full documentation is in the GitHub repo: https://github.com/abhinav1singhal/talkin-bases
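To make the retrieval step concrete, here is a minimal sketch of what a vector store does when the backend asks for context. It is not the project's code: the real application delegates this to Qdrant through LlamaIndex, while this stand-in ranks a few in-memory records by cosine similarity. The function names, the toy 3-dimensional vectors, and the placeholder record texts are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vector, records, k=2):
    """Return the texts of the k records most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, record["vector"]), record["text"])
        for record in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy data: real embeddings have hundreds of dimensions
# and would come from an embedding model, not be written by hand.
RECORDS = [
    {"text": "placeholder record about batting", "vector": [1.0, 0.0, 0.0]},
    {"text": "placeholder record about pitching", "vector": [0.0, 1.0, 0.0]},
    {"text": "placeholder record about fielding", "vector": [0.0, 0.0, 1.0]},
]

if __name__ == "__main__":
    print(retrieve_top_k([0.9, 0.1, 0.0], RECORDS, k=2))
```

The retrieved texts are what gets passed to the LLM as context alongside the question and the video clip.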
Challenges we ran into
There were a lot of challenges.
1/ I tried using Vertex AI Studio to process JSON and create embeddings for the MLB data. It turned out that I would need an enterprise version and would have to contact a Google representative to enable JSON embedding, so I switched to Qdrant, an off-the-shelf product.
2/ Collecting data and creating embeddings for every game and every player was really challenging, because it is hard to store everything as vector embeddings and still make it useful for RAG. Since I had limited resources, I ended up categorizing data for the Yankees for the 2024 season only. The scope of the implementation had to be reduced for the MVP.
3/ Prompt creation was challenging, as it is hard to restrict the Gemini Flash LLM so that its responses do not hallucinate.
4/ There is still some hallucination. It could be reduced further by adding code to embed the video recording and then fetch the context from the vector DB for more accuracy.
5/ Testing between iPhone and Android was tricky, as the front-end code had to be modified to work on both operating systems.
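One common way to approach the prompt restriction described in item 3 is to put the retrieved facts directly in the prompt and tell the model to refuse when they do not cover the question. The sketch below shows that pattern only. The function name and the exact wording of the instructions are assumptions, not the prompt used in Talkin-Bases, and instructions like these reduce hallucination without guaranteeing its absence.

```python
def build_grounded_prompt(question, context_chunks, language="English"):
    """Build a prompt that asks the model to answer only from the supplied context.

    context_chunks is a list of strings retrieved from the vector store.
    """
    if context_chunks:
        context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    else:
        # Make the missing context explicit so the model can decline to answer.
        context = "(no context retrieved)"
    return (
        "You are a friendly baseball commentator.\n"
        f"Answer in {language}.\n"
        "Use ONLY the facts in the context below and what is visible in the video clip.\n"
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("Who is batting?", ["fact A", "fact B"], "Spanish"))
```

Keeping the language as a parameter lets the same template serve every supported language without duplicating the grounding rules.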
Accomplishments that we're proud of
1/ A lot of digging through data and writing scripts.
2/ Achieving a multimodal RAG system that processes both JSON data and video data to produce a response.
3/ Deploying the application on Google Cloud Run using gcloud CLI commands, which are very easy to learn and use.
4/ Dockerizing the front-end and backend services so that they are easy to deploy.
5/ Finding that the Google developer API, LlamaIndex, Qdrant, and FastAPI make it easy to implement a multimodal RAG system.
6/ A fun way to interact with Gemini 1.5 Flash using video, which gets even better when we use it to get fun answers.
What we learned
1/ Data processing is the key, and how the data is stored as vector embeddings is critical.
2/ For every type of use case, the data needs to be processed differently and tailored to that use case. Choosing the vector database is then a key decision.
3/ Choosing an appropriate framework like LlamaIndex for storing vector embeddings in Qdrant made the work very quick, without my having to write a lot of native code.
4/ Calling the Google API with a video and a prompt as parameters was very simple, and the response comes back quickly. The only noticeable latency is on the network between the front end and the backend, depending on region.
5/ There is a lot of room for improvement, and in fact much more data can be stored to improve accuracy.
6/ I learned how easy it is to dockerize the code and deploy it on Google Cloud Run with gcloud CLI commands, so that it can be used anywhere: on a desktop, an Android device, or an iPhone.
What's next for Talkin-Bases
1/ Improve the UI/UX. I want to add icons, images, and crisper interaction.
2/ Improve video processing by adding code to embed the video that comes in as an input parameter, and then build better context from Qdrant.
3/ If I can get access to the Vertex AI API, I want to store JSON directly in Google instead of Qdrant. Right now I am unable to do this, as it asked me to contact a Google sales representative to enable JSON support.
4/ Fix some of the quirky issues in Vietnamese and Mandarin responses.
5/ Add authentication to the APIs to secure them, and add rate limiting to the deployed application.
6/ Host the front end and backend behind a domain and a load balancer.
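As a rough idea of what the rate limiting in item 5 could look like, here is a minimal fixed-window limiter in plain Python. This is a sketch under assumptions, not a plan the project has committed to: the class name, the per-client keying, and the injectable clock are all hypothetical, and a deployed service running several instances would need shared state (for example in a cache) rather than an in-process dictionary.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most max_requests per client within each window of window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._windows = {}  # client_id -> (window_start, request_count)

    def allow(self, client_id):
        """Return True if the request is allowed, False if the client is over the limit."""
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(max_requests=2, window_seconds=60)
    print([limiter.allow("fan-1") for _ in range(3)])
```

In a FastAPI backend, a check like `limiter.allow(client_id)` could run in a dependency or middleware and return HTTP 429 when it fails.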
