Track 2: Inspiring Creative with Generative AI
When creating and consuming streaming media content, generative AI technologies can be applied to content optimisation, information extraction and style transformation, refining content across various media platforms. The transformative power of generative AI can unite communities by inclusively addressing diverse consumption needs across communication mediums and beyond. With these technologies, we can cater to the preferences of diverse audiences and help creators produce higher-quality content more efficiently.
Team Problem Statement
How can TikTok content be made accessible and inclusive for people with visual impairments, while preserving creators' creative freedom of expression?
Background
Many TikTok videos rely on viral audio that does not provide sufficient context, creating barriers for visually impaired viewers and hindering their ability to fully enjoy TikTok's immersive and engaging nature.
Introducing AudioSight, a revolutionary project designed to make TikTok more inclusive for the visually impaired.
AudioSight automatically generates audio narration for videos where visual context is essential (e.g. This Viral TikTok Video). Our solution harnesses advanced generative AI concepts to ensure accuracy and accessibility:
Accessibility Features
- Keyframe Recognition: By utilising structural similarity comparisons, we optimise and extract keyframes to identify crucial moments in the video.
- Scene Detection and Explanation: Leveraging large language models (LLMs), we explain scenes through detailed descriptions, providing accurate and comprehensive audio transcripts for enhanced accessibility.
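As a sketch of the keyframe step, the structural-similarity comparison can be reduced to a simplified global SSIM computed over whole frames (the windowed SSIM in libraries like scikit-image is more robust; the function names and the 0.7 threshold here are illustrative assumptions, not the exact production code):

```python
import numpy as np

def ssim_global(a, b, data_range=255.0):
    """Simplified single-window SSIM over whole grayscale frames
    (no sliding window, unlike scikit-image's implementation)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    a, b = a.astype(np.float64), b.astype(np.float64)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def extract_keyframes(frames, threshold=0.7):
    """Keep a frame whenever its similarity to the last kept keyframe
    drops below `threshold`, i.e. the scene has changed noticeably."""
    if len(frames) == 0:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if ssim_global(frames[kept[-1]], frames[i]) < threshold:
            kept.append(i)
    return kept
```

On a real video the frames would come from a decoder (e.g. OpenCV) and be converted to grayscale first; identical frames score 1.0 and a hard cut scores near 0, so only the frames around scene changes are forwarded to the LLM.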


Accessible Content: Visually impaired audiences can seamlessly toggle the accessibility feature through distinctive gestures that sit alongside existing TikTok gesture interactions. This non-intrusive method lets them first enjoy the content as the creator intended, then turn on the accessibility tool for enhanced comprehension and clarity. This dual approach preserves an authentic viewing experience while offering a contextual explanation of the content's visuals.
Inclusive Content Creation: Content creators can leverage the AI-enabled scene detection feature in their creation process. The feature samples unique frames from the content and provides LLM-generated contextual explanations, which save creators time by offering a solid starting point that needs only minor adjustments. This Human-in-the-Loop approach lets creators make a final check that the explanations accurately reflect the intended portrayal of the content, so the accessibility toggle can be fully enjoyed by visually impaired audiences, enhancing their experience like never before.
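As a sketch of how the LLM draft reaches the creator, the snippet below assembles an OpenAI-style vision prompt from sampled keyframes and applies a trivial human-in-the-loop review step. The payload shape follows OpenAI's chat-completions image format, but the prompt wording and function names are our assumptions, not the production code:

```python
import base64

def build_scene_prompt(frame_jpegs):
    """Assemble a chat-completions message asking a vision LLM (e.g. GPT-4o)
    to describe sampled keyframes for a visually impaired listener."""
    content = [{
        "type": "text",
        "text": ("Describe what happens in these video frames, concisely, "
                 "for a listener who cannot see them."),
    }]
    for jpeg_bytes in frame_jpegs:
        b64 = base64.b64encode(jpeg_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]

def review_description(llm_draft, creator_edit=None):
    """Human-in-the-loop: the creator accepts the draft or replaces it."""
    return creator_edit if creator_edit else llm_draft
```

The returned messages would be sent to the chat-completions endpoint; the creator then sees the draft during upload, and only the reviewed text is attached to the video.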
Through these innovative techniques, AudioSight transforms video content, making it accessible and enjoyable for everyone.


Technical Implementation

- Data Collection: We scraped a sample of videos matching our target content style (mainly travel videos with little or no captioning).
- Frontend Development and FastAPI: We built our client-facing application with the JavaScript frameworks ReactJS and NextJS, and used FastAPI to connect the frontend with our middleware and backend. We chose FastAPI because it makes building well-typed Python APIs fast and straightforward.
- Prompt Engineering: Using sequential prompting techniques, we ensured that our video-to-speech pipeline gave accurate yet succinct responses. This relies on LLM APIs such as OpenAI's TTS and GPT-4o.
- Evaluation: We used second LLM APIs (Gemini 1.5 Pro and GPT-3.5 Turbo) as judges to evaluate the accuracy of the generated narration.
- Backend Support: We used MongoDB, a NoSQL document database that supports storing vector embeddings, together with Amazon S3 as our storage server.
Development Tools
- Frontend: ReactJS, NextJS
- Middleware: OpenAI API, FastAPI, Gemini API
- Backend: Python, MongoDB, Amazon S3
