MediReels

Technical Diagram

MediReels: Revolutionizing Healthcare Communication

MediReels is a revolutionary healthcare communication platform that simplifies complex medical information through engaging short videos. By transforming dense text and research into digestible content, MediReels empowers patients, clinicians, and educators to understand and navigate the world of healthcare.

Inspiration: The Problem: Complexity in Healthcare Information

The problem we tackled is how hard healthcare information is to digest at a time when attention spans are short and many people prefer watching short reels. Medical content usually contains technical jargon and complex language that create barriers to understanding. A simple web search often leads to information overload, since it returns hundreds of relevant articles. Most people do not have a medical professional on hand who can explain these concepts concisely and engagingly.

What it does

MediReels is a revolutionary healthcare communication platform that simplifies complex medical information through engaging short videos. By transforming dense text and research into digestible content, MediReels empowers patients, students, and educators to understand and navigate the world of healthcare.

Problem Statement

In the rapidly growing digital education landscape, content creators face the challenge of producing both engaging and informative material that resonates with audiences across platforms. Short-form videos, such as reels, are perfect for grabbing attention but often lack the depth needed for educational content. On the other hand, long-form content like podcasts can provide the necessary depth but struggle to maintain audience engagement.

Educators and influencers must consistently create high-quality content across formats while managing time, costs, and technical expertise. Traditional scriptwriting, voiceover recording, and editing methods are time-consuming and expensive. Hiring voice actors or relying on robotic-sounding text-to-speech services either increases costs or reduces content quality.

MediReels solves these issues by providing an AI-powered platform that allows users to generate educational short-form content like reels, and long-form content like podcasts, with ease. By automating script generation, offering natural-sounding AI voiceovers, and simplifying editing, MediReels enables creators to produce professional-grade content quickly and efficiently.

Business use case

The target customers for MediReels include a wide range of individuals and organizations involved in content creation, education, and entertainment.

Educators & E-learning Platforms: Educators and online learning platforms are always looking for ways to engage students and make learning more accessible. With MediReels, they can easily create short, dynamic reels for quick lessons and deeper, more detailed podcasts for long-form content. The platform automates content creation, allowing teachers to focus on delivering knowledge while maintaining high-quality production. This flexibility and time-saving approach make MediReels ideal for schools, universities, and online education providers looking to expand their digital presence.
Influencers & Content Creators: Influencers and social media content creators thrive on engaging their audience through diverse formats across platforms like Instagram, TikTok, and YouTube. MediReels provides an easy way of storytelling, helping influencers grow their audience and increase monetization opportunities. The automation of voiceovers and editing allows creators to consistently produce high-quality content while focusing on expanding their brand and business.
Podcast Hosts: For both aspiring and established podcast hosts, MediReels simplifies the podcast production process. With its natural AI-driven voiceovers and seamless editing, hosts can create professional-sounding podcasts without needing technical expertise or a large production budget.
Corporate Training: Corporate training departments need to create effective, engaging learning materials for employees. MediReels helps them develop both quick, bite-sized reels for easy-to-digest training and long-form podcasts for more detailed topics.
Nonprofits & NGOs: Nonprofits and NGOs often need to spread awareness about important social causes. MediReels allows them to produce impactful, short-form reels for social media campaigns.
Digital Marketing Agencies: Digital marketing agencies are constantly seeking ways to create engaging, multi-format content for their clients. With MediReels, agencies can offer high-quality short-form reels and long-form podcasts that tell a brand's story in diverse, attention-grabbing ways.

How we built it

We wrote the whole frontend in Streamlit and the backend in FastAPI. The app is deployed on Hugging Face Spaces. We use Tavily as the search engine and mistral-small-latest for text generation.

Information Gathering

You can choose a healthcare topic that interests you or enter a query. If the query is not related to healthcare, the system tells you that it is an invalid query. (Guardrailing 101)
A list of trending articles on your topic is shown, with links to the source websites and a short summary of each to make it easier for you to choose an article you like.
After choosing a trending topic, you can explore either long-form content (podcasts) or short-form content (reels) related to it.
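
The article list described above could be assembled from a search response roughly as follows. This is only a minimal sketch: it assumes the response is a dict with a `results` list whose items carry `title`, `url`, and `content` keys, which is how we understand Tavily's search response to be shaped. The helper `summarize_results` and the sample data are hypothetical, and no network call is made.

```python
from textwrap import shorten


def summarize_results(response: dict, max_chars: int = 80) -> list[dict]:
    """Turn a search response into title/url/summary cards for display."""
    cards = []
    for item in response.get("results", []):
        cards.append({
            "title": item.get("title", "Untitled"),
            "url": item["url"],
            # Collapse whitespace and truncate the snippet to a short summary.
            "summary": shorten(item.get("content", ""), width=max_chars, placeholder="..."),
        })
    return cards


# Hypothetical sample response (assumed shape, not real search output).
sample = {
    "results": [
        {
            "title": "Sleep and heart health",
            "url": "https://example.org/sleep",
            "content": "Short   sleep duration is associated with higher cardiovascular risk.",
        },
    ]
}
cards = summarize_results(sample)
```

Each card can then be rendered in the frontend as a linked title followed by its summary.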

Short-Form Content

Once you select short-form content, magic happens and tada! You get a reel for your social media.
Okay, not magic, but AI! We use Mistral Large to generate a list of high-quality scripts with captions for social media. Once you select the script you like, we use the edge-tts (Edge Text-to-Speech) package to generate the audio as well as captions with timestamps.
Next, we prompt Mistral Large with the captions and timestamps to generate better prompts for high-quality image generation. You heard that right: a prompt to generate a better prompt.
All the prompts are sent asynchronously to our Hugging Face Inference Endpoints (thank you, sponsors) to generate high-quality images. We host the state-of-the-art Flux AI model behind these endpoints.
Now we have images, captions, and audio. We stitch everything together using the moviepy package, et voilà! You have your reel to up your social media game!
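
Sending all image prompts concurrently can be sketched with asyncio as below. This is an illustration under assumptions, not the project's actual code: `generate_image` is a hypothetical stand-in that simulates an endpoint call instead of contacting a real inference endpoint, and the concurrency cap is our own addition.

```python
import asyncio


async def generate_image(prompt: str) -> bytes:
    """Hypothetical stand-in for a request to a hosted image endpoint."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"image-for:{prompt}".encode()


async def generate_all(prompts: list[str], max_concurrency: int = 4) -> list[bytes]:
    """Run all image requests concurrently, preserving prompt order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> bytes:
        # Limit how many requests are in flight at once.
        async with semaphore:
            return await generate_image(prompt)

    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(bounded(p) for p in prompts))


prompts = [
    "a calm bedroom at night",
    "a stylized human heart",
    "a doctor talking to a patient",
]
images = asyncio.run(generate_all(prompts))
```

Because results come back in prompt order, each image can be matched to its caption by index when the reel is assembled.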

Long Form Content

If you prefer Podcasts over reels, firstly, good choice.
Once you select the topic of your choice, we use Mistral Large to generate a conversational script with two hosts that keeps you engaged. This script is passed to Edge TTS again to generate voices for our hosts.
We stitch both the voices together according to the timestamps to make a seamless and engaging podcast audio.
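
Placing the voice clips one after another can be reduced to computing a start offset for each clip. The sketch below shows only that bookkeeping, under assumptions: the `Segment` type, the `plan_timeline` helper, and the 300 ms pause are hypothetical, and real audio handling is left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str      # e.g. "host" or "guest"
    duration_ms: int  # length of the rendered audio clip


def plan_timeline(segments: list[Segment], pause_ms: int = 300) -> list[tuple[str, int, int]]:
    """Return (speaker, start_ms, end_ms) per clip, with a pause between clips."""
    timeline = []
    cursor = 0
    for seg in segments:
        start = cursor
        end = start + seg.duration_ms
        timeline.append((seg.speaker, start, end))
        # The next clip starts after a short pause.
        cursor = end + pause_ms
    return timeline


clips = [Segment("host", 2000), Segment("guest", 3500), Segment("host", 1200)]
timeline = plan_timeline(clips)
```

With an audio library, the same offsets would decide where each rendered clip is placed in the final track.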

If you want to iterate over the output, you can certainly do that by regenerating the workflow.

Challenges we ran into

Situation	Task	Action	Result
Scraping the relevant topics	We needed to find relevant articles from different sources on the chosen topic and give the user a glimpse of each	We used the Tavily API, which runs a search and returns the most relevant content along with the source URL, raw content, etc.	We got the most relevant articles without much scraping or data cleaning
Structuring the output for reel script generation	The script for short-form content such as reels had to follow a fixed format: a topic, then a script of at most 150 words, then an open-ended question inviting the user to explore other reels or podcasts	We used pydantic together with LangChain structured output, so the model returns an instance of a pydantic base schema that is later used for reel generation	We obtained the output format needed to create the reel
Generation of the podcast	We needed different voices for different speakers, and had to identify each speaker as male or female in order to assign a fitting voice for an engaging podcast	We used edge-tts with varying text and a simple gender classifier to pick fitting voices from the edge-tts package, then gave the guests and hosts different voices	We generated an engaging podcast with different voices for guests and hosts
Stitching the podcast together	We needed to stitch the podcast together: intro music, then the intro, then alternating host and guest dialogue, and finally an outro followed by outro music	We generated the speech with edge-tts and used the pydub package to render the audio segments and append them to the mp3 file	We got a well-stitched podcast that feels professional, with pauses, tonal changes, different voices, and intro and outro music
Using many different models in the pipeline	The pipeline relies on several models: Mistral handles all text-related requests and prompts at the different stages, and Flux diffusion models generate the images	We streamlined the process with a frontend/backend architecture, using FastAPI for the backend and Streamlit for the frontend. Each feature of the app lives in its own module, and FastAPI handles the requests and returns the results to the frontend	The result is a streamlined pipeline with well-separated modules, which makes errors in the code easier to debug
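
The reel script format from the table (a topic, a script of at most 150 words, and an open-ended question) can be illustrated with a small schema. The project used pydantic with LangChain structured output; the sketch below deliberately swaps in a standard-library dataclass so that it runs without extra dependencies. The field names and the question-mark check are assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_SCRIPT_WORDS = 150  # word limit stated in the table above


@dataclass
class ReelScript:
    topic: str
    script: str
    follow_up_question: str

    def __post_init__(self) -> None:
        # Reject scripts that exceed the short-form word budget.
        n_words = len(self.script.split())
        if n_words > MAX_SCRIPT_WORDS:
            raise ValueError(f"script has {n_words} words, limit is {MAX_SCRIPT_WORDS}")
        # The closing line should be phrased as a question (assumed rule).
        if not self.follow_up_question.strip().endswith("?"):
            raise ValueError("follow_up_question must end with '?'")


reel = ReelScript(
    topic="Why sleep matters for your heart",
    script="Sleeping fewer than six hours a night can strain your heart over time.",
    follow_up_question="What does your bedtime routine look like?",
)
```

A pydantic model would express the same constraints with field validators and could be passed to an LLM wrapper that supports structured output.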

Accomplishments that we're proud of

Seamless Integration of Multiple Libraries and Tools: We successfully integrated several cutting-edge technologies, including LangChain, Mistral, edge-tts, and Hugging Face Spaces and Inference Endpoints, to create a robust AI-powered tool for generating both short-form content (reels) and long-form podcasts.
Real-time, Natural Text-to-Speech: Using Edge-TTS with asyncio, we generated lifelike, natural-sounding voices for both hosts and guests, making the podcast content feel more engaging and professional.
Scalable Backend with FastAPI: We utilized FastAPI for building a fast, scalable, and efficient backend, ensuring that users could quickly generate their educational content without long wait times.
Enhanced Video Editing Capabilities: Through MoviePy and Pydub, we were able to seamlessly integrate and edit audio and video content, producing polished educational reels that are visually and auditorily compelling.
High-Quality Image Generation: By leveraging Flux AI, we incorporated high-quality image generation capabilities into our project, allowing users to create visually appealing content that enhances engagement and comprehension.
Powerful Search Functionality: With Tavily Search, we implemented an intuitive and efficient search feature that enables users to find relevant educational materials quickly, making content discovery seamless.

What we learned

Multimodal content generation with LLMs and VLMs.
Using tools like search engines with LLMs
Complex stitching of images and audio and srt files to generate a video using moviepy
edge-tts and pydub for creating engaging podcasts: we learned to use asyncio to handle many varying audio clips and stitch them into a podcast.
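
As an illustration of the caption side of this work, word-level timestamps can be grouped into SRT cues as sketched below. The input shape is an assumption: each word is a dict with `offset`, `duration`, and `text`, with times in 100-nanosecond units, which is how we understand edge-tts to report word boundaries. The helper names and sample words are hypothetical.

```python
TICKS_PER_MS = 10_000  # assumption: offsets and durations are in 100-nanosecond units


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: list[dict], words_per_cue: int = 3) -> str:
    """Group consecutive words into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        group = words[i:i + words_per_cue]
        start_ms = group[0]["offset"] // TICKS_PER_MS
        end_ms = (group[-1]["offset"] + group[-1]["duration"]) // TICKS_PER_MS
        text = " ".join(w["text"] for w in group)
        cues.append(
            f"{len(cues) + 1}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"
        )
    return "\n".join(cues)


# Hypothetical word timings for a four-word caption.
words = [
    {"offset": 1_000_000, "duration": 3_000_000, "text": "Sleep"},
    {"offset": 4_500_000, "duration": 2_500_000, "text": "protects"},
    {"offset": 7_500_000, "duration": 2_000_000, "text": "your"},
    {"offset": 10_000_000, "duration": 4_000_000, "text": "heart"},
]
srt = words_to_srt(words)
```

The resulting cue times can also be used to decide how long each generated image stays on screen.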

What's next for MediReels

Video generation using the advancements in text-to-video models
Research the market further and run a pilot study in the Paris metropolitan area
Find a market fit for our product
Reach more people in content creation, education, and social media

Built With

asyncio
edge-tts
fastapi
flux-ai
github
google-cloud
huggingface
langchain
mistral
moviepy
pydub
python
streamlit
tavily

Submitted to

Mistral AI x Alan Healthcare Hackathon

Created by

Designed frontend in streamlit; created scripts from raw articles with langchain

Qasim Khan
I study Big Data Management and Analytics at CentraleSupelec in Paris. I love GenAI, Machine Learning, and Databases.
Arijit Samal
Jyothish Kumar C G
Roshan Velpula

Updates

Qasim Khan started this project — Oct 13, 2024 05:00 AM EDT
