Inspiration

Bloggy grew out of a desire to make local news and discussions more accessible, engaging, and inclusive for Tunisians. Many people feel disconnected from traditional news sources, whether because of language barriers or the overwhelming volume of global content. We wanted to create a platform that automatically turns local Tunisian news and radio broadcasts into easy-to-read, engaging blogs written in the Tunisian dialect. This bridges the gap between formal media and everyday citizens, making news consumption feel more personal and relatable.

What it does

Bloggy listens to local radio stations and scrapes popular Tunisian news websites, then uses AI to automatically generate blog posts in Tunisian Arabic. These blogs summarize and contextualize the news in a user-friendly way and are enriched with related past coverage. The app offers both real-time updates from radio and deeper insights from batch-processed news articles, all accessible through a clean, interactive dashboard.

How we built it

We used a two-part architecture:

  • Batch Processing: Scrapers gather articles from sites like mosaiquefm.net, jawharafm.net, and others. The scraped files are accumulated, stored in Hadoop, and transformed with Apache Spark, which writes blog content to MongoDB and metadata to PostgreSQL. The resulting IDs are passed through Redpanda (a Kafka-compatible broker), triggering a RAG (retrieval-augmented generation) agent that summarizes the content with LLMs, embeds it, and stores the results in a vector database (Qdrant).
  • Streaming Processing: Radio streams are captured and transcribed using the Vosk model. A backend system manages which programs to record and when, saving sessions and transcripts. After a program ends, transcripts are interpreted using LLMs to generate news pieces, which are later turned into blog posts enriched by relevant past content.
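The ID hand-off at the end of the batch pipeline can be sketched in a few lines. This is only an illustration under loud assumptions, not our actual code: a dict stands in for MongoDB, a `queue.Queue` for the Redpanda topic, and a list for the Qdrant collection. The `embed` and `summarize` functions are toy placeholders for the LLM calls, and the article texts and IDs are invented for the example.

```python
import math
import queue
from collections import Counter

# Stand-ins (all assumptions): dict = MongoDB, Queue = Redpanda topic,
# list = Qdrant collection. The articles below are made-up examples.
article_store = {
    "a1": "Heavy rain is expected in Tunis on Friday. Schools may close.",
    "a2": "Rain floods several roads in Sfax. Traffic is disrupted.",
}
id_topic = queue.Queue()
vector_index = []  # entries are (article_id, embedding, summary)


def embed(text):
    """Toy bag-of-words embedding; a real pipeline would call an embedding model."""
    return Counter(word.strip(".,").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text):
    """Placeholder for the LLM summary: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def process_pending_ids():
    """Drain the ID queue, then summarize, embed, and index each article."""
    processed = []
    while not id_topic.empty():
        article_id = id_topic.get()
        text = article_store[article_id]
        vector_index.append((article_id, embed(text), summarize(text)))
        processed.append(article_id)
    return processed


def most_similar(query, k=1):
    """Return the IDs of the k indexed articles closest to the query."""
    q = embed(query)
    ranked = sorted(vector_index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [article_id for article_id, _, _ in ranked[:k]]


# The Spark job would publish IDs after writing to the stores; here we do it by hand.
for article_id in article_store:
    id_topic.put(article_id)
process_pending_ids()
```

Passing only IDs through the broker keeps messages small: the agent fetches the full text from the document store when it is ready to process it. The same `most_similar` lookup is the kind of retrieval that lets a new post be enriched with related past content.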

The backend also powers a frontend interface where users can view real-time transcripts, insights, and published blogs.
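As one illustration of the backend's scheduling role in the streaming pipeline (deciding which programs to record and when), here is a minimal sketch. The `Program` fields, the station and program names, and the `plan_actions` helper are all hypothetical; the real system also captures audio, runs Vosk, and persists sessions, none of which is shown here.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Program:
    station: str
    name: str
    weekday: int  # 0 = Monday, matching datetime.weekday()
    start: time
    end: time


# Hypothetical schedule entries, for illustration only.
SCHEDULE = [
    Program("station-a", "Morning News", 0, time(7, 0), time(9, 0)),
    Program("station-b", "Midday Talk", 0, time(12, 0), time(13, 30)),
]


def programs_on_air(schedule, now):
    """Programs whose time slot contains `now`."""
    return [
        p for p in schedule
        if p.weekday == now.weekday() and p.start <= now.time() < p.end
    ]


def plan_actions(schedule, recording, now):
    """Compare what is on air with what is currently being recorded.

    Returns (to_start, to_finish): programs that need a new recording
    session, and programs that have ended and whose transcripts can be
    handed to the LLM step.
    """
    on_air = set(programs_on_air(schedule, now))
    to_start = sorted(on_air - recording, key=lambda p: p.name)
    to_finish = sorted(recording - on_air, key=lambda p: p.name)
    return to_start, to_finish


# Monday 08:00: nothing is being recorded yet, so "Morning News" should start.
to_start, to_finish = plan_actions(SCHEDULE, set(), datetime(2024, 1, 1, 8, 0))
```

Calling `plan_actions` periodically with the set of active recordings gives a simple loop: start whatever is newly on air, and finalize whatever has ended so its transcript can be turned into news pieces.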

Challenges we ran into

  • Handling different data formats from various news websites and radio streams.
  • Synchronizing streaming and batch processing pipelines in a coherent way.
  • Scaling transcription and summarization efficiently without breaking real-time constraints.
  • Ensuring high-quality generation in Tunisian Arabic, which out-of-the-box models do not always support well.
  • Orchestrating the complex flow between multiple data sources, pipelines, and services with minimal delay and high reliability.

Accomplishments that we're proud of

  • Successfully built a scalable dual-pipeline architecture that merges real-time and batch data.
  • Automated the entire content creation cycle from raw news to enriched blog post.
  • Enabled generation of blogs in Tunisian dialect, making local news more digestible for everyone.
  • Deployed an interactive frontend showing insights and blogs as they’re being created.

What we learned

  • How to balance real-time and batch processing architectures effectively.
  • Deeper insights into orchestrating RAG pipelines and embedding systems.
  • How to integrate multiple services (Kafka, Spark, Hadoop, MongoDB, Vosk, LLMs) to form a coherent, production-grade pipeline.
  • The importance of user-focused content design for accessibility and engagement.

What's next for Bloggy

  • Improving dialect customization by fine-tuning models on Tunisian Arabic.
  • Introducing audio blog versions using TTS for accessibility.
  • Expanding to cover more radio stations and regional news websites.
  • Launching a mobile version of the platform.
  • Adding community features allowing users to comment, react, and submit stories.

Bloggy aims to be a local-first AI tool that empowers citizens to stay informed in the language and format they relate to most.

📽️ Demo: Click here to view the demo

❗❗ IMPORTANT NOTICE ❗❗

🚨 Due to the heavy infrastructure requirements, our project cannot be deployed easily.

🔐 Additionally, because of the intensive work involved, we decided to keep our GitHub repository private.

💻 However, we’re fully prepared to show the code during the presentation day.
