Story: YouTube Transcript Fetcher (Devvit Module)
📌 About the Project
The YouTube Transcript Fetcher module is part of a broader Devvit-based YouTube Analyzer designed to extract, analyze, and visualize video transcripts directly within Reddit’s developer ecosystem. This component focuses on retrieving structured transcript data from YouTube videos, enabling downstream tasks like semantic analysis, toxicity detection, and civic tech moderation. It was built to support real-time, reproducible evaluation of video content, especially in contexts where transparency and auditability are critical, such as civic discourse, misinformation tracing, and platform governance.
Inspiration
We were inspired by the lack of accessible, structured transcript data for YouTube videos in moderation workflows. While YouTube offers auto-generated captions, integrating them into civic tech pipelines requires robust extraction and formatting tools. Reddit’s Devvit platform offered a unique opportunity to embed this capability directly into subreddit interfaces, allowing moderators and researchers to evaluate video content without leaving the platform.
What it does
The YouTube Transcript Fetcher module is a TypeScript utility designed to retrieve and structure transcript data from YouTube videos. It powers the transcript analysis layer of the YouTube Analyzer Devvit app, enabling Reddit-native moderation and semantic evaluation of video content.
Core Functions:
- Fetches transcripts from YouTube videos using the youtube-transcript-api, supporting both auto-generated and manually uploaded captions.
- Filters by language, prioritizing English transcripts for civic tech and moderation workflows.
- Structures transcript segments with timestamps, enabling alignment with LLM-based semantic scoring and toxicity detection.
- Handles missing or malformed transcripts gracefully, with fallback logic to ensure robustness.
- Prepares transcript data for downstream analysis, including misinformation tracing, sentiment evaluation, and reproducible moderation decisions.
This module acts as a bridge between YouTube’s raw caption data and Devvit-powered Reddit interfaces, allowing moderators and researchers to evaluate video content directly within subreddit dashboards.
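The language filtering and fallback behavior described above could look roughly like the sketch below. It is an illustration under assumptions, not the module's actual code: the types `TranscriptSegment`, `CaptionTrack`, and `TranscriptResult` and the function `selectTranscript` are hypothetical names, and the sketch assumes the caption tracks have already been fetched. It prefers manually uploaded captions over auto-generated ones within each preferred language and reports a "missing" status instead of throwing.

```typescript
// Hypothetical types and names for illustration; not taken from any real library.
interface TranscriptSegment {
  text: string;
  start: number;    // seconds from the beginning of the video
  duration: number; // seconds
}

interface CaptionTrack {
  languageCode: string; // e.g. "en", "en-US", "es"
  isGenerated: boolean; // true for auto-generated captions
  segments: TranscriptSegment[];
}

interface TranscriptResult {
  status: "ok" | "missing";
  languageCode?: string;
  isGenerated?: boolean;
  segments: TranscriptSegment[];
}

// Drop malformed segments (empty text, non-numeric timing) and trim whitespace.
function cleanSegments(segments: TranscriptSegment[]): TranscriptSegment[] {
  return segments
    .filter(
      (s) =>
        typeof s.text === "string" &&
        s.text.trim().length > 0 &&
        Number.isFinite(s.start) &&
        Number.isFinite(s.duration)
    )
    .map((s) => ({ text: s.text.trim(), start: s.start, duration: s.duration }));
}

// Pick the best track: preferred languages in order, manual captions first.
function selectTranscript(
  tracks: CaptionTrack[],
  preferred: string[] = ["en"]
): TranscriptResult {
  for (const lang of preferred) {
    const candidates = tracks
      .filter((t) => t.languageCode === lang || t.languageCode.startsWith(lang + "-"))
      .sort((a, b) => Number(a.isGenerated) - Number(b.isGenerated));
    for (const track of candidates) {
      const segments = cleanSegments(track.segments);
      if (segments.length > 0) {
        return {
          status: "ok",
          languageCode: track.languageCode,
          isGenerated: track.isGenerated,
          segments,
        };
      }
    }
  }
  // Nothing usable: flag the gap so downstream code can skip or report it.
  return { status: "missing", segments: [] };
}
```

Returning an explicit status rather than an empty array alone lets a dashboard distinguish "no captions available" from "captions exist but contain no speech".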
How we built it
The module was implemented in TypeScript and designed to be compatible with Devvit’s sandboxed execution model. Key features include:
- Transcript retrieval via YouTubeTranscriptApi
- Support for both auto-generated and manually uploaded captions
- Language filtering and segment structuring
- Integration-ready output for LLM-based semantic evaluation
We structured the code to allow easy reuse across other modules, including toxicity scoring and misinformation tracing. The transcript fetcher acts as a foundational layer for higher-order moderation logic.
Challenges we ran into
- Transcript availability varies wildly: some videos lack captions, others have multiple conflicting versions.
- Rate limits and API quirks required fallback logic and caching strategies.
- Devvit’s deployment flow was under-documented, so we reverse-engineered from starter kits.
- Segment alignment for LLM evaluation was tricky, especially for auto-generated transcripts with inconsistent timing.
Despite these challenges, we delivered a robust module that powers transcript-aware moderation and analysis.
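One way the caching and fallback strategy mentioned above might be structured is sketched here. Everything in it is an assumption made for illustration: the `TranscriptCache` class, the `getTranscriptCached` helper, the ten-minute default TTL, and the injected `fetcher` function are hypothetical and do not describe the project's real implementation or any Devvit API.

```typescript
// Hypothetical in-memory cache; the names and default TTL are illustrative assumptions.
class TranscriptCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = 10 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock keeps the cache easy to test
  }

  get(videoId: string): T | undefined {
    const entry = this.entries.get(videoId);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(videoId); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(videoId: string, value: T): void {
    this.entries.set(videoId, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Serve from the cache when possible; on fetch failure, return a caller-supplied fallback.
async function getTranscriptCached<T>(
  videoId: string,
  fetcher: (id: string) => Promise<T>, // assumed to wrap the real transcript request
  cache: TranscriptCache<T>,
  fallback: T
): Promise<T> {
  const cached = cache.get(videoId);
  if (cached !== undefined) return cached;
  try {
    const value = await fetcher(videoId);
    cache.set(videoId, value);
    return value;
  } catch {
    // Rate limit, network error, or missing captions: degrade instead of crashing.
    return fallback;
  }
}
```

Failed fetches are deliberately not cached in this sketch, so a video that was rate-limited once can be retried on the next request.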
Accomplishments that we're proud of
- Built a transcript fetcher compatible with Reddit’s Devvit platform, enabling seamless integration of YouTube video analysis into subreddit workflows.
- Successfully retrieved and structured transcripts from diverse YouTube videos, including auto-generated and manually uploaded captions.
- Engineered language filtering and segment alignment logic, ensuring that transcript data could be used for downstream semantic evaluation and moderation.
- Designed the module for modular reuse, allowing it to plug into other civic tech pipelines like toxicity scoring, misinformation tracing, and CoreML-based mobile moderation.
- Overcame Devvit sandbox constraints by refactoring for lightweight execution and API compatibility.
- Created fallback logic for missing or malformed transcripts, improving robustness and reliability across edge cases.
- Enabled segment-level scoring using LLMs, laying the foundation for reproducible, auditable moderation decisions.
- Aligned transcript segments with semantic scoring functions, using weighted consensus across multiple LLMs to surface toxicity and misinformation flags.
This module not only works: it scales, adapts, and supports civic tech moderation in real time.
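The weighted consensus across multiple LLMs can be illustrated with a weighted average of per-model scores. This is a minimal sketch under assumptions: the `ModelScore` shape, the [0, 1] score range, the per-model weights, and the 0.5 threshold are all hypothetical, since the write-up does not specify how the scores are combined.

```typescript
// Hypothetical shape of one model's verdict on a transcript segment.
interface ModelScore {
  model: string;  // label for the scoring model (illustrative)
  score: number;  // assumed to be a toxicity or misinformation score in [0, 1]
  weight: number; // assumed trust weight for this model, must be positive
}

// Weighted average of valid scores; returns null when no model gave a usable score.
function weightedConsensus(scores: ModelScore[]): number | null {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const s of scores) {
    const usable =
      Number.isFinite(s.score) && Number.isFinite(s.weight) && s.weight > 0;
    if (!usable) continue; // skip malformed or zero-weight entries
    const clamped = Math.min(1, Math.max(0, s.score));
    weightedSum += clamped * s.weight;
    totalWeight += s.weight;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : null;
}

// Flag a segment when the consensus meets the (assumed) threshold.
function flagSegment(scores: ModelScore[], threshold: number = 0.5): boolean {
  const consensus = weightedConsensus(scores);
  return consensus !== null && consensus >= threshold;
}
```

Keeping the consensus value separate from the boolean flag means the raw number can be logged alongside each decision, which fits the goal of auditable moderation.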
What we learned
- YouTube’s transcript APIs are nuanced: handling auto-generated vs. manual captions requires careful logic.
- Devvit’s execution environment demands modular, lightweight code, so we had to refactor for sandbox compatibility.
- Transcript segmentation is critical for aligning LLM evaluations with video timelines.
- Language metadata matters: we added logic to prioritize English transcripts and flag missing data.
We also learned how to bridge Reddit’s Devvit tooling with external APIs, a skill that will serve future civic tech integrations.
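To make the segmentation point concrete, here is a hedged sketch of grouping caption segments into fixed-length time windows so that each LLM evaluation maps back to a span of the video. The `groupIntoWindows` function, the `TranscriptWindow` type, and the 30-second default are assumptions for illustration, not the project's actual segmentation logic.

```typescript
// Hypothetical types for illustration.
interface TranscriptSegment {
  text: string;
  start: number;    // seconds from the beginning of the video
  duration: number; // seconds
}

interface TranscriptWindow {
  start: number; // start time of the first segment in the window
  end: number;   // latest end time of any segment in the window
  text: string;  // concatenated segment text, in timeline order
}

// Group segments into windows that each begin at a segment start and span
// at most `windowSeconds` of segment start times.
function groupIntoWindows(
  segments: TranscriptSegment[],
  windowSeconds: number = 30
): TranscriptWindow[] {
  if (!(windowSeconds > 0)) {
    throw new RangeError("windowSeconds must be positive");
  }
  // Sort a copy: auto-generated captions may arrive out of order.
  const sorted = [...segments].sort((a, b) => a.start - b.start);
  const windows: TranscriptWindow[] = [];
  for (const seg of sorted) {
    const segEnd = seg.start + Math.max(0, seg.duration);
    const last = windows[windows.length - 1];
    if (last === undefined || seg.start >= last.start + windowSeconds) {
      windows.push({ start: seg.start, end: segEnd, text: seg.text });
    } else {
      last.end = Math.max(last.end, segEnd);
      last.text += " " + seg.text;
    }
  }
  return windows;
}
```

Each window keeps its own `start` and `end`, so a score produced for a window can be displayed against the matching part of the video timeline.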
What's next for YouTube Content Analyzer
This module lays the groundwork for cross-platform moderation pipelines. Next steps include:
- Embedding the transcript fetcher into subreddit moderation dashboards
- Integrating CoreML classifiers for mobile evaluation
- Publishing our architecture in AACL-IJCNLP 2026 and SRAI Catalyst
The YouTube Transcript Fetcher is more than a utility: it is a civic tech enabler, helping platforms move toward transparent, reproducible content evaluation.
Built With
- devvit
- express
- react
- tailwind-css
- typescript
- vite
- youtubetranscriptapi