Demo: [https://skillsync.space]
Inspiration
I built SkillSync after getting frustrated with how many tutorials I watched but never practiced. Video lessons teach concepts, but mastery comes from deliberate practice: stopping, testing yourself, getting feedback, and repeating the loop. I wanted a tool that makes videos actionable: one that watches the same content we watch, finds teachable moments, and coaches the learner, including on how they speak, so that learning becomes practice rather than passive consumption.
What it does
SkillSync turns any YouTube tutorial into an interactive practice session. Paste a URL and Gemini 3 analyzes the video to:
- generate timestamped stop points and short-context summaries,
- create evidence-grounded questions and rubrics,
- auto-pause playback for short practice rounds,
- evaluate freeform answers with scores and timestamped evidence,
- analyze spoken delivery (prosody) and offer coaching on tone, pace, and confidence,
- export study packs (Markdown / Google Docs) and parts lists for technical videos.
How I built it
SkillSync was built through an iterative human–AI co-design process using Google AI Studio, Gemini 3, and GitHub Copilot in Visual Studio Code, not only to generate code but also to reason about product scope, UX flow, safety boundaries, and system design.
Frontend: React + Vite + TypeScript for a fast, single-page demo UI.
AI: Gemini 3 (flash preview) for native video understanding, structured JSON outputs, and prosody analysis. Responses are validated with JSON schemas for deterministic parsing.
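To illustrate the validation step, here is a minimal sketch that parses a model response and checks it with a hand-written type guard before the UI touches it. The `LessonPlan` and `StopPoint` shapes and their field names are assumptions made for this example, not the schema SkillSync actually uses.

```typescript
// Hypothetical lesson-plan shape; field names are illustrative assumptions.
interface StopPoint {
  timestampSec: number; // where playback should pause
  summary: string;      // short context for the learner
  question: string;     // practice question tied to the segment
}

interface LessonPlan {
  videoId: string;
  stopPoints: StopPoint[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isStopPoint(value: unknown): value is StopPoint {
  if (!isRecord(value)) return false;
  const { timestampSec, summary, question } = value;
  return (
    typeof timestampSec === "number" &&
    Number.isFinite(timestampSec) &&
    timestampSec >= 0 &&
    typeof summary === "string" &&
    typeof question === "string"
  );
}

// Parse raw model text; return a typed plan, or null if validation fails.
function parseLessonPlan(raw: string): LessonPlan | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (!isRecord(data)) return null;
  const { videoId, stopPoints } = data;
  if (typeof videoId !== "string") return null;
  if (!Array.isArray(stopPoints)) return null;
  if (!stopPoints.every(isStopPoint)) return null;
  return { videoId, stopPoints };
}
```

Returning `null` instead of throwing lets the caller decide whether to retry the request or show an error state.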
Voice: Web Speech API + optional TTS for roleplay and coaching playback.
Video: YouTube IFrame Player for precise timestamps and timeline markers.
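One way to drive auto-pause from player timestamps is to sample the playback time periodically and check whether a stop point was crossed between two samples. The sketch below shows only that check; the function name and the polling approach are assumptions for illustration, not necessarily how SkillSync does it.

```typescript
// Returns the first stop point crossed between two consecutive time samples,
// or null if none was crossed. `stopTimesSec` is assumed to be sorted ascending.
function findCrossedStopPoint(
  stopTimesSec: readonly number[],
  previousSec: number,
  currentSec: number,
): number | null {
  // A backwards jump (rewind) should not trigger a pause.
  if (currentSec < previousSec) return null;
  for (const t of stopTimesSec) {
    // Half-open interval (previous, current] so a stop point fires only once.
    if (t > previousSec && t <= currentSec) return t;
  }
  return null;
}
```

In a browser, the two samples could come from polling the IFrame player's `getCurrentTime()` on a timer and calling `pauseVideo()` when a stop point is returned. A large forward seek would also match any stop point it skips over, so a caller may want to ignore samples that are far apart.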
Storage & UX: LocalStorage caching (7-day TTL), an export modal for Markdown / Google Docs, and a lightweight state machine to manage lesson flow.
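A lightweight state machine for lesson flow can be as small as a transition table plus a lookup function. The sketch below is one possible shape; the state and event names are hypothetical and chosen only to mirror the paste, analyze, practice loop described in this writeup.

```typescript
// Hypothetical lesson states and events; names are assumptions for illustration.
type LessonState =
  | "idle"
  | "analyzing"
  | "playing"
  | "practicing"
  | "evaluating"
  | "finished";

type LessonEvent =
  | "SUBMIT_URL"
  | "ANALYSIS_DONE"
  | "STOP_POINT_REACHED"
  | "ANSWER_SUBMITTED"
  | "FEEDBACK_DISMISSED"
  | "VIDEO_ENDED";

// Each state lists only the events it reacts to.
const transitions: Record<LessonState, Partial<Record<LessonEvent, LessonState>>> = {
  idle: { SUBMIT_URL: "analyzing" },
  analyzing: { ANALYSIS_DONE: "playing" },
  playing: { STOP_POINT_REACHED: "practicing", VIDEO_ENDED: "finished" },
  practicing: { ANSWER_SUBMITTED: "evaluating" },
  evaluating: { FEEDBACK_DISMISSED: "playing" },
  finished: { SUBMIT_URL: "analyzing" },
};

// Unknown events for the current state are ignored rather than thrown.
function nextState(state: LessonState, event: LessonEvent): LessonState {
  return transitions[state][event] ?? state;
}
```

Keeping the table as plain data makes it easy to see at a glance which events each screen can react to.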
Repo: [https://github.com/schu37/skillsync] · Demo: [https://skillsync.space]
Challenges I ran into
Prompt engineering: getting reliable, timestamped stop points and schema-valid output required careful system prompts and two-pass prompting.
Safety: preventing generation of unsafe step-by-step instructions for dangerous technical tasks required explicit safety flags and summary-only fallbacks for risky content.
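The summary-only fallback can be pictured as a small gate between the analysis result and what the UI renders. This is a sketch under assumed names: `VideoAnalysis`, `LessonView`, and the `safetyFlags` field are hypothetical, and the real flag format may differ.

```typescript
// Hypothetical analysis result; the flag labels would come from the model.
interface VideoAnalysis {
  summary: string;
  steps: string[];
  safetyFlags: string[]; // e.g. "high_voltage" (illustrative label)
}

interface LessonView {
  mode: "full" | "summary_only";
  summary: string;
  steps: string[];
  warnings: string[];
}

// If any safety flag is present, drop the step-by-step detail and keep
// only the summary plus the warnings.
function applySafetyFallback(analysis: VideoAnalysis): LessonView {
  if (analysis.safetyFlags.length > 0) {
    return {
      mode: "summary_only",
      summary: analysis.summary,
      steps: [],
      warnings: [...analysis.safetyFlags],
    };
  }
  return {
    mode: "full",
    summary: analysis.summary,
    steps: [...analysis.steps],
    warnings: [],
  };
}
```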
Prosody analysis: capturing useful vocal feedback required sending raw audio and translating Gemini’s prosody output into actionable coaching tips.
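Turning prosody output into coaching tips is essentially a mapping from measurements to short, concrete sentences. The sketch below assumes a hypothetical `ProsodyReport` with three numeric fields; both the fields and the thresholds are illustrative placeholders, not values from Gemini or from SkillSync.

```typescript
// Hypothetical prosody summary; field names and scales are assumptions.
interface ProsodyReport {
  wordsPerMinute: number;
  fillerWordsPerMinute: number;
  pitchVariation: number; // 0 (monotone) to 1 (very varied)
}

// Thresholds are illustrative placeholders, not empirically tuned values.
function coachingTips(report: ProsodyReport): string[] {
  const tips: string[] = [];
  if (report.wordsPerMinute > 170) {
    tips.push("Slow down: pause briefly after each key point.");
  } else if (report.wordsPerMinute < 110) {
    tips.push("Pick up the pace slightly to keep the listener engaged.");
  }
  if (report.fillerWordsPerMinute > 4) {
    tips.push("Replace filler words with a short silent pause.");
  }
  if (report.pitchVariation < 0.2) {
    tips.push("Vary your pitch to emphasize the important words.");
  }
  if (tips.length === 0) {
    tips.push("Delivery sounds steady; keep the same pace and tone.");
  }
  return tips;
}
```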
Cost & latency: multimodal, long-context analysis is slower and more expensive than text-only requests, so I added caching and an optional “force refresh” control.
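A minimal sketch of a time-limited cache with a force-refresh switch is shown below. It assumes a tiny `KeyValueStore` interface (a subset of the Web Storage API, so `window.localStorage` would fit) and a synchronous `analyze` callback for brevity; a real analysis call would be asynchronous. All names here are hypothetical.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface CacheEntry<T> {
  savedAt: number; // epoch milliseconds
  value: T;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Returns the cached value if present and younger than the TTL, else null.
function readCache<T>(store: KeyValueStore, key: string, now: number, ttlMs: number): T | null {
  const raw = store.getItem(key);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as CacheEntry<T>;
    if (typeof entry.savedAt !== "number") return null;
    return now - entry.savedAt < ttlMs ? entry.value : null;
  } catch {
    return null; // corrupt or unexpected entry: treat as a miss
  }
}

// Uses the cache unless forceRefresh is set; always stores fresh results.
function getOrAnalyze<T>(
  store: KeyValueStore,
  key: string,
  analyze: () => T,
  options: { forceRefresh?: boolean; now?: number; ttlMs?: number } = {},
): T {
  const now = options.now ?? Date.now();
  const ttlMs = options.ttlMs ?? SEVEN_DAYS_MS;
  if (!options.forceRefresh) {
    const cached = readCache<T>(store, key, now, ttlMs);
    if (cached !== null) return cached;
  }
  const fresh = analyze();
  const entry: CacheEntry<T> = { savedAt: now, value: fresh };
  store.setItem(key, JSON.stringify(entry));
  return fresh;
}
```

Passing `now` explicitly keeps the expiry logic easy to test without waiting for real time to pass.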
Accomplishments I am proud of
Native video understanding: Gemini 3 analyzes YouTube URLs directly — no manual transcription pipeline.
Multimodal lesson plans: structured JSON lesson plans with stop points, rubrics, and gold answers.
Voice coaching: end-to-end roleplay that evaluates delivery (tone, pace, confidence), not just content.
Export pipeline: study packs downloadable as Markdown and exportable to Google Docs for NotebookLM compatibility.
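As a rough picture of the Markdown half of the export, the sketch below renders a list of study items into a Markdown string. The `StudyItem` shape and the heading layout are assumptions for illustration; the actual study pack format may differ.

```typescript
// Hypothetical study-pack item; field names are assumptions.
interface StudyItem {
  timestampSec: number;
  question: string;
  goldAnswer: string;
}

// Formats seconds as m:ss (minutes are not wrapped into hours).
function formatTimestamp(totalSec: number): string {
  const s = Math.max(0, Math.floor(totalSec));
  const minutes = Math.floor(s / 60);
  const seconds = s % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Renders one heading per question, followed by its gold answer.
function toMarkdown(title: string, items: readonly StudyItem[]): string {
  const lines: string[] = [`# ${title}`, ""];
  items.forEach((item, i) => {
    lines.push(`## ${i + 1}. [${formatTimestamp(item.timestampSec)}] ${item.question}`);
    lines.push("");
    lines.push(`**Gold answer:** ${item.goldAnswer}`);
    lines.push("");
  });
  return lines.join("\n");
}
```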
Demo-ready product: a single-page app deployed on Vercel demonstrating the full flow (paste → analyze → practice → export).
What I learned
Gemini 3 is especially powerful when used as a reasoning and design partner, not just a content generator.
Structured prompts and JSON schemas drastically reduce parsing errors and make AI outputs production-usable.
Prosody matters: learners change behavior faster when given short, concrete feedback on how they speak.
Safety-first design is essential for any “how-to” content; detecting and flagging unsafe instructions protects users and judges alike.
Small UX details (timers, skip-answered toggles, sticky panels) significantly improve retention in practice loops.
What’s next for SkillSync
SkillSync is intentionally built local-first, but its architecture is designed to scale.
Short Term: User Accounts & Cloud Sync
The next step is migrating persistence from localStorage to Supabase, enabling:
- Google SSO authentication,
- cross-device sync for progress, notes, and session history,
- persistent user profiles and learning preferences.
This migration is straightforward because storage is already abstracted behind a service interface, requiring no major architectural changes.
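To show why an abstracted storage layer keeps the migration small, here is a sketch of what such a service interface could look like. `ProgressStore`, `SessionProgress`, and `markAnswered` are hypothetical names; the in-memory class stands in for a localStorage-backed or Supabase-backed implementation of the same interface.

```typescript
// Hypothetical progress record and storage contract; names are assumptions.
interface SessionProgress {
  videoId: string;
  answeredStopPoints: number[];
  updatedAt: number; // epoch milliseconds
}

interface ProgressStore {
  load(videoId: string): Promise<SessionProgress | null>;
  save(progress: SessionProgress): Promise<void>;
}

// In-memory stand-in; other backends would implement the same two methods.
class InMemoryProgressStore implements ProgressStore {
  private readonly records = new Map<string, SessionProgress>();

  async load(videoId: string): Promise<SessionProgress | null> {
    return this.records.get(videoId) ?? null;
  }

  async save(progress: SessionProgress): Promise<void> {
    this.records.set(progress.videoId, { ...progress });
  }
}

// App code depends only on the interface, so swapping backends does not touch it.
async function markAnswered(
  store: ProgressStore,
  videoId: string,
  stopPointSec: number,
  now: number,
): Promise<SessionProgress> {
  const existing: SessionProgress =
    (await store.load(videoId)) ?? { videoId, answeredStopPoints: [], updatedAt: now };
  const answered = existing.answeredStopPoints.includes(stopPointSec)
    ? existing.answeredStopPoints
    : [...existing.answeredStopPoints, stopPointSec];
  const updated: SessionProgress = { videoId, answeredStopPoints: answered, updatedAt: now };
  await store.save(updated);
  return updated;
}
```

Making the interface Promise-based from the start means a networked backend can be dropped in without changing any call sites.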
Medium Term: Learning Analytics & Retention
SkillSync will evolve from single sessions into a longitudinal learning tool by adding:
- learning dashboards that visualize progress over time,
- skill proficiency tracking across domains (e.g. communication, technical skills),
- spaced-repetition reminders based on past performance.
These features help learners build durable skills instead of one-off understanding.
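As a sketch of how reminders could depend on past performance, the function below uses a simple doubling schedule: a passing score doubles the review interval, and a failing score resets it to one day. This is a deliberately basic stand-in, not SM-2 or any specific published algorithm, and the 0-to-1 score scale, pass threshold, and 60-day cap are assumptions.

```typescript
const PASS_THRESHOLD = 0.7;   // assumed score scale: 0 to 1
const MAX_INTERVAL_DAYS = 60; // assumed upper bound on spacing

// Returns the number of days until the next review of a question.
function nextIntervalDays(previousIntervalDays: number, score: number): number {
  if (score < PASS_THRESHOLD) return 1; // missed: review again tomorrow
  const base = Math.max(1, previousIntervalDays);
  return Math.min(base * 2, MAX_INTERVAL_DAYS);
}
```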
Long Term: Active Video Learning Platform
Longer term, SkillSync becomes a platform for active video learning at scale:
- playlist-level courses built from YouTube videos,
- community-shared lesson plans and practice sessions,
- a browser extension that turns any tutorial into guided practice instantly,
- deeper Gemini-powered personalization across languages, modalities, and skill types.
The long-term goal is not to create more content, but to make practice the default way people learn from video.
Additional technical details are available in ARCHITECTURE.md on GitHub.
Built With
- copilot
- gemini
- gemini-tts
- google-ai-studio
- google-cloud-console
- google-docs
- oauth
- typescript
- vercel