Inspiration
In India, the vocational training landscape is a paradox of high potential and systemic friction. While millions of students enter the workforce annually, nearly 30–40% of ITI (Industrial Training Institute) seats remain vacant due to a simple lack of guidance. For those who do enroll, the dropout rate hovers around 73%, often because students select trades based on local hearsay rather than their actual aptitude. We were inspired to build PRAGATI to bridge this "Guidance Gap" for the underprivileged—those for whom traditional career counseling is either too expensive or only available in a language they don't speak.
What it does
PRAGATI is a multilingual AI career mentor. It allows users to type in their regional language to describe their life situation, interests, and strengths. The system then:
- Analyzes Traits: Uses academic frameworks from vocational assessment reports to map the user’s narrative to specific quantitative and qualitative vocational traits.
- RAG-Powered Matching: Performs a Retrieval-Augmented Generation (RAG) search against verified Government ITI syllabi and National Skills Qualifications Framework (NSQF) standards.
- Personalized Roadmaps: Generates a step-by-step roadmap in the user’s native language, including specific trade recommendations (e.g., Solar Technician, CNC Operator) and potential salary outcomes.
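As a rough sketch of the flow above (the trait names, keyword lists, and trade catalogue here are illustrative stand-ins, not PRAGATI's actual 25+ indicator framework or its RAG index):

```python
import re

# Hypothetical trait vocabulary; the real framework uses 25+ indicators.
TRAIT_KEYWORDS = {
    "hands_on": ["repair", "fix", "build", "machine"],
    "outdoor": ["farm", "field", "travel"],
    "technical": ["electricity", "wiring", "computer", "solar"],
}

def map_traits(narrative: str) -> set[str]:
    """Map a free-text life narrative to coarse vocational traits."""
    words = set(re.findall(r"[a-z]+", narrative.lower()))
    return {trait for trait, kws in TRAIT_KEYWORDS.items()
            if words & set(kws)}

# Toy trade catalogue standing in for the RAG-indexed ITI syllabi.
TRADES = {
    "Solar Technician": {"hands_on", "technical"},
    "CNC Operator": {"hands_on"},
}

def recommend(narrative: str) -> list[str]:
    """Rank trades by trait overlap (the real system retrieves via vector search)."""
    traits = map_traits(narrative)
    scored = sorted(TRADES, key=lambda t: len(TRADES[t] & traits), reverse=True)
    return [t for t in scored if TRADES[t] & traits]
```

In the deployed system, the matching step is a semantic search over verified curricula rather than keyword overlap, and the roadmap is generated in the user's language on top of the retrieved syllabus content.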
How we built it
The intelligence of PRAGATI is grounded in a Retrieval-Augmented Generation (RAG) pipeline hosted on the Databricks Data Intelligence Platform.
- Data Ingestion: We developed custom ingestion scripts to process complex government datasets, including Karnataka ITI syllabi and National Skills Qualifications Framework (NSQF) standards.
- Vector Search: Using GTE-large embeddings, we indexed these verified curricula into a Databricks Vector Search endpoint.
- Deployment: The frontend is a clean, minimalist Gradio interface deployed as a Databricks App, ensuring low latency and high scalability.
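The retrieval step can be sketched in miniature as follows. Note this is a self-contained toy: the `embed` function here is a bag-of-words stand-in for the GTE-large embeddings, and the dictionary "index" stands in for the Databricks Vector Search endpoint; the syllabus snippets are invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedder: bag-of-words counts.
    The real pipeline uses GTE-large vectors served via Databricks Vector Search."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "index" of syllabus snippets (illustrative, not real NSQF text).
INDEX = {
    "Solar Technician": "install and maintain solar panels and inverters",
    "CNC Operator": "operate cnc lathe and milling machines",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k trades whose syllabus text is most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda doc: cosine(q, embed(INDEX[doc])),
                    reverse=True)
    return ranked[:k]
```

In production, the same shape holds: embed the user's query, run a similarity search against the indexed curricula, and pass the top hits to the generation step as grounding context.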
Challenges we ran into
One major challenge was the quality of source data. Government PDF syllabi are often inconsistently formatted, so we built robust cleaning scripts (`itik_data_ingest.py`) to handle HTML entity decoding and erratic text extraction.
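A minimal sketch of the kind of normalization involved, using only the Python standard library (the specific rules here are illustrative, not the exact contents of `itik_data_ingest.py`):

```python
import html
import re

def clean_extracted_text(raw: str) -> str:
    """Normalize text pulled out of government PDF syllabi.

    Handles HTML entities left over from HTML-wrapped sources and the
    erratic whitespace that PDF extractors tend to produce.
    """
    text = html.unescape(raw)               # e.g. "&amp;" -> "&"
    text = text.replace("\xa0", " ")        # non-breaking spaces to plain spaces
    text = re.sub(r"-\n(\w)", r"\1", text)  # re-join words hyphenated at line breaks
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # cap blank-line runs
    return text.strip()
```

For example, `clean_extracted_text("Fitter &amp; Turner")` yields `"Fitter & Turner"`, and a word split as `"assem-\nbly"` is re-joined to `"assembly"`.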
Accomplishments that we're proud of
- Indic Language Support: Successfully implementing a UI that understands and responds in languages like Hindi and Kannada without losing technical accuracy.
- Scientific Foundation: Integrating a complex set of 25+ indicators for trait mapping, moving beyond simple "interest" questions to quantitative aptitude assessment.
- End-to-End Automation: Building a fully automated pipeline from raw PDF ingestion to a served Gradio application.