Inspiration

Managing files in Google Drive can quickly become overwhelming, especially when juggling class materials, assignments, student projects, and administrative documents across different courses or academic years. Tasks like moving files between folders, summarizing lengthy documents, or safely deleting outdated resources typically require multiple manual steps, leading to lost time and potential mistakes.

We wanted to streamline Drive management for educators and students by building an agentic assistant that not only interprets natural language commands but also proactively confirms user intent before executing critical actions. This flow ensures accuracy, prevents accidental deletions, and improves efficiency for tasks like organizing class resources, managing student submissions, or curating content for different courses—all through a conversational interface.

What it does

DocuPilot transforms Google Drive into an intelligent, conversational workspace.
With natural language commands, users can:

  • Move files between folders with phrases like:

    “move my doc titled budget from drafts to final”
    even if the folders or documents are nested deep within Drive.

  • Summarize or analyze document contents directly in the chat interface:

    “summarize my project proposal doc”
    generates a concise summary from the document’s actual content.

  • Create new Google Docs from prompts like:

    “create a document called Q2 Strategy Plan”
    with an auto-generated preview that users can confirm, regenerate, or skip.

How we built it

Natural Language to Action Execution Pipeline:

  • User submits a natural language command (e.g., “delete my doc syllabus draft”).
  • The cached Drive context (file and folder names) is passed to Gemini with the command so that names resolve accurately.
  • Gemini API parses the intent (e.g., deleteDoc, doc_name: syllabus draft).
  • Backend (FastAPI + LangChain) manages the task flow: it confirms the intent with the user, then executes the action through the Drive API.
  • Frontend (React + Vite) displays the conversation, previews, confirmations, and updates.
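
The parse, confirm, execute flow described in this pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DocuPilot's actual code: the `Intent` and `Session` classes, the JSON reply shape, the `moveDoc` action name, and the `execute` callback are all hypothetical, and the Gemini call is stood in for by whatever JSON string the model returns.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional

# Actions that need an explicit "yes" before running.
# Assumed set: only deleteDoc appears in the write-up; moveDoc is hypothetical.
CONFIRM_FIRST = {"deleteDoc", "moveDoc"}


@dataclass
class Intent:
    action: str
    params: dict = field(default_factory=dict)


def parse_intent(raw: str) -> Intent:
    """Turn the model's JSON reply into an Intent, rejecting malformed output."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("action"), str):
        raise ValueError("model reply is missing a string 'action' field")
    params = {k: v for k, v in data.items() if k != "action"}
    return Intent(action=data["action"], params=params)


class Session:
    """Holds at most one action that is waiting for the user's yes/no."""

    def __init__(self, execute: Callable[[Intent], str]):
        self.execute = execute
        self.pending: Optional[Intent] = None

    def handle_intent(self, intent: Intent) -> str:
        if intent.action in CONFIRM_FIRST:
            # Park the action and ask; nothing touches Drive yet.
            self.pending = intent
            return f"Confirm {intent.action} with {intent.params}? (yes/no)"
        return self.execute(intent)

    def handle_reply(self, reply: str) -> str:
        if self.pending is None:
            return "Nothing to confirm."
        intent, self.pending = self.pending, None
        if reply.strip().lower() in {"yes", "y"}:
            return self.execute(intent)
        return "Cancelled."
```

In this sketch a reply of "no" clears the pending action without calling `execute`, so a misparsed delete never reaches the Drive API.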

Agentic Features

  • Context-aware reasoning: injects the cached Drive structure into Gemini’s long-context input for precise intent detection.
  • Gemini API: Parses natural language into structured intents.
  • Google Drive API: Executes file operations (move, delete, content retrieval).
  • LangChain: Orchestrates agentic flows (confirmations, multi-step tasks).
  • FastAPI: Backend for session management, token refresh, and workflow control.
  • Drive Cache: Periodically indexes Drive structure for contextual reasoning.
  • React + Vite Frontend: Interactive chat UI with dynamic prompts, threaded sessions, and real-time updates.
  • OAuth2 Security: Ensures secure, consent-based access to Drive files with minimal scopes.
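
One way the Drive cache could be turned into prompt context is sketched below. It assumes the cache is a flat list of metadata dictionaries with the `id`, `name`, `parents`, and `mimeType` fields that the Drive API v3 returns for files; the function names and the tab-separated output format are hypothetical choices for illustration, not the project's actual format.

```python
from typing import Dict, List

# MIME type the Drive API uses to mark folders.
FOLDER_MIME = "application/vnd.google-apps.folder"


def build_paths(files: List[dict]) -> Dict[str, str]:
    """Map each file ID to a slash-separated path built from parent links.

    Assumes the folder hierarchy is acyclic. Parents that are not in the
    cache (such as the Drive root) are treated as the top of the path.
    """
    by_id = {f["id"]: f for f in files}
    paths: Dict[str, str] = {}

    def path_of(file_id: str) -> str:
        if file_id in paths:
            return paths[file_id]
        item = by_id[file_id]
        known_parents = [p for p in item.get("parents", []) if p in by_id]
        if known_parents:
            result = path_of(known_parents[0]) + "/" + item["name"]
        else:
            result = item["name"]
        paths[file_id] = result
        return result

    for f in files:
        path_of(f["id"])
    return paths


def render_context(files: List[dict], paths: Dict[str, str]) -> str:
    """Render one 'kind<TAB>path<TAB>id' line per item, sorted by path."""
    lines = []
    for f in sorted(files, key=lambda f: paths[f["id"]]):
        kind = "folder" if f.get("mimeType") == FOLDER_MIME else "file"
        lines.append(f"{kind}\t{paths[f['id']]}\t{f['id']}")
    return "\n".join(lines)
```

Including full paths and IDs in the context lets the model distinguish two files with the same name in different folders and return an ID the backend can act on directly.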

Challenges we ran into

  • Managing Google OAuth token expiration and refresh cycles without interrupting user sessions.
  • Carefully crafting prompts for Gemini to ensure it produced structured, actionable responses.
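
The token expiration problem can be handled by refreshing slightly before expiry instead of waiting for a failed request. The sketch below shows that idea in isolation; `Token`, `TokenManager`, the 60-second leeway, and the `refresh` callback are hypothetical, and the actual call to Google's token endpoint is left to whatever function is passed in.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Token:
    access_token: str
    expires_at: float  # Unix timestamp in seconds


class TokenManager:
    """Returns a usable access token, refreshing shortly before it expires."""

    def __init__(
        self,
        token: Token,
        refresh: Callable[[Token], Token],
        leeway: float = 60.0,
        clock: Callable[[], float] = time.time,
    ):
        self.token = token
        self.refresh = refresh  # performs the real OAuth2 refresh request
        self.leeway = leeway    # refresh this many seconds before expiry
        self.clock = clock      # injectable so the logic can be tested

    def get(self) -> str:
        # Refresh early so a request started now does not expire mid-flight.
        if self.clock() >= self.token.expires_at - self.leeway:
            self.token = self.refresh(self.token)
        return self.token.access_token
```

Calling `get()` before every Drive request keeps the refresh out of the user's sight: the session continues with the new token and no re-login prompt is needed.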

Accomplishments that we're proud of

  • Built a multi-functional Drive assistant that can create, move, and analyze documents using only natural language.
  • Integrated Gemini's API with Google Drive APIs for contextual understanding and dynamic actions.
  • Developed multi-turn conversational flows with confirmations, enhancing user experience and safety.

What we learned

  • Managing multi-turn conversations and state for reliable agentic workflows.
  • The importance of precise LLM prompting for structured responses.
  • Handling OAuth2 token refreshes seamlessly within user sessions.
  • Integrating Drive context caching to improve LLM accuracy.
  • Balancing backend task flows with intuitive frontend UX.

What's next

  • Add support for Google Sheets, Slides, and Forms to create and manage more file types via conversation.
  • Enhance Gemini’s intent detection for more varied and complex prompts.
  • Add multimodal capabilities for image uploads or generating multimedia presentations.
  • Allow users to share documents, set permissions, or retrieve shared-with-me files via chat.
  • Introduce undo functionality for actions like moving or deleting files.

Built With

  • css
  • fastapi
  • gemini
  • google-cloud
  • google-docs-api
  • google-drive-api
  • javascript
  • langchain
  • oauth2
  • pydantic
  • python
  • react
  • vite