As an engineering student living in a hostel, my life is a chaotic mix of assignments, hackathons, and deadlines. But the one deadline that always induced panic was college fees. I realized that the problem wasn't the payment itself; it was the executive function required to manage it. I had to:

  • Remember the deadline.
  • Check my bank balance (or crypto wallet).
  • Wait for the government scholarship to hit.
  • Time my crypto payments for when gas fees were low.

I didn't need a chatbot to tell me "You have fees due." I needed a Bursar: an agent that could take responsibility, plan the payment strategy over several days, and execute it without me holding its hand. With Gemini 3, I realized I could finally build this. I transformed my existing project, BECBillDesk, from a passive payment gateway into the "hands" of an autonomous Marathon Agent.

🧠 What it Does

The Autonomous Bursar is a "set-and-forget" financial agent for students. It doesn't just chat; it acts.
  • Multimodal Ingestion: I snap a photo of a physical "Fee Circular" on the notice board. The Agent extracts the amount and deadline.
  • Strategic Planning: It doesn't pay immediately. It checks my connected wallets (UPI & Crypto).
  • The Marathon Loop: The agent enters a long-running loop. It monitors:
    • Scholarship Status: "Has the government grant arrived?"
    • Network Congestion: "Is the ETH/Solana gas fee low right now?"
  • Autonomous Execution: When conditions are perfect (e.g., Scholarship Received + Low Gas), it executes the payment function automatically and emails me the receipt.

⚙️ How We Built It

The architecture follows the "Brain & Hands" pattern, leveraging Gemini 3's advanced reasoning and large context window.
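Before the Marathon Loop starts, the Multimodal Ingestion step has to turn a photo into two values the agent can trust: an amount and a deadline. The sketch below shows one way to validate the model's reply before anything is scheduled. It assumes the vision prompt asks Gemini to answer with JSON containing hypothetical `amount` and `deadline` (`YYYY-MM-DD`) fields; `parseFeeNotice` and both field names are illustrative, not taken from BECBillDesk.

```javascript
// Validate the structured reply from the vision step before the agent acts on it.
// Assumption: the prompt asks Gemini for {"amount": <number>, "deadline": "YYYY-MM-DD"};
// both field names are hypothetical, not taken from BECBillDesk.
function parseFeeNotice(modelReply) {
  let data;
  try {
    data = JSON.parse(modelReply);
  } catch {
    throw new Error("model reply is not valid JSON");
  }
  if (data === null || typeof data !== "object") {
    throw new Error("model reply is not a JSON object");
  }

  // Reject missing, non-numeric, or non-positive amounts instead of guessing.
  const { amount, deadline } = data;
  if (typeof amount !== "number" || !Number.isFinite(amount) || amount <= 0) {
    throw new Error("fee amount is missing or invalid");
  }

  // Require a strict ISO date so an ambiguous "05/06" read off a blurry notice is rejected.
  if (typeof deadline !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(deadline)) {
    throw new Error("deadline must be formatted as YYYY-MM-DD");
  }

  // Treat the deadline as the end of that day in UTC (an assumption; a real
  // deployment would use the college's local time zone).
  const due = new Date(`${deadline}T23:59:59Z`);
  // The round-trip comparison rejects impossible dates such as 2025-02-30.
  if (Number.isNaN(due.getTime()) || due.toISOString().slice(0, 10) !== deadline) {
    throw new Error("deadline is not a real calendar date");
  }

  return { amount, deadline: due };
}
```

Throwing instead of falling back to a default means a misread circular stops the agent before it plans any payment.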
  1. The Hands (Tool Definitions)

I exposed my Next.js/MongoDB backend as a set of deterministic tools. We used the Gemini Function Calling API to define these capabilities:

```javascript
const tools = [
  {
    name: "fetch_fee_dues",
    description: "Queries the college database for outstanding student fees.",
  },
  {
    name: "execute_crypto_payment",
    description: "Initiates a blockchain transaction if the wallet is authorized.",
    parameters: {
      type: "OBJECT",
      properties: { ... },
    },
  },
];
```
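Declaring tools is only half of the "hands": each name the model calls has to be routed to deterministic backend code. A minimal sketch of that dispatch step, assuming each function call arrives as an object with `name` and `args` fields (the shape of a `functionCall` part in the Gemini API). The in-memory `ledger`, the handler bodies, and the `dueId` argument are hypothetical stand-ins, since the original parameter schema is elided.

```javascript
// Hypothetical in-memory stand-in for the MongoDB-backed fee records.
const ledger = {
  walletAuthorized: true,
  dues: [{ id: "fee-1", amount: 45000, paid: false }],
};

// Each tool name declared to Gemini maps to one deterministic handler.
// The `dueId` argument is an assumption; the original parameter schema is elided.
const handlers = {
  fetch_fee_dues: async () => ({ dues: ledger.dues.filter((d) => !d.paid) }),

  execute_crypto_payment: async ({ dueId }) => {
    // Mirror the tool description: do nothing unless the wallet is authorized.
    if (!ledger.walletAuthorized) return { ok: false, error: "wallet not authorized" };
    const due = ledger.dues.find((d) => d.id === dueId && !d.paid);
    if (!due) return { ok: false, error: `no unpaid due with id ${dueId}` };
    due.paid = true; // a real handler would submit the on-chain transaction here
    return { ok: true, paidId: due.id };
  },
};

// Route one model-issued function call ({ name, args }) to its handler. The
// returned object is what would be sent back to the model as the function response.
async function dispatchToolCall(call) {
  if (!Object.prototype.hasOwnProperty.call(handlers, call.name)) {
    return { ok: false, error: `unknown tool: ${call.name}` };
  }
  return handlers[call.name](call.args ?? {});
}
```

In this sketch, failures come back as data rather than exceptions, so the model can read the error and re-plan instead of the loop crashing.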
  2. The Brain (Gemini 3 Reasoning)

This is where the "Action Era" shines. We don't just ask Gemini to output text; we ask it to output a Thought Trace. The agent evaluates an optimization function to decide when to pay. We modeled the decision logic as a cost-minimization problem that the agent "solves" periodically. The objective is to minimize the total cost C(t) of paying at time t:

$$\min_{t} \; C(t) = F_{fee} + G_{gas}(t)$$

subject to the constraints:

$$t \le T_{deadline}, \qquad B_{wallet}(t) \ge F_{fee} + G_{gas}(t)$$

where:

    • F_{fee} is the fixed college fee amount.
    • G_{gas}(t) is the dynamic blockchain gas fee at time t.
    • T_{deadline} is the fee submission deadline extracted from the notice.
    • B_{wallet}(t) is the student's current wallet balance.

Because F_{fee} is fixed, minimizing C(t) reduces to picking a moment with low G_{gas}(t) that still satisfies both constraints. The agent loops through this evaluation every hour. If G_{gas}(t) is high, it waits. If t approaches T_{deadline}, it prioritizes payment regardless of gas fees to avoid penalties.
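The hourly evaluation above can be sketched as a small, deterministic function that applies the balance constraint first, then the deadline override, then the low-gas check. This is a minimal sketch under stated assumptions, not the project's implementation: `decidePayment`, `gasThreshold` (what counts as "high" gas), and `urgencyWindowMs` (how close "approaches" is, defaulting here to 24 hours) are hypothetical, since the original does not specify them.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// One hourly evaluation of the pay-or-wait rule. Timestamps are milliseconds
// since the epoch; all money values share one unit.
//   feeAmount = F_fee, gasFee = G_gas(t), balance = B_wallet(t), deadline = T_deadline
// gasThreshold and urgencyWindowMs are hypothetical tuning knobs.
function decidePayment(
  { now, deadline, feeAmount, gasFee, balance },
  { gasThreshold, urgencyWindowMs = 24 * HOUR_MS },
) {
  const totalCost = feeAmount + gasFee; // C(t) = F_fee + G_gas(t)

  // Balance constraint: never "pay" with money that is not in the wallet.
  if (balance < totalCost) {
    return { action: "WAIT", reason: "insufficient balance", totalCost };
  }
  // Near (or past) the deadline, pay regardless of gas to avoid a late penalty.
  if (deadline - now <= urgencyWindowMs) {
    return { action: "PAY", reason: "deadline is close", totalCost };
  }
  // Otherwise pay only when gas is cheap; an expensive hour is skipped.
  if (gasFee <= gasThreshold) {
    return { action: "PAY", reason: "gas is low", totalCost };
  }
  return { action: "WAIT", reason: "gas is high", totalCost };
}
```

Keeping the rule in plain code, with Gemini supplying the observations and choosing when to call it, is one way to make the balance check impossible to skip; the thresholds would still need tuning against real gas prices.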
  3. The Tech Stack
    • Agent Core: Google Gemini 3 API (Reasoning & Tool Use)
    • Frontend/Dashboard: Next.js 14 (App Router) & Tailwind CSS
    • Backend: Node.js serverless functions
    • Database: MongoDB (storing user state and agent logs)
    • Blockchain Integration: Wagmi (for crypto wallet interactions)

🚧 Challenges We Faced
    • The "Hallucinated Richness": Early on, the Agent would happily "pay" fees with non-existent money. We had to implement strict "Reality Check" loops where the Agent must call check_balance and verify the output before even thinking about calling execute_payment.
    • Context Management: Keeping the agent "alive" over days without burning through millions of tokens was hard. We implemented a "State Summary" system where the agent condenses its previous thoughts into a compact JSON object before going to sleep until the next check.

🏆 Accomplishments that I'm Proud Of

I am most proud of moving beyond the "Chatbot" paradigm.
    • It's not a bot you talk to; it's a bot that works for you.
    • We successfully integrated crypto-native logic (gas optimization) with real-world academic logic (deadlines), creating a bridge between Web3 and student life.

📚 What I Learned
    • Prompt Engineering vs. Agent Engineering: I learned that prompts for agents need to be like code specifications. You can't just say "be helpful"; you have to define "success states" and "failure modes."
    • The Power of Multimodality: Being able to parse a blurry photo of a notice board using Gemini Vision changed the UX from "Data Entry" to "Magic."

🚀 What's Next for The Autonomous Bursar

We plan to implement "Negotiation Mode," where the Agent can draft an email to the college administration requesting a deadline extension if it detects that scholarship funds will be delayed. The "Action Era" is just beginning!
