THE SUNNY PROJECT

Inspiration

From my own research experience, I have noticed that the same challenges recur across papers, projects, and researchers regardless of domain. Most research papers follow a predictable structure (abstract, methodology, experiments, results), experiments require similar steps to reproduce and evaluate, and researchers consistently ask the same kinds of questions about validity, assumptions, limitations, and potential improvements. Likewise, note-taking, statistical extraction, and revisiting prior work follow common patterns that are repeated by hand each time. These recurring structures and behaviors show that research inefficiency is not random but systematic, which makes it possible to design an assistant like Sunny that learns, anticipates, and automates these repeated patterns instead of treating every interaction as a one-off task.

What it does

Research today is limited less by intellectual difficulty than by operational friction. Researchers spend excessive time reading PDFs, extracting statistics, translating methodologies into code, switching between tools, and taking notes by hand, which breaks cognitive flow and lets insights decay over time. Even with existing AI tools, progress remains slow because traditional text-based prompting is interruptive, context-heavy, and reactive: researchers must already know what to ask.

The Sunny Project is a voice-first research assistant that lives inside your computer and works like an AI labmate. As you read, Sunny analyzes and critiques papers, surfaces research gaps, extracts key ideas and statistics, and answers questions through natural conversation. For Tech/AI researchers, it also turns methodology sections into runnable experiments, executes them, and suggests improvements.

A key strength is context awareness. Sunny understands which paper or tab the researcher is currently viewing and updates its understanding automatically as the user navigates, without requiring repeated explanations.

Sunny is also built to keep research moving forward. When code extracted from a paper fails, it does not stop at the error message: Sunny diagnoses the issue, fixes the code, and retries automatically, so experiments continue without manual intervention. Everything is logged into a searchable Research Vault, so insights never decay and past work compounds. Sunny removes the busywork from research, preserves cognitive flow, and lets researchers spend their time on what actually matters: thinking, experimenting, and discovering.

How I built it

Sunny was built by combining multiple Google Gemini APIs into a single, seamless research experience. At its core, Sunny uses Gemini’s vision capabilities to analyze screenshots of research papers in real time, allowing it to extract text, equations, figures, and code from any PDF viewer or browser without requiring manual uploads. This visual understanding is paired with Gemini’s function-calling capabilities, enabling Sunny to intelligently decide when to analyze a paper section, execute code, store findings, or verify claims based on natural conversation.

To support experimentation, Sunny leverages Gemini’s built-in code execution environment to safely run Python code derived from research papers. When errors occur, Sunny automatically analyzes the failure, fixes common issues such as missing imports or incorrect assumptions, and retries execution. The system uses a multi-model architecture to balance performance and efficiency, with lightweight models handling frequent screen monitoring and more powerful models reserved for deep analysis tasks. Background processing is handled through a threading and queue-based system, allowing experiments to run autonomously without interrupting real-time voice interaction. All components are coordinated through intelligent state management and quota tracking to maximize free-tier API usage.

Beyond summaries, Sunny is structured around three core components that run in the background. The Screen Monitor continuously captures screenshots of the active window and uses image similarity hashing with an 85% threshold to detect meaningful changes such as page turns, scroll jumps, or paper switches, while ignoring minor cursor movements. When a significant change is detected, Sunny automatically updates its context so it always understands what the user is currently reading.
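The change-detection step can be sketched roughly as follows. This is a minimal illustration, not Sunny's actual code: the text only says that image similarity hashing with an 85% threshold is used, so the choice of an average hash, the function names (`average_hash`, `similarity`, `has_meaningful_change`), and the reading of the threshold as "less than 85% similar means changed" are all assumptions. Frames are plain nested lists of grayscale values so the sketch runs without an imaging library; a real screenshot would first be converted to grayscale and downscaled.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels (0-255), row-major; assumed pre-downscaled

SIMILARITY_THRESHOLD = 0.85  # the 85% threshold mentioned in the text


def average_hash(frame: Frame) -> List[int]:
    """Return one bit per pixel: 1 if the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def similarity(a: Frame, b: Frame) -> float:
    """Fraction of hash bits that match between two equally sized frames."""
    hash_a, hash_b = average_hash(a), average_hash(b)
    if len(hash_a) != len(hash_b):
        raise ValueError("frames must have the same dimensions")
    matches = sum(1 for x, y in zip(hash_a, hash_b) if x == y)
    return matches / len(hash_a)


def has_meaningful_change(previous: Frame, current: Frame) -> bool:
    """Treat the screen as changed when similarity drops below the threshold."""
    return similarity(previous, current) < SIMILARITY_THRESHOLD
```

With an 8x8 frame, a single changed pixel (a stand-in for a cursor move) leaves 63 of 64 bits matching and is ignored, while a different layout flips many bits and triggers a context update.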

The Voice Handler enables fully hands-free interaction through bidirectional audio. It uses Windows’ native speech recognition API for voice input and Google Text-to-Speech with pygame for spoken responses, allowing users to interact with Sunny naturally without typing. This design keeps researchers in flow while reading, thinking, or experimenting, and removes the friction of traditional prompt-based interfaces.

The third core component is the Tools system, which defines a set of research-specific actions that Gemini can invoke autonomously. These tools allow Sunny to extract code directly from PDFs, analyze specific sections of a paper, store key findings, and verify claims against previously seen research. For technical users, Sunny can translate methodology sections or pseudocode into executable Python, run experiments in a secure sandbox, stream outputs in real time, or generate files directly inside a VS Code workspace for manual inspection and editing.

All tools are exposed to Gemini through detailed function definitions, enabling the model to reason about which action to take based on the user’s intent rather than explicit commands. Whether the user asks for a critique, a replication, or a verification, Sunny selects the appropriate action automatically. By separating sandbox execution from local execution, Sunny balances safety and flexibility, allowing users to engage deeply with code while preserving a smooth, conversational research experience.
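To make the tool mechanism concrete, here is a small sketch of how function definitions and a dispatcher might fit together. It does not call the Gemini SDK: the model's function-call response is simulated as a plain dict with `name` and `args`, and the tool names (`store_finding`, `verify_claim`), the in-memory `FINDINGS` list, and the naive substring check are hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store standing in for persisted findings.
FINDINGS: List[Dict[str, str]] = []


def store_finding(paper: str, finding: str) -> Dict[str, Any]:
    """Save a finding and report how many are stored."""
    FINDINGS.append({"paper": paper, "finding": finding})
    return {"stored": True, "count": len(FINDINGS)}


def verify_claim(claim: str) -> Dict[str, Any]:
    """Naive check: does any stored finding contain the claim text?"""
    matches = [f for f in FINDINGS if claim.lower() in f["finding"].lower()]
    return {"supported": bool(matches), "matches": matches}


# JSON-schema-style declarations describing each tool to the model (assumed shape).
TOOL_DECLARATIONS = [
    {
        "name": "store_finding",
        "description": "Save a key finding from the paper currently on screen.",
        "parameters": {
            "type": "object",
            "properties": {
                "paper": {"type": "string"},
                "finding": {"type": "string"},
            },
            "required": ["paper", "finding"],
        },
    },
    {
        "name": "verify_claim",
        "description": "Check a claim against previously stored findings.",
        "parameters": {
            "type": "object",
            "properties": {"claim": {"type": "string"}},
            "required": ["claim"],
        },
    },
]

TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "store_finding": store_finding,
    "verify_claim": verify_claim,
}


def dispatch(function_call: Dict[str, Any]) -> Dict[str, Any]:
    """Run the tool the model asked for and return its result."""
    name = function_call.get("name")
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**function_call.get("args", {}))
```

Keeping declarations and handlers in parallel structures means a new research action only needs one description and one function; the model picks among them from the descriptions alone.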

Gemini 3 is crucial to building Sunny

Multimodal Understanding: The application’s central strength lies in its integration of Gemini 3’s multimodal vision and function calling. A dedicated VisionEngine captures screenshots of research papers and submits them to gemini-3 Deep Think mode together with contextual prompts. Leveraging Gemini 3’s function calling, the model autonomously invokes specialized tools to interpret paper content, extract experimental logic, and carry out research tasks without manual intervention.

Agentic Coding & Vibe Coding: To support experiment replication, Sunny utilizes Gemini 3’s native code execution capability. Extracted Python code is sent to Gemini through the CodeExecutor, where it is executed directly. Execution errors are automatically detected and resolved through an iterative feedback loop, enabling rapid debugging and eliminating the need for manual code correction.
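The iterative feedback loop can be illustrated with a short sketch. Several things here are assumptions made for the example: a local `exec` call stands in for Gemini's remote code execution, the `fix_code` callback stands in for a model call that rewrites the code from the error text, and the names `run_once`, `run_with_retries`, and `add_missing_import` are hypothetical.

```python
import traceback
from typing import Callable, Dict, Tuple


def run_once(code: str) -> Tuple[bool, str, Dict[str, object]]:
    """Execute code in a fresh namespace; return (ok, error_text, namespace)."""
    namespace: Dict[str, object] = {}
    try:
        # Local stand-in for a remote sandbox; never do this with untrusted code.
        exec(code, namespace)
        return True, "", namespace
    except Exception:
        return False, traceback.format_exc(), namespace


def run_with_retries(
    code: str,
    fix_code: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Dict[str, object]:
    """Run code, passing each failure to fix_code before the next attempt."""
    error = ""
    for attempt in range(1, max_attempts + 1):
        ok, error, namespace = run_once(code)
        if ok:
            return {"ok": True, "attempts": attempt, "code": code, "namespace": namespace}
        if attempt < max_attempts:
            code = fix_code(code, error)
    return {"ok": False, "attempts": max_attempts, "code": code, "error": error}


def add_missing_import(code: str, error: str) -> str:
    """Toy fixer: prepend `import math` when the traceback shows it is undefined."""
    if "NameError: name 'math' is not defined" in error:
        return "import math\n" + code
    return code
```

For instance, `run_with_retries("value = math.sqrt(16)", add_missing_import)` fails once on the missing import, is repaired, and succeeds on the second attempt. Capping the attempts keeps an unfixable error from looping forever.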

Reasoning with Unprecedented Depth and Nuance: Sunny further exploits Gemini 3’s controllable generation parameters by dynamically adapting them to the task at hand. Low-temperature settings (0.1–0.2) are used for deterministic tasks such as paper analysis and statistical extraction, while higher temperatures (≈0.7) are applied for exploratory tasks such as experimental code generation and optimization, ensuring both reliability and creative flexibility.
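As a small illustration of task-dependent temperature, the mapping below follows the ranges given above. The task labels, the split of 0.1 versus 0.2 between the two deterministic tasks, the fallback value, and the `generation_config` helper are assumptions for this sketch rather than Sunny's actual configuration.

```python
from typing import Dict

# Temperatures per task type; labels and exact assignment are illustrative.
TASK_TEMPERATURES: Dict[str, float] = {
    "paper_analysis": 0.1,
    "statistical_extraction": 0.2,
    "code_generation": 0.7,
    "optimization": 0.7,
}

# Assumed fallback: stay on the deterministic side for unknown tasks.
DEFAULT_TEMPERATURE = 0.2


def generation_config(task: str) -> Dict[str, float]:
    """Build a per-request config dict carrying the temperature for this task."""
    return {"temperature": TASK_TEMPERATURES.get(task, DEFAULT_TEMPERATURE)}
```

Selecting the value per request, rather than fixing it globally, lets the same model behave conservatively during extraction and more exploratory during code generation.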

Challenges I ran into

One issue was accuracy when extracting text and code from complex PDFs. Some papers use dense layouts or unusual formatting, which caused early errors. I solved this by running multiple analysis passes and asking the model to self-correct when results looked incomplete.

Running code from research papers also proved tricky. Many examples assume missing libraries or specific environments. To address this, I built automatic retry logic that lets Sunny fix common errors and re-run the code without user intervention.

Managing API limits was another major challenge. I needed Sunny to feel responsive while staying within strict free-tier quotas. I solved this by tracking usage in real time and prioritizing high-accuracy models only when they were truly needed.
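One way to picture the quota tracking is a sliding-window counter per model with a fallback from the heavier model to the lighter one. This is a sketch under stated assumptions: the per-minute window, the limits, the placeholder model names `heavy-model` and `light-model`, and the `QuotaTracker` and `choose_model` names are all hypothetical, and real free-tier limits may be counted differently.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional


class QuotaTracker:
    """Sliding-window counter of requests per model."""

    def __init__(
        self,
        limits_per_window: Dict[str, int],
        clock: Callable[[], float] = time.monotonic,
        window_seconds: float = 60.0,
    ) -> None:
        self.limits = dict(limits_per_window)
        self.clock = clock
        self.window = window_seconds
        self.calls: Dict[str, Deque[float]] = {m: deque() for m in self.limits}

    def remaining(self, model: str) -> int:
        """Drop calls older than the window, then return the unused budget."""
        now = self.clock()
        calls = self.calls[model]
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        return self.limits[model] - len(calls)

    def record(self, model: str) -> None:
        """Count one request against the model's budget."""
        self.calls[model].append(self.clock())


def choose_model(
    tracker: QuotaTracker,
    needs_deep_analysis: bool,
    light: str = "light-model",
    heavy: str = "heavy-model",
) -> Optional[str]:
    """Prefer the heavy model only for deep analysis; fall back when it is exhausted."""
    preferred = [heavy, light] if needs_deep_analysis else [light]
    for model in preferred:
        if tracker.remaining(model) > 0:
            tracker.record(model)
            return model
    return None  # caller should wait or queue the request
```

Injecting the clock makes the tracker easy to test and keeps the policy (which model to prefer) separate from the bookkeeping (how many calls are left).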

Finally, coordinating real-time voice interaction with background processing required careful design. I used a queue-based system so Sunny can continue conversations while experiments run quietly in the background, then report results when they’re ready.
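The queue-based design might look something like the sketch below, which uses only the standard `threading` and `queue` modules. The `BackgroundRunner` class and its method names are hypothetical; the idea is that the voice loop submits jobs and polls for finished results without ever blocking on an experiment.

```python
import queue
import threading
from typing import Any, Callable, List, Optional, Tuple

Result = Tuple[str, Any, Optional[BaseException]]  # (job name, value, error)


class BackgroundRunner:
    """Runs submitted jobs on a worker thread and collects their results."""

    def __init__(self) -> None:
        self.jobs: "queue.Queue" = queue.Queue()
        self.results: "queue.Queue" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name: str, job: Callable[[], Any]) -> None:
        """Queue a job; returns immediately so the conversation can continue."""
        self.jobs.put((name, job))

    def _run(self) -> None:
        while True:
            item = self.jobs.get()
            if item is None:  # sentinel: stop the worker
                self.jobs.task_done()
                break
            name, job = item
            try:
                self.results.put((name, job(), None))
            except Exception as exc:  # report failures instead of killing the thread
                self.results.put((name, None, exc))
            finally:
                self.jobs.task_done()

    def poll(self) -> List[Result]:
        """Non-blocking: return whatever results have finished so far."""
        finished: List[Result] = []
        while True:
            try:
                finished.append(self.results.get_nowait())
            except queue.Empty:
                return finished

    def shutdown(self) -> None:
        """Ask the worker to stop after the jobs already queued."""
        self.jobs.put(None)
        self._worker.join()
```

The foreground loop would call `poll()` between voice turns and announce any results that have arrived, which matches the "report results when they're ready" behavior described above.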

Accomplishments that I'm proud of

One of the biggest accomplishments is building a truly end-to-end voice-based research workflow. Users can read papers, critique methods, verify claims, and even run experiments entirely through voice, without needing to touch the keyboard.

I'm also proud of Sunny’s automatic error recovery. When code fails, Sunny doesn’t just surface the error. It analyzes what went wrong, fixes the issue, and retries on its own, allowing research to continue without interruption.

Sunny’s live screen awareness is another standout feature. It understands which paper the user is reading and automatically updates its context as tabs change, without requiring manual input.

I also designed the system with the future in mind. Research papers, notes, experiments, and their relationships are stored in a structured way, making it easy to add analytics and visualization features later.

Finally, the system is built to be resilient. Sunny gracefully handles rate limits, failed requests, and unexpected errors, ensuring a smooth user experience even when things go wrong.
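Since SQLite appears in the project's stack, the structured storage of papers, notes, and experiments could be sketched as below. The table and column names, the helper functions (`open_vault`, `add_paper`, `add_note`, `search_notes`), and the simple `LIKE` search are assumptions made for illustration; the actual schema lives in the repository.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: notes and experiments both point back to a paper.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER NOT NULL REFERENCES papers(id),
    body TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS experiments (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER NOT NULL REFERENCES papers(id),
    code TEXT NOT NULL,
    status TEXT NOT NULL
);
"""


def open_vault(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the vault and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the paper relationships
    conn.executescript(SCHEMA)
    return conn


def add_paper(conn: sqlite3.Connection, title: str) -> int:
    cur = conn.execute("INSERT INTO papers (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


def add_note(conn: sqlite3.Connection, paper_id: int, body: str) -> None:
    conn.execute("INSERT INTO notes (paper_id, body) VALUES (?, ?)", (paper_id, body))
    conn.commit()


def search_notes(conn: sqlite3.Connection, term: str) -> List[Tuple[str, str]]:
    """Return (paper title, note body) pairs whose note contains the term."""
    rows = conn.execute(
        "SELECT papers.title, notes.body FROM notes "
        "JOIN papers ON papers.id = notes.paper_id "
        "WHERE notes.body LIKE ? ORDER BY notes.id",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

Keeping relationships explicit through foreign keys is what would make later analytics straightforward, for example counting experiments per paper with a single join.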

What I learned

I learned that research assistants become far more useful when they can see, listen, and act. Combining visual context, natural conversation, and tool execution enables Sunny to support real research workflows rather than just answer questions. I also learned that removing friction matters more than adding features. Voice interaction keeps researchers in flow, making it easier to explore ideas as they arise instead of stopping to formulate prompts. On the technical side, I learned the importance of designing clear and efficient prompts for Gemini, especially when working with tool execution. Well-defined tools allow a central orchestrator to decide when and how each capability should be used, which is critical for building a system that feels autonomous, reliable, and responsive.

What's next for THE SUNNY PROJECT

I intend to integrate the ElevenLabs API to produce a more natural voice and improve the quality of conversations.
Now that I have successfully incorporated voice into Sunny, the logical next step is to let users interact with Sunny through gestures.

FOR DETAILED INFORMATION, PLEASE CHECK THE GITHUB REPO

Built With

gemini-api
gtts
keyboard
pillow
pyautogui
pygame
pygetwindow
pyperclip
python
speechrecognition
sqlite

Updates

Huynh Khanh Minh posted an update — Feb 10, 2026 11:31 PM EST

If you’ve checked out this repository and enjoyed the project, I’d really appreciate it if you could leave a star. Your support genuinely means a lot to me!

Huynh Khanh Minh posted an update — Feb 09, 2026 10:20 AM EST

This project has no graphical user interface. To test Sunny, clone the repository, set up a Python virtual environment, install the required dependencies, and add your own Google Gemini API key (and optional voice API keys) to a .env file in the project root. Once running, Sunny operates entirely through voice, allowing you to interact with it hands-free while reading research papers in your browser or PDF viewer.

Huynh Khanh Minh started this project — Feb 09, 2026 10:16 AM EST
