Sensus

Inspiration

Sensus was inspired by a simple question: what would a truly voice-first computer experience look like for someone who cannot rely on a screen? Most assistants can answer questions, but they struggle with real desktop tasks. We wanted to build something that could actually operate a Linux machine, navigate the web, and keep context across sessions in a way that feels practical for visually impaired users.

What it does

Sensus is a voice-first Ubuntu assistant that can:

  • Understand spoken commands and route them to the right capability (browser, desktop actions, shell, shortcuts, or coding).
  • Control Firefox through automation for real web tasks (navigation, clicking, downloads, and accessibility checks).
  • Show a lightweight top-right overlay UI for live interaction, status, and session history.
  • Speak responses naturally with tuned TTS buffering for smoother playback.
  • Optionally persist sessions/messages in IBM Db2 and generate session summaries for quick recall.
  • Use multimodal vision models for screenshot-based understanding when needed.

How we built it

We built Sensus as a modular Python system:

  • Orchestrator: OpenAI-compatible model routing and tool-calling logic for deciding which action to take (sketch below).
  • Voice stack: STT + TTS pipeline with real-time streaming and voice-activity-detection (VAD)-style handling.
  • Agents: Separate modules for browser automation, coding tasks, desktop actions, and shortcuts.
  • Overlay: GTK/WebKit overlay window pinned in the corner for a persistent, non-intrusive UI (sketch below).
  • Storage: Optional IBM Db2-backed session/message persistence (sketch below).
  • Infra: Environment-driven config (.env) for model selection, timeouts, browser behavior, and audio tuning (sketch below).
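
As a rough illustration of the routing idea (not Sensus's actual interface), here is a minimal OpenAI-compatible tool-calling loop. The tool schema, model name, and dispatch string are placeholders:

    # Hedged sketch: route a transcribed command via OpenAI-compatible tool calling.
    import json
    from openai import OpenAI

    client = OpenAI()  # honors OPENAI_BASE_URL/OPENAI_API_KEY for compatible servers

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "browser_action",  # placeholder capability
            "description": "Navigate, click, or download in Firefox.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }]

    def route(transcript: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; Sensus reads the model from .env
            messages=[{"role": "user", "content": transcript}],
            tools=TOOLS,
        )
        msg = resp.choices[0].message
        if msg.tool_calls:  # the model picked a capability
            call = msg.tool_calls[0]
            args = json.loads(call.function.arguments)
            return f"dispatch {call.function.name} with {args}"
        return msg.content or ""  # plain answer: hand it to TTS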
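
The overlay itself is essentially a borderless GTK window hosting a WebKit view, kept above other windows and moved to the top-right corner. A sketch under assumptions: GTK 3, WebKit2GTK (whether you require "4.0" or "4.1" depends on the distro), and an X11 session; on Wayland, keep-above and move() hints are often ignored, which is the portability issue noted under Challenges:

    # Hedged sketch of a pinned top-right overlay with GTK3 + WebKit2GTK.
    import gi
    gi.require_version("Gtk", "3.0")
    gi.require_version("WebKit2", "4.1")  # may need "4.0" on older distros
    from gi.repository import Gtk, WebKit2

    win = Gtk.Window()
    win.set_decorated(False)         # no title bar
    win.set_keep_above(True)         # stay on top (X11; Wayland may ignore this)
    win.set_skip_taskbar_hint(True)
    win.set_default_size(360, 480)

    view = WebKit2.WebView()
    view.load_uri("file:///opt/sensus/overlay.html")  # placeholder UI page
    win.add(view)

    # Pin to the top-right corner of the primary monitor.
    screen = win.get_screen()
    geom = screen.get_monitor_geometry(screen.get_primary_monitor())
    win.move(geom.x + geom.width - 360 - 16, geom.y + 16)

    win.connect("destroy", Gtk.main_quit)
    win.show_all()
    Gtk.main()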
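
Persistence can be as small as one table and a parameterized insert through the ibm_db driver. The connection string and schema below are placeholders, not Sensus's real layout:

    # Hedged sketch of Db2-backed message persistence via the ibm_db driver.
    import ibm_db

    conn = ibm_db.connect(
        "DATABASE=SENSUS;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;"
        "UID=db2inst1;PWD=changeme",  # placeholder credentials
        "", "",
    )

    def save_message(session_id: str, role: str, content: str) -> None:
        stmt = ibm_db.prepare(
            conn,
            "INSERT INTO MESSAGES (SESSION_ID, ROLE, CONTENT) VALUES (?, ?, ?)",
        )
        ibm_db.execute(stmt, (session_id, role, content))

    save_message("demo-session", "user", "open firefox and check my email")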
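
For the environment-driven config, a .env file is loaded once at startup and every knob gets a default. The key names here are illustrative, not the actual variables Sensus reads:

    # Hedged sketch of .env-driven configuration (python-dotenv assumed).
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the working directory

    MODEL_NAME    = os.getenv("SENSUS_MODEL", "gpt-4o-mini")
    TOOL_TIMEOUT  = float(os.getenv("SENSUS_TOOL_TIMEOUT_S", "30"))
    HEADLESS      = os.getenv("SENSUS_BROWSER_HEADLESS", "false").lower() == "true"
    TTS_PREBUFFER = int(os.getenv("SENSUS_TTS_PREBUFFER_CHUNKS", "4"))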

Challenges we ran into

  • Making voice interactions reliable under real-world latency and varied model response times.
  • Handling Linux display/server differences (X11 vs Wayland), especially for pinned always-on-top overlays.
  • Keeping browser automation robust when websites throw up modal overlays, dynamic DOM changes, and download edge cases (see the retry sketch below).
  • Avoiding audio glitches (underruns/static) in TTS streaming (see the buffering sketch below).
  • Balancing a powerful tool-calling assistant with safe, deterministic behavior.
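
For the browser-robustness problem, the pattern that paid off was "try, dismiss whatever is covering the page, retry." A hedged sketch assuming a Playwright-style driver (Playwright can drive Firefox); the dismiss selectors are illustrative:

    # Hedged sketch of retry-with-dismissal for flaky pages.
    from playwright.sync_api import TimeoutError as PWTimeout

    def robust_click(page, selector: str, attempts: int = 3) -> bool:
        for _ in range(attempts):
            try:
                page.click(selector, timeout=5_000)
                return True
            except PWTimeout:
                # A cookie banner or modal may be intercepting the click.
                for dismiss in ("[aria-label='Close']", "button:has-text('Accept')"):
                    try:
                        page.click(dismiss, timeout=1_000)
                    except PWTimeout:
                        pass
        return False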
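
The audio-glitch fix came down to never letting the output stream starve: accumulate a few chunks before playback begins. A minimal sketch, assuming 16-bit mono PCM at 24 kHz, the sounddevice library, and at least PREBUFFER_CHUNKS chunks per utterance:

    # Hedged sketch: pre-buffer TTS chunks so playback never starts starved.
    import queue
    import sounddevice as sd

    SAMPLE_RATE = 24_000   # match your TTS engine's output rate
    PREBUFFER_CHUNKS = 4   # chunks to accumulate before opening the stream

    def play_stream(chunks: "queue.Queue[bytes]") -> None:
        # An empty buffer at stream start is what produces the underruns
        # and static heard in naive chunk-by-chunk playback.
        backlog = [chunks.get() for _ in range(PREBUFFER_CHUNKS)]
        with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1,
                                dtype="int16") as out:
            for chunk in backlog:
                out.write(chunk)
            while (chunk := chunks.get()) is not None:  # None = end of utterance
                out.write(chunk)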

Accomplishments that we're proud of

  • A working end-to-end voice-first assistant that can execute meaningful computer tasks.
  • A clean tool-routing architecture that makes the assistant extensible.
  • A practical accessibility-first overlay experience with session history and chat continuity.
  • Integration of multimodal + browser + system actions in one cohesive UX.
  • Real persistence support (Db2) for session memory beyond a single process.

What we learned

  • Accessibility is not a "feature"; it has to shape every architecture decision.
  • Reliability beats novelty in voice UX: buffering, retries, and fallbacks matter more than flashy demos.
  • Tool-using agents need strong prompting constraints and clear execution boundaries.
  • Cross-environment Linux behavior can be the hardest engineering problem in UI/system automation projects.
  • Iterating with real usage scenarios exposes edge cases faster than synthetic tests.

What's next for Sensus

  • Improve conversational memory and personalization across longer time horizons.
  • Expand desktop integrations (more apps, richer system controls, and tighter shortcut workflows).
  • Add stronger safety/confirmation layers for high-impact actions.
  • Improve onboarding and deployment so non-technical users can install and run Sensus quickly.
  • Continue hardening browser + vision reliability for real-world websites.
  • Run user testing with visually impaired users and prioritize roadmap items from direct feedback.
