Sentience: Autonomous Multimodal Cognition with gpt-oss

Inspiration

We were inspired by the idea of machines that don’t wait to be prompted. Most AI systems only act when a user types or speaks. We wanted to build something closer to conscious presence: an AI that continuously perceives, reasons, and acts on its own. The release of gpt-oss-20b and gpt-oss-120b gave us the open reasoning backbone to make that vision real.

What it does

Sentience is an autonomous multimodal cognition engine. It:

  • Captures live video (webcam) and audio (microphone).
  • Processes the sensory stream with gpt-oss, using the Harmony response format.
  • Generates real-time outputs in the form [HH:MM:SS.mmm] SCENE: ... | ACTION: ... (a loop sketch follows this list).
  • Runs fully offline, with no data leaving the device.
  • Scales from laptops (quantized 20B on Apple Silicon via Ollama) to servers (120B on GPUs via vLLM).
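
To make the output format concrete, here is a minimal sketch of the cognition loop in Python. The `perceive()` and `plan()` helpers are hypothetical stand-ins (not the project's actual functions) for the capture layer and the gpt-oss planner call:

```python
import time
from datetime import datetime

TICK_HZ = 3  # cognition rate, within the 2-5 Hz band described below

def timestamp() -> str:
    # [HH:MM:SS.mmm] -- wall-clock time trimmed to millisecond precision
    return datetime.now().strftime("%H:%M:%S.%f")[:-3]

def perceive() -> dict:
    """Hypothetical stand-in for the webcam/microphone capture layer."""
    return {"frame": None, "audio": None}

def plan(observation: dict) -> tuple[str, str]:
    """Hypothetical stand-in for the gpt-oss planner call."""
    return "person enters the room", "greet and log the event"

while True:
    start = time.monotonic()
    scene, action = plan(perceive())
    print(f"[{timestamp()}] SCENE: {scene} | ACTION: {action}")
    # Sleep off whatever remains of this tick to hold the target rate.
    time.sleep(max(0.0, 1.0 / TICK_HZ - (time.monotonic() - start)))
```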

How we built it

  • Designed a modular backend adapter for Ollama, vLLM, and Transformers (a sketch follows this list).
  • Integrated Hugging Face gpt-oss checkpoints with automatic download and caching.
  • Converted our mission file into the Harmony system format for structured responses.
  • Tuned runtime performance for real-time cognition at 2–5 Hz.
  • Built a streaming output layer that shows timestamped thought lines.
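
As a rough illustration of the adapter idea, here is a Python sketch. It assumes Ollama's local HTTP API on its default port, a vLLM server exposing the OpenAI-compatible completions route, and the Hugging Face `transformers` pipeline (which downloads and caches checkpoints automatically) for the in-process path; all class and method names are ours, not the project's:

```python
import json
import urllib.request
from abc import ABC, abstractmethod

class PlannerBackend(ABC):
    """Common interface so the cognition loop never sees backend details."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OllamaBackend(PlannerBackend):
    """Talks to a local Ollama daemon (default port 11434)."""

    def __init__(self, model: str = "gpt-oss:20b"):
        self.model = model

    def complete(self, prompt: str) -> str:
        body = json.dumps(
            {"model": self.model, "prompt": prompt, "stream": False}
        ).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=body, headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

class VLLMBackend(PlannerBackend):
    """Talks to a vLLM server via its OpenAI-compatible /v1/completions route."""

    def __init__(self, base_url: str, model: str = "openai/gpt-oss-120b"):
        self.base_url = base_url
        self.model = model

    def complete(self, prompt: str) -> str:
        body = json.dumps(
            {"model": self.model, "prompt": prompt, "max_tokens": 256}
        ).encode()
        req = urllib.request.Request(
            f"{self.base_url}/v1/completions",
            data=body, headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["choices"][0]["text"]

class TransformersBackend(PlannerBackend):
    """Runs the model in-process; the checkpoint is fetched and cached on first use."""

    def __init__(self, repo_id: str = "openai/gpt-oss-20b"):
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model=repo_id)

    def complete(self, prompt: str) -> str:
        return self.pipe(prompt, max_new_tokens=256)[0]["generated_text"]
```

With a common `complete()` interface, switching backends reduces to choosing which class to instantiate, which is what makes the laptop-to-server scaling described above possible.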

Challenges we ran into

  • Model migration: replacing Gemma with gpt-oss required deep refactoring.
  • Harmony compliance: early outputs collapsed until we fully aligned with the Harmony format.
  • Performance: achieving stable cognition on consumer laptops meant aggressive quantization and tuning.
  • Demo design: compressing a 24/7 cognition system into a 3-minute video without losing depth.

Accomplishments that we're proud of

  • Built one of the first offline cognition engines powered by gpt-oss.
  • Proved that 20B reasoning can run locally on consumer Apple Silicon.
  • Created a framework that scales seamlessly from MacBook to H100 server.
  • Maintained strict privacy: all reasoning and perception stays on-device.

What we learned

  • Open reasoning models unlock ambient AI presence, not just reactive chat.
  • Prompt discipline matters: Harmony is the difference between stable outputs and noise.
  • Hardware defines strategy: quantized 20B for laptops, full 120B for data centers.
  • Continuous AI creates new UX paradigms: more like awareness than conversation.

What's next for Sentience

  • Robotics integration: connecting Sentience to drones, robot arms, or home automation.
  • Extended memory: tracking people, objects, and events over long time horizons.
  • Community release: packaged installer for anyone to run Sentience offline.
  • Fine-tuning: adapting Sentience for domains like security, accessibility, or education.

Sentience is just the beginning: a prototype of what it looks like when machines think alongside us, continuously and privately.

Updates

Update: Sentience now powered by gpt-oss

Sentience has officially evolved from a Gemma-based prototype into a full autonomous multimodal cognition engine running on OpenAI’s gpt-oss models.

Key Highlights

  • gpt-oss-20b integration via Ollama on Apple Silicon for fully offline reasoning
  • gpt-oss-120b integration via vLLM on GPU servers for high-performance cognition
  • Harmony compliance built into the runtime for stable, structured outputs (see the sketch after this list)
  • Modular planner backend: switch between Gemma, gpt-oss, or OpenAI endpoints with a single flag
  • Improved test mode: run synthetic video/audio inputs to validate without a webcam or mic
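
For readers unfamiliar with Harmony, here is a simplified sketch of the token layout the format uses. We assemble the string by hand purely for illustration; in practice the openai-harmony library renders conversations, and the mission text here is a placeholder:

```python
# Harmony wraps each message as <|start|>{role}<|message|>{content}<|end|>,
# and assistant turns carry a channel (analysis, commentary, or final).
MISSION = "You are Sentience. Describe SCENE and choose ACTION."  # placeholder

def harmony_prompt(observation: str) -> str:
    return (
        f"<|start|>system<|message|>{MISSION}<|end|>"
        f"<|start|>user<|message|>{observation}<|end|>"
        # Leaving the prompt open on the assistant's final channel cues the
        # model to respond there with structured output.
        "<|start|>assistant<|channel|>final<|message|>"
    )

print(harmony_prompt("Webcam: a person waves. Mic: silence."))
```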

What’s Next

The next milestone is wiring Sentience into robotics backends, extending cognition from observation to real-world action.
