Sentience: Autonomous Multimodal Cognition with gpt-oss

Inspiration

We were inspired by the idea of machines that don’t wait to be prompted. Most AI systems only act when a user types or speaks. We wanted to build something closer to conscious presence: an AI that continuously perceives, reasons, and acts on its own. The release of gpt-oss-20b and gpt-oss-120b gave us the open reasoning backbone to make that vision real.

What it does

Sentience is an autonomous multimodal cognition engine. It:

  • Captures live video (webcam) and audio (microphone).
  • Processes the sensory stream with gpt-oss, using the Harmony response format.
  • Generates real-time outputs in the form [HH:MM:SS.mmm] SCENE: ... | ACTION: ... (a loop sketch follows this list).
  • Runs fully offline, with no data leaving the device.
  • Scales from laptops (quantized 20B on Apple Silicon via Ollama) to servers (120B on GPUs via vLLM).
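
To make the output format concrete, here is a minimal sketch of the cognition loop in Python. The `perceive()` and `plan()` helpers are hypothetical stand-ins (not the project's actual functions) for the capture layer and the gpt-oss planner call:

```python
import time
from datetime import datetime

TICK_HZ = 3  # cognition rate, within the 2-5 Hz band described below

def timestamp() -> str:
    # [HH:MM:SS.mmm] -- wall-clock time trimmed to millisecond precision
    return datetime.now().strftime("%H:%M:%S.%f")[:-3]

def perceive() -> dict:
    """Hypothetical stand-in for the webcam/microphone capture layer."""
    return {"frame": None, "audio": None}

def plan(observation: dict) -> tuple[str, str]:
    """Hypothetical stand-in for the gpt-oss planner call."""
    return "person enters the room", "greet and log the event"

while True:
    start = time.monotonic()
    scene, action = plan(perceive())
    print(f"[{timestamp()}] SCENE: {scene} | ACTION: {action}")
    # Sleep off whatever remains of this tick to hold the target rate.
    time.sleep(max(0.0, 1.0 / TICK_HZ - (time.monotonic() - start)))
```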

How we built it

  • Designed a modular backend adapter for Ollama, vLLM, and Transformers (a sketch follows this list).
  • Integrated Hugging Face gpt-oss checkpoints with automatic download and caching.
  • Converted our mission file into the Harmony system format for structured responses.
  • Tuned runtime performance for real-time cognition at 2–5 Hz.
  • Built a streaming output layer that shows timestamped thought lines.
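
As a rough illustration of the adapter idea, here is a Python sketch. It assumes Ollama's local HTTP API on its default port, a vLLM server exposing the OpenAI-compatible completions route, and the Hugging Face `transformers` pipeline (which downloads and caches checkpoints automatically) for the in-process path; all class and method names are ours, not the project's:

```python
import json
import urllib.request
from abc import ABC, abstractmethod

class PlannerBackend(ABC):
    """Common interface so the cognition loop never sees backend details."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OllamaBackend(PlannerBackend):
    """Talks to a local Ollama daemon (default port 11434)."""

    def __init__(self, model: str = "gpt-oss:20b"):
        self.model = model

    def complete(self, prompt: str) -> str:
        body = json.dumps(
            {"model": self.model, "prompt": prompt, "stream": False}
        ).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=body, headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

class VLLMBackend(PlannerBackend):
    """Talks to a vLLM server via its OpenAI-compatible /v1/completions route."""

    def __init__(self, base_url: str, model: str = "openai/gpt-oss-120b"):
        self.base_url = base_url
        self.model = model

    def complete(self, prompt: str) -> str:
        body = json.dumps(
            {"model": self.model, "prompt": prompt, "max_tokens": 256}
        ).encode()
        req = urllib.request.Request(
            f"{self.base_url}/v1/completions",
            data=body, headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["choices"][0]["text"]

class TransformersBackend(PlannerBackend):
    """Runs the model in-process; the checkpoint is fetched and cached on first use."""

    def __init__(self, repo_id: str = "openai/gpt-oss-20b"):
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model=repo_id)

    def complete(self, prompt: str) -> str:
        return self.pipe(prompt, max_new_tokens=256)[0]["generated_text"]
```

With a common `complete()` interface, switching backends reduces to choosing which class to instantiate, which is what makes the laptop-to-server scaling described above possible.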

Challenges we ran into

  • Model migration: replacing Gemma with gpt-oss required deep refactoring.
  • Harmony compliance: early outputs collapsed until we fully aligned with the Harmony format.
  • Performance: achieving stable cognition on consumer laptops meant aggressive quantization and tuning.
  • Demo design: compressing a 24/7 cognition system into a 3-minute video without losing depth.

Accomplishments that we're proud of

  • Built one of the first offline cognition engines powered by gpt-oss.
  • Proved that 20B reasoning can run locally on consumer Apple Silicon.
  • Created a framework that scales seamlessly from MacBook to H100 server.
  • Maintained strict privacy: all reasoning and perception stays on-device.

What we learned

  • Open reasoning models unlock ambient AI presence, not just reactive chat.
  • Prompt discipline matters: Harmony is the difference between stable outputs and noise.
  • Hardware defines strategy: quantized 20B for laptops, full 120B for data centers.
  • Continuous AI creates new UX paradigms: more like awareness than conversation.

What's next for Sentience

  • Robotics integration: connecting Sentience to drones, robot arms, or home automation.
  • Extended memory: tracking people, objects, and events over long time horizons.
  • Community release: packaged installer for anyone to run Sentience offline.
  • Fine-tuning: adapting Sentience for domains like security, accessibility, or education.

Sentience is just the beginning: a prototype of what it looks like when machines think alongside us, continuously and privately.

Updates

Update: Sentience now powered by gpt-oss

Sentience has officially evolved from a Gemma-based prototype into a full autonomous multimodal cognition engine running on OpenAI’s gpt-oss models.

Key Highlights

  • gpt-oss-20b integration via Ollama on Apple Silicon for fully offline reasoning
  • gpt-oss-120b integration via vLLM on GPU servers for high-performance cognition
  • Harmony compliance built into the runtime for stable, structured outputs (see the sketch after this list)
  • Modular planner backend: switch between Gemma, gpt-oss, or OpenAI endpoints with a single flag
  • Improved test mode: run synthetic video/audio inputs to validate without a webcam or mic
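
For readers unfamiliar with Harmony, here is a simplified sketch of the token layout the format uses. We assemble the string by hand purely for illustration; in practice the openai-harmony library renders conversations, and the mission text here is a placeholder:

```python
# Harmony wraps each message as <|start|>{role}<|message|>{content}<|end|>,
# and assistant turns carry a channel (analysis, commentary, or final).
MISSION = "You are Sentience. Describe SCENE and choose ACTION."  # placeholder

def harmony_prompt(observation: str) -> str:
    return (
        f"<|start|>system<|message|>{MISSION}<|end|>"
        f"<|start|>user<|message|>{observation}<|end|>"
        # Leaving the prompt open on the assistant's final channel cues the
        # model to respond there with structured output.
        "<|start|>assistant<|channel|>final<|message|>"
    )

print(harmony_prompt("Webcam: a person waves. Mic: silence."))
```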

What’s Next

The next milestone is wiring Sentience into robotics backends, extending cognition from observation to real-world action.
