tunnel_testing
dashboard-overview
dashboard-pt2
hive-orchestrator

HiveSense - Devpost story

Inspiration

To check a hive for Varroa mites, the standard test is an alcohol wash, wherein a beekeeper scoops out about 300 bees and kills them to get a single mite reading. It is lethal, invasive, slow, and it only samples one moment in time. Bees are already under huge pressure worldwide and the main tool we have for watching their health works by killing them. We wanted to flip that, such that we read a colony's health from the outside, continuously, without opening the hive or harming a single bee.

What it does

HiveSense watches bee colonies using two non-invasive signals, which includes the sound inside the hive and a camera at the entrance. Bees walk through a clear 3D-printed entrance tunnel where a vision model reads Varroa load and a mic captures traffic and acoustic stress. A fleet of AI agents turns those signals into per-hive verdicts (Varroa, queenless, swarm risk), and when the signals disagree the system does not guess but it flags needs a human and asks the beekeeper to inspect. A coordinator watches the whole apiary for patterns no single hive can see, and you can ask it in plain English ("how are my bees?") through an ASI:One chat agent.

How we built it

Vision (Varroa): a fine-tuned ViViT video transformer reads 32-frame clips of bees in the tunnel (0.986 accuracy on the VD2 dataset, paper figure plus our local check).
Acoustic: RandomForest models on MFCC and spectral-shape features for colony strength (0.646 balanced accuracy, hive-held-out, matching the published baseline), a bee/no-bee gate, and a Ferrari-frequency rule for swarming.
Agents (Fetch.ai uAgents): seven hive agents, one per colony. Each runs the cheap acoustic check every cycle, then decides whether the reading is ambiguous enough to spend the expensive vision test, then reconciles the two. A "Godfather" coordinator fans them in (7 to 1), finds regional Varroa and neighbour robbing, and speaks the official ASI:One chat protocol.
Durable action (Orkes Agentspan): a Value-of-Information gate that acts on its own when confident and only pauses for a durable human approval on a genuine close call.
Redis Stack memory (the part we are proudest of): every hive reading, sound and sight, is fused into one vector and stored in Redis. Using RediSearch HNSW, a single k-NN query recalls the most similar past states, giving the agents a memory ("this looks like a reading we confirmed last week was a false alarm") before they ever alarm the keeper. Redis also holds the live state (RedisJSON), rolling metrics (RedisTimeSeries), and pushes updates to the dashboard. One vector means one index and one query, where keeping the modalities separate needs two of each.

The fused vector is a lightweight early fusion:

$$ v = \text{L2}\big(\text{L2}(a_{\text{acoustic}}) \oplus \text{L2}(a_{\text{vision}})\big) $$

and retrieval ranks past readings by cosine similarity.

A Vite dashboard shows the apiary live and the whole stack degrades gracefully, with no Redis it falls back to a file store and with no LLM key it falls back to deterministic logic.

Challenges we ran into

Some things are not learnable so we did not use them. Across about 100 MSPB inspections, zero hives crossed the 3 percent Varroa treatment threshold, so the acoustic Varroa label was single-class. We dropped it and kept Varroa as a vision task rather than ship a fake model.
Hidden data leakage. Our queenless model looked great within-hive (0.88) but collapsed cross-hive (0.18): it had learned which colony it was hearing, not queen state. Hive-held-out validation exposed it, and we reported it honestly instead of hiding it.

What we learned

Honest evaluation is a feature, not a footnote. Hive-held-out cross-validation, reporting both within-hive and cross-hive scores, and saying plainly what the data cannot support made the project stronger and more trustworthy.
On a remote (cloud) Redis, a single tiny read is network-bound and not faster than a local file, so we do not claim that. The real wins are capabilities the file store cannot do at all (filtered vector search for retrieval-augmented reasoning, time-series retention, pub/sub) and doing retrieval in one query instead of two (about 2x fewer round-trips, measured live).
A single fused multimodal vector is a genuinely better unit of memory than two separate vectors, with one index, one query, and a shared space to search across.

Accomplishments we are proud of

A working, end-to-end, honest system: real vision and acoustic models, a real multi-agent fleet, a durable human-in-the-loop action layer, and a Redis-powered multimodal memory that we benchmarked live rather than hand-waved, all wrapped in a dashboard and queryable in plain English.

What's next for HiveSense

Swap the concatenation fusion for a fully bound space using ImageBind, so you could search by sound and retrieve by sight and compose modality vectors into a "Varroa-stress" direction. Then take the entrance tunnel from prototype to a weatherproof field unit on real hives.