Kynedge — real-time IMU motion-event recognition, TinyML-style

Inspiration

Micromobility (e-scooters, e-bikes) has exploded, but understanding what is actually happening to a vehicle in real time — a hard brake, a crash, a fall — usually takes dedicated hardware or heavy models. We wanted to show that 6 axes of IMU at 50 Hz (accelerometer + gyroscope) plus a handful of classic features are enough to classify motion events in real time, with a model small enough to run "at the edge." And we wanted to prove generalization: the exact same code should work on a completely different domain — a "horizontality test" with three arm gestures — by changing only the data, not a single line of logic.

What it does

An end-to-end pipeline:

$$ \text{raw IMU} \;\to\; \text{sliding window} \;\to\; \text{feature extraction} \;\to\; \text{RandomForest} \;\to\; \text{real-time inference} \;\to\; \text{dashboard} $$

Domain 1 (riding): normal_riding / hard_braking / crash
Domain 2 (arm): three gestures, with a 3D reconstruction of arm orientation
Live dashboard over WebSocket: acc/gyro signals, predicted class, confidence, per-class probability bars, and a red alert on crash.

How we built it

We sample at $f_s = 50\,\text{Hz}$ and segment the stream into windows of 1 s with 50% overlap:

$$ W = f_s \cdot 1\,\text{s} = 50 \quad\text{samples}, \qquad H = \tfrac{W}{2} = 25 \quad\text{(hop)} $$
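The windowing arithmetic above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `sliding_windows` and the random stand-in recording are hypothetical, and trailing samples that do not fill a whole window are simply dropped.

```python
import numpy as np

FS_HZ = 50            # sampling rate stated in the text
WINDOW = FS_HZ * 1    # 1 s window -> 50 samples
HOP = WINDOW // 2     # 50% overlap -> hop of 25 samples

def sliding_windows(signal, window=WINDOW, hop=HOP):
    """Split a (T, C) recording into (n_windows, window, C) overlapping windows.

    Hypothetical helper: incomplete trailing samples are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.shape[0] < window:
        return np.empty((0, window) + signal.shape[1:])
    n_windows = 1 + (signal.shape[0] - window) // hop
    starts = np.arange(n_windows) * hop
    return np.stack([signal[s:s + window] for s in starts])

# 4 s of made-up 6-axis data (200 samples), only to exercise the function.
recording = np.random.default_rng(0).normal(size=(200, 6))
windows = sliding_windows(recording)
print(windows.shape)  # (7, 50, 6)
```

With 200 samples, the window starts fall at 0, 25, ..., 150, which gives 7 windows of shape $50 \times 6$.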

From each window $\mathbf{X}\in\mathbb{R}^{50\times 6}$ we extract a frozen 43-feature vector: per-axis statistics (mean, std, min, max, RMS, energy), accelerometer and gyroscope magnitudes, plus two dynamics features — the jerk

$$ \text{jerk}_{\max} = \max_t \left| \Delta \lVert \mathbf{a}_t \rVert \right| $$

and the zero-crossing rate, which separates the periodic (riding) from the transient (crash). We classify with a **RandomForest**.
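The two dynamics features can be sketched as follows. This is a minimal illustration, not the frozen 43-feature implementation: it assumes the first three columns of a window are the accelerometer axes, and it assumes the zero-crossing rate is computed on the mean-removed acceleration magnitude (the text does not say which signal is used). The toy `riding` and `crash` windows are made up.

```python
import numpy as np

def acc_magnitude(window):
    """Per-sample accelerometer magnitude ||a_t|| (assumes columns 0..2 are ax, ay, az)."""
    return np.linalg.norm(np.asarray(window, dtype=float)[:, :3], axis=1)

def jerk_max(window):
    """Largest absolute sample-to-sample change of ||a_t||, as in the formula above."""
    return float(np.max(np.abs(np.diff(acc_magnitude(window)))))

def zero_crossing_rate(window):
    """Fraction of consecutive sample pairs where the mean-removed magnitude changes sign.

    Assumption: the source does not specify the signal; magnitude is used here.
    """
    mag = acc_magnitude(window)
    negative = np.signbit(mag - mag.mean())
    return float(np.mean(negative[1:] != negative[:-1]))

t = np.arange(50) / 50.0

# Toy periodic window: gravity plus a 5 Hz oscillation on the z axis.
riding = np.zeros((50, 6))
riding[:, 2] = 9.81 + np.sin(2 * np.pi * 5 * t + 0.3)

# Toy transient window: gravity plus a single-sample spike.
crash = np.zeros((50, 6))
crash[:, 2] = 9.81
crash[25, 2] += 30.0

print(jerk_max(riding), zero_crossing_rate(riding))
print(jerk_max(crash), zero_crossing_rate(crash))
```

On these toy windows the periodic signal has the higher zero-crossing rate and the spike has the higher jerk, which is the separation the two features are meant to capture.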

Three architectural choices protected us from the classic production-ML bugs:

A single feature implementation, imported by both training and inference → no train/serve skew.
Grouped split (GroupShuffleSplit on recording_id): windows from the same recording never land in both train and test, where accuracy would be inflated by leakage.
A single bundle (model + scaler + feature_names + labels + params): inference reads everything from it, and we enforce $\texttt{labels} = \texttt{model.classes\_}$ so that predict_proba columns can never drift out of alignment with the labels.
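The grouped split and the bundle invariant can be sketched with scikit-learn as below. Everything except `GroupShuffleSplit`, `RandomForestClassifier`, `classes_` and `predict_proba` is an assumption made for illustration: the random toy data, the placeholder feature names, the dictionary layout of the bundle, and the helper `check_bundle`. The scaler mentioned above is left out to keep the sketch short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_recordings, windows_per_recording, n_features = 12, 10, 43
labels = ["normal_riding", "hard_braking", "crash"]

# Toy stand-in data: one label per recording, repeated for each of its windows.
recording_id = np.repeat(np.arange(n_recordings), windows_per_recording)
y = np.repeat(np.array(labels)[np.arange(n_recordings) % 3], windows_per_recording)
X = rng.normal(size=(len(y), n_features))

# Grouped split: whole recordings go to either train or test, never both.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=recording_id))

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X[train_idx], y[train_idx])

# Hypothetical bundle layout: labels are copied from the model, never hand-typed.
bundle = {
    "model": model,
    "labels": [str(c) for c in model.classes_],
    "feature_names": [f"f{i}" for i in range(n_features)],  # placeholder names
    "params": {"fs_hz": 50, "window": 50, "hop": 25},
}

def check_bundle(bundle):
    """Refuse a bundle whose labels or feature count disagree with the model."""
    if list(bundle["labels"]) != [str(c) for c in bundle["model"].classes_]:
        raise ValueError("labels are not aligned with model.classes_")
    if len(bundle["feature_names"]) != bundle["model"].n_features_in_:
        raise ValueError("feature_names length does not match the model")

check_bundle(bundle)
proba = bundle["model"].predict_proba(X[test_idx][:1])[0]
print(dict(zip(bundle["labels"], proba.round(2))))
```

Because `predict_proba` orders its columns by `classes_`, zipping the probabilities with labels taken from the model keeps each bar on the dashboard attached to the right class.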

For the 3D arm we reconstruct orientation from the direction of gravity measured by the accelerometer, which makes it *drift-free*:

$$ \phi_{\text{roll}} = \operatorname{atan2}(a_y, a_z), \qquad \theta_{\text{pitch}} = \operatorname{atan2}\!\left(-a_x, \sqrt{a_y^2 + a_z^2}\right) $$
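The two formulas translate directly into code. The sketch below uses only the standard library; the function name `roll_pitch_from_gravity` is hypothetical, and the result is meaningful only while the sensor is close to static, so that the accelerometer reading is dominated by gravity. Yaw cannot be recovered this way, since rotating about the gravity vector does not change the reading.

```python
import math

def roll_pitch_from_gravity(ax, ay, az):
    """Roll and pitch in radians from a single accelerometer reading.

    Assumes the reading is dominated by gravity (quasi-static sensor).
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

samples = {
    "flat": (0.0, 0.0, 9.81),            # gravity along +z
    "rolled": (0.0, 9.81, 0.0),          # gravity along +y
    "pitched": (-9.81, 0.0, 0.0),        # gravity along -x
}
for name, (ax, ay, az) in samples.items():
    roll, pitch = roll_pitch_from_gravity(ax, ay, az)
    print(name, round(math.degrees(roll), 1), round(math.degrees(pitch), 1))
```

The three sample readings correspond to zero tilt, a quarter-turn roll, and a quarter-turn pitch under the sign convention of the formulas above.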

Challenges

Synthetic data that was too easy. Our generators are synthetic (a declared PoC). Domain 2 was hitting 100% accuracy, and anyone would rightly suspect the generator was encoding the labels. We found that adding sensor noise wasn't enough (window features aggregate it away, so the pattern stays separable). The right lever was simulating sloppy human execution: blending a fraction of another gesture into each recording. That overlap brought us down to a credible 0.94, with physically sensible confusions.
The temptation of a "crash" 3D reconstruction. Recovering an impact trajectory from IMU means double integration, $\;p(t) = \iint \mathbf{a}\,dt^2\,$, and the error grows as $t^2$: massive drift, pure fiction. We took the honest path: the 3D shows only **orientation**, derived from gravity, and we state this explicitly on the page.
Integration reliability. A skew-proof bundle contract, server-side file validation (no path traversal, no crash on invalid input), and a pinned dependency version so the server API doesn't break.
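The $t^2$ growth in the double-integration argument above is easy to check numerically. The sketch below assumes the simplest possible error source, a constant accelerometer bias; the value `BIAS = 0.05` m/s² is made up for illustration, and real sensors add noise and orientation error on top of it.

```python
import numpy as np

FS_HZ = 50
DT = 1.0 / FS_HZ
BIAS = 0.05  # m/s^2, hypothetical constant accelerometer bias

def position_error(duration_s, bias=BIAS, dt=DT):
    """Position error after integrating a constant acceleration bias twice."""
    n = int(round(duration_s / dt))
    acc_error = np.full(n, bias)
    vel_error = np.cumsum(acc_error) * dt   # first integration: velocity error
    pos_error = np.cumsum(vel_error) * dt   # second integration: position error
    return float(pos_error[-1])

for t in (1, 2, 4, 8):
    closed_form = 0.5 * BIAS * t ** 2
    print(t, round(position_error(t), 4), round(closed_form, 4))
```

Each doubling of the duration multiplies the position error by roughly four, matching the closed form $\tfrac{1}{2} b\, t^2$ for a bias $b$.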

What we learned

In production ML the worst bugs are *silent*: leakage, train/serve skew, misaligned labels. You beat them with *contracts and invariants*, not bigger models.
A synthetic 100% is a red flag, not a win. Credibility comes from calibrated difficulty, not perfect numbers.
Classic features + RandomForest remain *readable*: the feature importances tell a story (gyroscope for gestures, acceleration magnitude for crashes) that a black-box model wouldn't.