The Story of Spinning: Giving AI a Sense of Physics

The Inspiration

The idea for Spinning was born from a simple observation: humans can estimate how fast an object is spinning just by looking at it, but only roughly. Engineers, on the other hand, rely on expensive stroboscopes or tethered IMU sensors. We asked ourselves: "Can we use Gemini 3's multimodal capabilities to teach a smartphone to 'feel' physics through its camera?" We wanted to create a bridge where the temporal resolution of sensors meets the spatial intelligence of AI.


How We Built It

The architecture of Spinning is built on a "Learn-then-Track" loop:

  1. The Core Engine: We utilized the Google Generative AI SDK to interface with Gemini 3 Flash.
  2. Data Fusion: We developed a high-frequency data logger using CoreMotion to capture angular velocity ($\omega$) and quaternions at $100\text{Hz}$, synchronized with 1080p video frames via AVFoundation (see the logging sketch after this list).
  3. The Calibration Logic: We prompted Gemini 3 to analyze the relationship between pixel-level texture displacement and the ground-truth gyroscopic data. By feeding it snippets of sensor telemetry alongside video frames, we enabled the model to understand the object's moment of inertia.
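For context, here is a minimal sketch of what the 100 Hz logger in step 2 can look like with CoreMotion. The type and property names (`MotionLogger`, `Sample`) are illustrative, not the actual Spinning source; alignment with AVFoundation frames is done separately via timestamps.

```swift
import CoreMotion

/// Minimal sketch of a 100 Hz gyro/attitude logger (illustrative names).
final class MotionLogger {
    struct Sample {
        let timestamp: TimeInterval
        let rotationRate: CMRotationRate   // angular velocity omega (rad/s)
        let attitude: CMQuaternion         // orientation quaternion
    }

    private let motionManager = CMMotionManager()
    private let queue = OperationQueue()   // keep logging off the main thread
    private(set) var samples: [Sample] = []

    func start() {
        guard motionManager.isDeviceMotionAvailable else { return }
        motionManager.deviceMotionUpdateInterval = 1.0 / 100.0  // 100 Hz
        motionManager.startDeviceMotionUpdates(to: queue) { [weak self] motion, _ in
            guard let motion = motion else { return }
            self?.samples.append(Sample(timestamp: motion.timestamp,
                                        rotationRate: motion.rotationRate,
                                        attitude: motion.attitude.quaternion))
        }
    }

    func stop() {
        motionManager.stopDeviceMotionUpdates()
    }
}
```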

Challenges and Math

The biggest hurdle was the "Visual-Inertial Gap." In early prototypes, the AI struggled when the rotating object was a featureless cylinder. We solved this by implementing a semantic-physical mapping. We taught the model to calculate the expected rotational stability based on the object's recognized geometry.
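One way to picture this semantic-physical mapping is a lookup from the recognized geometry class to a prior damping factor that the stability formula below consumes. The class names and numbers here are invented for illustration only, not the model's actual output schema.

```swift
/// Illustrative sketch: a recognized geometry class carries a prior for
/// how steady its rotation axis should be. Values are hypothetical.
enum RecognizedGeometry: String {
    case featurelessCylinder, flywheel, irregularBody

    /// Prior damping factor lambda fed into the stability score below.
    var dampingPrior: Double {
        switch self {
        case .featurelessCylinder: return 40.0   // symmetric body, expect a steady axis
        case .flywheel:            return 60.0   // high inertia, very steady
        case .irregularBody:       return 15.0   // tolerate more axis wander
        }
    }
}
```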

We defined the stability score ($S$) as a function of the variance in the rotation axis:

$$S = 100 \times \exp\left(-\lambda \cdot \sigma^2_{\text{axis}}\right)$$

where $\sigma^2_{\text{axis}}$ is the variance of the instantaneous rotation axis about its calibrated mean, and $\lambda$ is a damping factor adjusted by Gemini based on the object's perceived mass and semantic classification.
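As a rough sketch under those definitions, the score can be computed directly from the stream of unit rotation-axis vectors extracted from the gyro data. The function and parameter names below are illustrative, not the shipped implementation.

```swift
import Foundation
import simd

/// Sketch of S = 100 * exp(-lambda * sigma^2_axis).
/// `axisSamples` are unit rotation-axis vectors from the gyro stream;
/// `lambda` comes from the semantic classification (see above).
func stabilityScore(axisSamples: [simd_double3], lambda: Double) -> Double {
    guard !axisSamples.isEmpty else { return 0 }
    // Calibrated mean axis.
    let mean = simd_normalize(axisSamples.reduce(simd_double3(), +))
    // Mean squared angular deviation from that axis (sigma^2_axis).
    let deviations = axisSamples.map { acos(min(max(simd_dot($0, mean), -1), 1)) }
    let variance = deviations.map { $0 * $0 }.reduce(0, +) / Double(deviations.count)
    return 100.0 * exp(-lambda * variance)
}
```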


What We Learned

Building Spinning taught us that Gemini 3 is more than a language model; it is a world-reasoning engine. We discovered that the model's ability to handle long-context multimodal inputs makes it uniquely suited for sensor-fusion tasks that were previously reserved for complex Kalman filter algorithms. We learned how to isolate API keys securely using .xcconfig and how to build an industrial UI that prioritizes data density and professional aesthetics.
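For reference, the usual shape of the .xcconfig pattern is shown below; the key name `GEMINI_API_KEY` and file name `Secrets.xcconfig` are example placeholders. The key lives in an untracked xcconfig, Info.plist references it as a build setting, and the app reads it at runtime.

```swift
// Secrets.xcconfig (excluded from version control):
//   GEMINI_API_KEY = your-key-here
// Info.plist then references it as $(GEMINI_API_KEY). At runtime:
import Foundation

enum Secrets {
    static var geminiAPIKey: String {
        guard let key = Bundle.main.object(forInfoDictionaryKey: "GEMINI_API_KEY") as? String,
              !key.isEmpty else {
            fatalError("GEMINI_API_KEY missing; check Secrets.xcconfig")
        }
        return key
    }
}
```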


Future Roadmap

Currently, Spinning works on rigid bodies. Our next step is to apply this to human kinetics—analyzing the rotation of a gymnast or a diver in mid-air, where tethering a sensor is impossible but precision is everything.
