Inspiration

Many F1 and sports car designs were modeled after animals such as sharks and hawks. We figured we could similarly model a Visual Difference Engine (VDE), meant for Formula races and multiple drone-related functions, on how a hawk perceives vision.

A VDE serves multiple functions in F1, most notably in-race analysis and strategy planning. It can also serve drone-related uses such as surveillance, drone racing and FPV flying.

We aim to create a VDE which will consume fewer resources thanks to

  1. Less encoding of images
  2. Only focusing on the essentials of the image
  3. Creating a self-improving model that constantly refines its predictions and thus needs less training data

What it does

A typical Visual Difference Engine will notice the 'difference' between multiple camera frames. A biological eye, on the other hand, is wired to notice the 'difference' itself.

These differences include edges, movement and contrast. Essentially, our model of the VDE will learn expected movements, predict future movements and detect anomalies as deviations from those predictions.
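The predict-then-compare loop above can be sketched in a few lines of numpy. This is a minimal illustration, not our actual model: it assumes the scene undergoes a single global translation, estimates that shift from two past frames by brute-force search, extrapolates it to predict the next frame, and flags pixels that deviate from the prediction as anomalies. The function names and toy frames are hypothetical.

```python
import numpy as np

def estimate_shift(prev, curr, max_shift=3):
    """Brute-force search for the (dy, dx) translation that best maps prev onto curr."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.abs(np.roll(prev, (dy, dx), axis=(0, 1)).astype(int) - curr.astype(int)).sum()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def predict_next(curr, shift):
    """Assume the scene keeps moving with the same displacement."""
    return np.roll(curr, shift, axis=(0, 1))

def anomaly_mask(predicted, actual, threshold=25):
    """Pixels whose prediction error exceeds the threshold are flagged as anomalies."""
    return np.abs(predicted.astype(int) - actual.astype(int)) > threshold

# toy frames: a bright square drifting right one pixel per frame
frames = []
for x in (1, 2, 3):
    f = np.zeros((8, 8), dtype=np.uint8)
    f[2:4, x:x + 2] = 200
    frames.append(f)

shift = estimate_shift(frames[0], frames[1])   # (0, 1): one pixel to the right
pred = predict_next(frames[1], shift)
print(int(anomaly_mask(pred, frames[2]).sum()))  # 0: the motion matched the prediction
```

A real deployment would replace the global-shift search with a dense optical-flow estimate so that independently moving cars produce their own predicted motion fields.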

How we plan to build it

The first step is to gather a lot of training data for the VDE. We will be using past footage from F1 championships.

Preprocessing

The most crucial part of the image preprocessing step would be logo detection. We are not yet confident in the hawk-based model's ability to detect logos reliably, so we will integrate a separate model for logo detection.
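The separate logo model would in practice be a trained detector, but the core localization idea can be sketched with plain template matching: slide the logo over the frame and keep the position with the smallest squared difference. Everything here (function name, toy frame, logo placement) is hypothetical illustration.

```python
import numpy as np

def match_template(image: np.ndarray, template: np.ndarray) -> tuple:
    """Return (row, col) of the best match by sum of squared differences (SSD)."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_err = (0, 0), np.inf
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw].astype(np.float32)
            err = ((patch - template) ** 2).sum()
            if err < best_err:
                best, best_err = (r, c), err
    return best

# toy "logo" pasted into a noisy frame at a known position
rng = np.random.default_rng(0)
frame = rng.integers(0, 60, (32, 32)).astype(np.float32)
logo = np.full((5, 5), 255, dtype=np.float32)
frame[10:15, 20:25] = logo
print(match_template(frame, logo))  # (10, 20)
```

Template matching breaks under rotation, scale change and motion blur, which is exactly why a learned detector is planned for the real pipeline.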

Besides standard image processing techniques such as de-warping, adjusting for lighting conditions etc., we will also increase the contrast via multiple transforms at this stage.

A rough idea of our preprocessing pipeline:

[Sensor Input]
  ↓
[1. Time Sync & Frame Tagging] ----> Precision Time Protocol
  ↓
[2. Sensor Fusion Alignment] ----> COLMAP, DeepCalib, PoseNet, RegNet
  ↓
[3. Spatial Calibration & Rectification] ----> OpenCV
  ↓
[4. Temporal Filtering & Noise Reduction] ----> OpticalFlow-Guided Denoiser
  ↓
[5. Motion Stabilization & ROI Prediction] ----> RAFT, PWC-Net, STABNet, DeepFlow, ViT-SAM
  ↓
[6. Data Normalization & Packaging] ----> StandardScaler, z-score norm, whitening, learned encoder (AutoEncoder)
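The final normalization-and-packaging stage is the most self-contained, so here is a minimal sketch of the z-score normalization it mentions: each channel of a frame batch is shifted to zero mean and scaled to unit variance before being handed to the model. The batch shape and function name are illustrative assumptions.

```python
import numpy as np

def zscore_normalize(batch: np.ndarray) -> np.ndarray:
    """Per-channel z-score normalization over a (N, H, W, C) frame batch."""
    mean = batch.mean(axis=(0, 1, 2), keepdims=True)  # one mean per channel
    std = batch.std(axis=(0, 1, 2), keepdims=True)    # one std per channel
    return (batch - mean) / (std + 1e-8)              # epsilon guards flat channels

rng = np.random.default_rng(2)
frames = rng.uniform(0, 255, (4, 32, 32, 3))  # toy batch of 4 RGB frames
norm = zscore_normalize(frames)
print(bool(np.allclose(norm.mean(axis=(0, 1, 2)), 0, atol=1e-6)))  # True
```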

Edge Computing

In this stage we want to reduce latency as much as possible, so we'll use fast embedded GPU systems such as the NVIDIA Jetson Nano or a small RTX A-series card. In the worst case we will fall back on cloud-based parallel processing, because a) a hawk's brain also computes motion in parallel, and b) it reduces GPU strain and helps with low-spec hardware.
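The "compute motion in parallel" idea can be sketched with Python's standard thread pool: frame pairs are fanned out to workers, each scoring the motion between its pair independently. The `motion_score` function is a hypothetical stand-in for the real per-pair analysis, and the frames are random toy data.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def motion_score(pair):
    """Stand-in for per-frame-pair difference analysis: mean absolute change."""
    prev, curr = pair
    return float(np.abs(curr - prev).mean())

rng = np.random.default_rng(3)
frames = [rng.integers(0, 255, (64, 64)).astype(np.float32) for _ in range(9)]
pairs = list(zip(frames, frames[1:]))  # 8 consecutive frame pairs

# fan the pairs out to parallel workers, mirroring the hawk's parallel motion channels
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(motion_score, pairs))
print(len(scores))  # 8
```

On real hardware the workers would be CUDA streams on the Jetson or remote cloud endpoints, but the fan-out/fan-in structure stays the same.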

Challenges

A hawk's eye is essentially a camera with an estimated resolution of 240 megapixels, far superior to any modern camera. Building hardware that could emulate such vision is beyond our scope and would be unfeasibly expensive and resource heavy. We will assume standard high-quality cameras, which use only 3 channels (RGB) rather than the hawk's additional UV sensitivity, and emulate hawk vision via software.

A camera also records data frame by frame; the frames must then be encoded and processed by a neural network. A hawk, on the other hand, gets the job done with zero encoding, processing only the differences within motion.
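This "zero encoding, differences only" behavior is what event cameras approximate in silicon, and it can be emulated in software: instead of shipping a full frame, emit only the pixels whose intensity changed beyond a threshold, each tagged with a polarity. A minimal sketch with hypothetical names and toy frames:

```python
import numpy as np

def event_stream(prev: np.ndarray, curr: np.ndarray, threshold: float = 15.0):
    """Emit (row, col, polarity) only where intensity changed, retina-style."""
    delta = curr.astype(np.float32) - prev.astype(np.float32)
    rows, cols = np.nonzero(np.abs(delta) > threshold)
    return [(int(r), int(c), 1 if delta[r, c] > 0 else -1) for r, c in zip(rows, cols)]

prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[3, 4] = 200  # a single pixel brightens
events = event_stream(prev, curr)
print(events)  # [(3, 4, 1)]: one event instead of a full encoded frame
```

For a mostly static track view this collapses each frame to a handful of events around the moving cars, which is where the resource savings would come from.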

The biggest hurdle will be finding the right power-latency tradeoff: in edge computing, we could either use large GPUs on site, which have low latency but consume a large amount of power, or use cloud-based processing, which increases latency but is less resource heavy. We will have to experiment further to find the right tradeoff.

What we learned

  - A lot about image processing, image prediction, and GPU + cloud computing
  - How to let the model guide us towards the correct pipeline

What's next for Visual Difference Engine Using Animal Vision

A hawk's vision may end up missing finer details, so we will have to implement a parallel process via a different model to keep track of details that could get lost in motion blur, such as logos, paint or the motion of individual parts.

  - Use a ROS-based system to implement camera motion, which will help capture better input images.
  - Measure how much better this system performs compared to a regular VDE.
  - Give smart AI insights: essentially have our hawk talk to us and give low-level advice and predictions.
  - Generalize the system: move it from being solely an F1 tool to multiple other sectors.
