Inspiration

A 4-year-old child has taken in roughly 10^15 bytes of data through vision alone, which is ~50x more than an advanced LLM is trained on! So why not give these models the chance to see through our eyes?

What it does

It models human gaze and simulates where human attention tends to fall in different spatial settings.

How we built it

We trained a simple unit + transformer system on a dataset that we personally curated and labelled, using Ironsite's data repository and material from various websites.

Challenges we ran into

Tracking gaze is really hard because the eye is a very small and active organ, so it took a long while to collect good data.

Accomplishments that we're proud of

The model works relatively well, and we are able to predict pretty accurately how human gaze might move in real settings. We also built 2 real applications on top of this base model.

What we learned

Gaze is a very under-researched area, and understandably so: there is too much variance, volatility and noise when it comes to getting good data.

What's next for SkyNet

We want to collect more and better data to build an even more accurate model.
