Inspiration
We went on a trip to Mexico where we got to visit a casino for the first time together. Since we were new to the game, we all played with our phones out—checking the optimal way to play, analyzing our hands, and even trying to count cards. When we later saw the Meta Ray-Bans, we immediately thought about the possibilities of using the camera and mic to give you real-time, optimal move suggestions!
What it does
Ventiuno takes in a live feed from the Meta Ray-Bans and detects when and where cards are present. Using the recognized cards and the information about previously seen ones, it calculates the best move for you in real time. The recommendation is then spoken directly through the glasses' built-in speakers.
How we built it
We designed a custom computer vision pipeline that segments each frame to isolate the detected cards, then crops, flattens, and enhances each one before passing it to a small vision-language model (VLM) for fast card recognition.
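The crop-and-enhance step can be sketched roughly like this (the bounding-box format and function name are illustrative, not the project's actual detector output):

```python
import numpy as np

def crop_and_enhance(frame, box):
    """Crop a detected card's bounding box and stretch its contrast.

    `frame` is an HxW grayscale image as a NumPy array; `box` is
    (x0, y0, x1, y1) in pixel coordinates.
    """
    x0, y0, x1, y1 = box
    card = frame[y0:y1, x0:x1].astype(np.float32)
    # Linear contrast stretch to the full 0-255 range so faint pips
    # and rank symbols are easier for the recognizer to read.
    lo, hi = card.min(), card.max()
    if hi > lo:
        card = (card - lo) / (hi - lo) * 255.0
    return card.astype(np.uint8)
```

In practice the enhancement would be tuned to whatever the VLM responds to best; a plain contrast stretch is just the simplest version of the idea.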
Once the cards are identified, we use K-Means clustering to assign each card to the correct player’s hand. The system keeps track of the overall game state and combines previous cards with standard blackjack heuristics to determine the optimal move at any given point.
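The hand-assignment idea is easy to sketch: cluster the card center points into k groups. This minimal K-Means (with a deterministic farthest-point initialization, an assumption on our part; the real system also has to choose k and keep assignments stable across frames) shows the shape of it:

```python
import numpy as np

def cluster_hands(centers, k, iters=20):
    """Assign card center points (N x 2) to k hands via K-Means."""
    pts = np.asarray(centers, dtype=float)
    # Farthest-point initialization: deterministic and spreads the
    # initial centroids across the table.
    cent = [pts[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in cent], axis=0)
        cent.append(pts[d.argmax()])
    cent = np.array(cent)
    for _ in range(iters):
        # Assign each card to its nearest centroid...
        d = np.linalg.norm(pts[:, None] - cent[None, :], axis=2)
        labels = d.argmin(axis=1)
        # ...then move each centroid to the mean of its cards.
        for j in range(k):
            if (labels == j).any():
                cent[j] = pts[labels == j].mean(axis=0)
    return labels
```

Cards lying close together on the felt end up in the same cluster, which maps naturally onto "same player's hand."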
Challenges we ran into
The pipeline—from video feed to audio suggestion—had to be both very fast and very accurate. Initially, we tried to run everything through a VLM for convenience, but it wasn’t fast enough. We switched to a modified YOLO model for card segmentation, which greatly improved speed.
Another major issue was perspective: existing poker/card recognition models perform well from a top-down view but struggle with natural angles. To fix this, we implemented a “flattening” step—grabbing the card’s corner points, unwarping it into its original rectangle, and then passing it into a lightweight VLM for identification. We also used multi-threading in Python to parallelize recognition and drastically reduce latency.
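The flattening step boils down to estimating the projective transform (homography) that maps the card's four detected corners onto an upright rectangle. A sketch of that math in plain NumPy (a production pipeline would more likely use OpenCV's `getPerspectiveTransform`/`warpPerspective`, which also resample the pixels):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 projective transform mapping four src corner
    points to four dst corner points (card corners -> flat rectangle).
    Sets up the standard 8x8 linear system with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one pixel coordinate."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

Warping every pixel of the unwarped card is then a matter of sampling the source image through the inverse of `H`.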
Accomplishments that we’re proud of
We’re proud that Ventiuno works end-to-end—from real-time card recognition all the way to audio feedback. It’s been tested extensively and even accounts for edge cases like splits, doubles, and other game situations. The final result is a robust, fully functional solution that’s ready to be used in any blackjack setting.
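To give a flavor of the strategy side, here is a compressed slice of standard blackjack basic strategy covering hits, stands, doubles, and the classic "always split aces and eights" rule. It is a simplification (assumptions: no surrender, double allowed on any two cards; a full chart has many more pair and soft-total cases), not the project's exact rule table:

```python
def hand_value(cards):
    """Blackjack total; cards are ranks 2-10, with 11 for an ace.
    Returns (total, is_soft)."""
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces:
        total -= 10  # demote an ace from 11 to 1
        aces -= 1
    return total, aces > 0

def best_move(player, dealer_up):
    # Always split aces and eights.
    if len(player) == 2 and player[0] == player[1] and player[0] in (11, 8):
        return "split"
    total, soft = hand_value(player)
    if soft:
        if total >= 19:
            return "stand"
        if total == 18:
            return "stand" if dealer_up <= 8 else "hit"
        return "hit"
    if total >= 17:
        return "stand"
    if 13 <= total <= 16:
        return "stand" if 2 <= dealer_up <= 6 else "hit"
    if total == 12:
        return "stand" if 4 <= dealer_up <= 6 else "hit"
    if total == 11:
        return "double"
    if total == 10:
        return "double" if dealer_up <= 9 else "hit"
    if total == 9:
        return "double" if 3 <= dealer_up <= 6 else "hit"
    return "hit"
```

For example, 16 against a dealer 10 hits, while the same 16 against a 5 stands; that dealer-upcard dependence is why the system has to track the whole table, not just your own cards.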
What we learned
We learned a lot about computer vision and how combining classical CV techniques with modern AI models can massively improve both speed and accuracy. At first, we assumed we could just throw everything into a VLM, but incorporating preprocessing steps like perspective correction made a huge difference.
We also learned how powerful multi-threading can be for real-time inference—it helped us process video frames, audio output, and model predictions simultaneously with minimal delay.
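The parallelization pattern we used looks roughly like this: a thread-safe queue fans frames out to worker threads, and results are collected back in frame order (`recognize` here is a stand-in for the per-frame card-recognition call; the worker count is arbitrary):

```python
import queue
import threading

def run_pipeline(frames, recognize, workers=4):
    """Fan frames out to worker threads; return results in frame order."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # poison pill: no more frames
                break
            idx, frame = item
            out = recognize(frame)
            with lock:
                results[idx] = out

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for idx, frame in enumerate(frames):
        tasks.put((idx, frame))
    for _ in threads:
        tasks.put(None)  # one pill per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(frames))]
```

Threads work well here despite Python's GIL because model inference and image decoding spend most of their time in native code that releases the lock.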
What’s next for Ventiuno
We’ve already trained and tested our CV + game analysis + audio system on recorded footage from Meta Ray-Bans. However, we ran into some issues setting up a live stream from the glasses. We’re currently waiting for the official Meta API (coming later this year) to fully integrate real-time video input.
Once the API is released, we plan to port everything directly onto the glasses and extend the system to other games—starting with Heads-Up Poker.
