Inspiration
You can buy an optical microscope on Amazon for $80, but the expertise required to interpret what you’re seeing takes years of medical training. Whether it’s a high school student seeing "purple blobs" instead of mitosis, or a rural nurse unable to confirm a diagnosis without a pathologist on-site, the problem is the same: access to optics is cheap; access to answers is expensive.
We built Myko to bridge this gap.
What it does
Myko is a real-time language agent for optical microscopy. Users clip their phone to any microscope, and Myko streams the feed to our backend, where agentic models run real-time segmentation, detection, and cellular analysis. You can ask natural-language questions such as "Is this tissue healthy?" or issue commands such as "Highlight the macrophages," "Count the nuclei," or "Segment abnormal regions."
How we built it
Myko is a hybrid edge-cloud system designed for low latency and high intelligence.
- Frontend: Native iOS app built with SwiftUI. Uses AVFoundation for real-time camera capture, Apple's SpeechAnalyzer framework for on-device speech-to-text, and a persistent WebSocket connection to stream microscope frames to the backend at full frame rate.
- Backend: Python FastAPI server exposing a REST endpoint and a WebSocket. The agent calls a vision-language model via an OpenAI-compatible API for image understanding and tool-use reasoning. A custom OpenCV/watershed segmentation pipeline (with optional SAM2 and Cellpose backends) proposes and renders cell masks in real time, overlaying results onto the streamed frames before returning them to the client (sketched below). Frames travel between the phone and the server over an ngrok tunnel.
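A minimal sketch of that frame loop, assuming JPEG-encoded frames over the WebSocket. The `/stream` route and the `segment_cells` helper are illustrative names (the optional SAM2/Cellpose backends are omitted here), not our exact production code:

```python
# Sketch only: one JPEG frame in, one annotated JPEG frame out.
import cv2
import numpy as np
from fastapi import FastAPI, WebSocket

app = FastAPI()

def segment_cells(frame: np.ndarray) -> np.ndarray:
    """Classic OpenCV watershed pipeline; returns a label image (-1 = cell boundaries)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Cells are darker than the background on brightfield slides; flip for fluorescence.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    opening = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2)
    sure_bg = cv2.dilate(opening, kernel, iterations=3)
    dist = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1              # reserve 0 for the "unknown" region
    markers[unknown == 255] = 0
    return cv2.watershed(frame, markers)

@app.websocket("/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    while True:
        data = await ws.receive_bytes()                      # one JPEG frame from the phone
        frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
        markers = segment_cells(frame)
        frame[markers == -1] = (0, 255, 0)                   # paint cell boundaries green
        _, jpg = cv2.imencode(".jpg", frame)
        await ws.send_bytes(jpg.tobytes())                   # annotated frame back to the client
```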
When the VLM receives a query, it invokes specific computer vision tools (segmentation and classification models) running on the ASUS GX10's NVIDIA GB10 GPU. The server processes the frame and sends the segmentation masks back to the iPhone, where they are overlaid in real time.
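The tool-calling loop looks roughly like the sketch below. The endpoint URL, model name, and the single `count_nuclei` tool are placeholders for illustration; the real server registers its segmentation and classification tools the same way:

```python
# Sketch only: endpoint, model name, and the count_nuclei tool are placeholders.
import base64, json
import cv2
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "count_nuclei",
        "description": "Count nuclei in the current microscope frame.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def count_nuclei(frame_jpeg: bytes) -> int:
    """Toy CV tool: threshold + connected components as a rough nucleus count."""
    gray = cv2.imdecode(np.frombuffer(frame_jpeg, np.uint8), cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n_labels, _ = cv2.connectedComponents(binary)
    return n_labels - 1                                      # drop the background label

def answer(question: str, frame_jpeg: bytes) -> str:
    image_url = "data:image/jpeg;base64," + base64.b64encode(frame_jpeg).decode()
    messages = [{"role": "user", "content": [
        {"type": "text", "text": question},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]}]
    msg = client.chat.completions.create(model="vlm", messages=messages, tools=TOOLS).choices[0].message
    if msg.tool_calls:                                       # the VLM asked for a CV tool
        messages.append(msg)
        for call in msg.tool_calls:
            result = {"count": count_nuclei(frame_jpeg)}     # dispatch (only one tool here)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
        msg = client.chat.completions.create(model="vlm", messages=messages, tools=TOOLS).choices[0].message
    return msg.content                                       # natural-language answer for the phone
```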
Challenges we ran into
About halfway through the hackathon, our primary demo microscope fell off the table and shattered into several pieces. We had to duct-tape the optics and realign the lenses to get a clear image again. Hardware is hard!
Accomplishments that we're proud of
- Seamless AR Overlay: Seeing the AI draw a perfect bounding box around a microscopic cell in real time on a phone screen feels like magic.
- The Architecture: We successfully integrated a complex tool-calling loop (Audio -> Text -> LLM -> CV Tool -> Visual Overlay) that feels instantaneous to the user.
What we learned
- VLMs need tools: Pure vision models are great at describing "a slide of cells," but they struggle with specific tasks like "count exactly 14 cells." The agentic approach (calling a counting tool) is far superior.
- Microscopy is messy: Real-world slides have dust, bubbles, and bad lighting. We learned a lot about preprocessing images to make them readable for the AI (one such cleanup pass is sketched after this list).
- The power of edge compute: Moving the inference to the ASUS GX10 was critical; the phone simply couldn't handle the heavy segmentation models alongside the AR rendering.
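For reference, here is the kind of cleanup pass we mean: flatten uneven illumination, boost local contrast, and knock out dust specks. The specific parameters below are illustrative assumptions, not our tuned values:

```python
# Illustrative cleanup for messy slides (dust, bubbles, uneven lighting).
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Flatten uneven illumination by dividing out a heavily blurred background estimate.
    background = cv2.GaussianBlur(gray, (0, 0), sigmaX=51)
    flat = cv2.divide(gray, background, scale=255)
    # Boost local contrast so faint cell boundaries survive streaming compression.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    contrast = clahe.apply(flat)
    # Remove dust specks and sensor noise without smearing edges.
    return cv2.medianBlur(contrast, 3)
```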
What's next for Myko
We want to distill our models down to run locally on-device for use in areas without internet access.
Built With
- avfoundation
- coreimage
- fastapi
- ngrok
- openai
- opencv
- pydantic
- pytorch
- swiftui
- uvicorn

