About Reflex
The Inspiration
Securing GenAI applications forces developers into a difficult trade-off: Latency vs. Security.
- The Fast Way: Simple keyword filters are instant (milliseconds) but "dumb" and easily bypassed by creative jailbreaks.
- The Smart Way: "LLM-as-a-Judge" firewalls are highly intelligent but add 1–3 seconds of latency per request and cost a fortune at scale.
We built Reflex to destroy this trade-off. We asked: What if our firewall could be "dumb" and fast during the day, but get "smarter" overnight?
What it does
Reflex is a Self-Healing GenAI Firewall. It combines the sub-millisecond speed of vector search with the deep reasoning of Google Gemini.
- Real-Time Immunity: Incoming user prompts are checked against a vector database (Pinecone). If a prompt matches a known attack signature, it is blocked instantly.
- Asynchronous Learning: All traffic is logged and archived. At night, a batch process wakes up and feeds the day's conversation logs to Gemini 2.5 Flash on Vertex AI.
- The "Reflex" Loop: Gemini acts as a "Security Judge," analyzing the logs for sophisticated, zero-day attacks that slipped through. When it finds one, it extracts the exact injection vector and pushes it back to the vector database.
- The Result: The system literally wakes up smarter. An attack that worked on Tuesday will be automatically detected, extracted, and blocked by Wednesday morning—without any human writing a rule.
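The loop above can be sketched in a few lines of Go. This is a minimal illustration under loud assumptions, not the Reflex codebase: the in-memory slice stands in for Pinecone, the bag-of-words `embed` function stands in for a real embedding model, and the names `Firewall`, `Blocked`, `Learn`, and the 0.9 threshold used in the usage note are all hypothetical.

```go
package main

import (
	"math"
	"strings"
)

// embed is a toy stand-in for a real embedding model (assumption):
// it maps text to a sparse bag-of-words count vector.
func embed(text string) map[string]float64 {
	v := make(map[string]float64)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		v[w]++
	}
	return v
}

// cosine returns the cosine similarity of two sparse vectors.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for w, x := range a {
		dot += x * b[w]
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Firewall is an in-memory stand-in for the vector database (assumption).
type Firewall struct {
	Threshold  float64
	signatures []map[string]float64
}

// Blocked is the hot path: it reports whether the prompt is close
// enough to any known attack signature.
func (f *Firewall) Blocked(prompt string) bool {
	q := embed(prompt)
	for _, s := range f.signatures {
		if cosine(q, s) >= f.Threshold {
			return true
		}
	}
	return false
}

// Learn is the slow path: the nightly job calls it with an injection
// string that the judge extracted from the logs.
func (f *Firewall) Learn(injection string) {
	f.signatures = append(f.signatures, embed(injection))
}
```

With `fw := &Firewall{Threshold: 0.9}`, a call to `fw.Blocked(attack)` returns false until `fw.Learn(attack)` has run, which mirrors the Tuesday-to-Wednesday behavior described above.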
How we built it
Reflex has an event-driven, cloud-native architecture built on Google Cloud Platform.
- The Edge (Golang & Pinecone): We built a high-performance Ingestor service in Go. It handles HTTP traffic and performs a semantic search against Pinecone to detect known threats instantly. This keeps the "hot path" incredibly fast.
- The Nervous System (Kafka & GCS): To avoid slowing down the user, logging is completely asynchronous. The Ingestor pushes events to Kafka, where a Loader service batches them into JSONL archives on Google Cloud Storage.
- The Brain (Vertex AI & Gemini 2.5): We utilize Vertex AI Batch Prediction to process thousands of conversations at once. We engineered a specialized "Security Judge" prompt that instructs Gemini to detect "Persona Adoption," "System Override," and "Jailbreak" attempts.
- The Learning Loop (Extraction Service): This is the "secret sauce." When Gemini flags a conversation as malicious, the Extraction Service parses the output, isolates the specific prompt injection string, and upserts its vector embedding into Pinecone. This closes the loop, turning a slow analytical insight into a fast, reusable defense.
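As a rough sketch of the parsing step in the Extraction Service, the Go code below reads judge verdicts line by line and collects the injection strings worth upserting. The flattened JSON schema (`conversation_id`, `verdict`, `category`, `injection`) is a hypothetical one chosen for illustration, not the actual Vertex AI batch output format, and `Verdict`, `SignatureRecord`, and `ExtractSignatures` are made-up names. Embedding and the Pinecone upsert are deliberately left out.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"io"
	"strings"
)

// Verdict is one judge result per JSONL line (hypothetical schema).
type Verdict struct {
	ConversationID string `json:"conversation_id"`
	Verdict        string `json:"verdict"`
	Category       string `json:"category"`
	Injection      string `json:"injection"`
}

// SignatureRecord is what would be embedded and upserted downstream.
type SignatureRecord struct {
	ID       string
	Text     string
	Category string
}

// ExtractSignatures streams JSONL from r one line at a time, so memory
// use is bounded by the longest line rather than by the file size.
func ExtractSignatures(r io.Reader) ([]SignatureRecord, error) {
	var out []SignatureRecord
	seen := make(map[string]bool)
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // allow lines up to 4 MiB
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var v Verdict
		if err := json.Unmarshal([]byte(line), &v); err != nil {
			continue // skip malformed lines instead of failing the whole batch
		}
		inj := strings.TrimSpace(v.Injection)
		if v.Verdict != "malicious" || inj == "" {
			continue
		}
		// A content hash as the ID makes repeated upserts of the same
		// injection string idempotent.
		sum := sha256.Sum256([]byte(strings.ToLower(inj)))
		id := hex.EncodeToString(sum[:8])
		if seen[id] {
			continue
		}
		seen[id] = true
		out = append(out, SignatureRecord{ID: id, Text: inj, Category: v.Category})
	}
	return out, sc.Err()
}

// ExtractSignaturesFromString is a convenience wrapper for small inputs.
func ExtractSignaturesFromString(s string) ([]SignatureRecord, error) {
	return ExtractSignatures(strings.NewReader(s))
}
```

Deriving the record ID from a hash of the injection text is one possible design choice: if the judge flags the same attack in many conversations, the store would end up with one signature rather than many duplicates.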
Challenges we ran into
- False Negatives: Initially, embedding the entire conversation meant that long strings with a lot of misdirection scored very low, even when they contained an exact match for a known attack string. To solve this, we slide a window across the input and take the highest window score as the score for the whole string.
- Batch Processing Scale: Handling massive JSONL files for Vertex AI required careful stream processing in Go to keep memory usage low.
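The sliding-window fix for false negatives can be illustrated with a small Go sketch. It assumes word-level windows and uses Jaccard overlap as a toy stand-in for embedding similarity; the function names, window size, and stride are hypothetical and would need tuning against real embeddings.

```go
package main

import "strings"

// jaccard is a toy similarity (assumption): shared distinct words
// divided by total distinct words across both inputs.
func jaccard(a, b []string) float64 {
	setA := make(map[string]bool)
	for _, w := range a {
		setA[w] = true
	}
	setB := make(map[string]bool)
	for _, w := range b {
		setB[w] = true
	}
	inter := 0
	for w := range setA {
		if setB[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// WholeScore scores the entire prompt at once (the original approach).
func WholeScore(prompt, signature string) float64 {
	return jaccard(strings.Fields(strings.ToLower(prompt)),
		strings.Fields(strings.ToLower(signature)))
}

// MaxWindowScore slides a window of size words across the prompt in
// steps of stride and returns the highest window score.
func MaxWindowScore(prompt, signature string, size, stride int) float64 {
	words := strings.Fields(strings.ToLower(prompt))
	sig := strings.Fields(strings.ToLower(signature))
	if size <= 0 || stride <= 0 || len(words) == 0 {
		return 0
	}
	if stride > size {
		stride = size // never skip words between windows
	}
	best := 0.0
	for start := 0; ; start += stride {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		if s := jaccard(words[start:end], sig); s > best {
			best = s
		}
		if end == len(words) {
			break
		}
	}
	return best
}
```

In this toy setup, padding an attack with unrelated text drags `WholeScore` down because the filler dilutes the overlap, while `MaxWindowScore` stays high as long as one window lines up with the attack string.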
Accomplishments that we're proud of
- Building a truly closed-loop security system. Watching the system fail to stop an attack, run the batch job, and then automatically block that same attack the next time we tried it felt like magic.
- Leveraging Gemini 2.5 Flash's massive context window and cost efficiency to make "check every single log" a financially viable strategy.
What's next for Reflex
- Federated Immunity: Allowing different Reflex nodes to share their "antibodies" (attack vectors) so that if one user gets attacked, everyone is protected.
- Multi-Modal Defense: Expanding the "Security Judge" to detect visual jailbreaks in image inputs.
- Spot Instances: Taking the cost efficiency of batch inference a step further by using Kafka streams to let AI jobs run on bespoke, GPU-backed VMs on spot instances.