Inspiration
As Large Language Models become more powerful, one major challenge remains: not every hardware environment can run every model efficiently. A lightweight laptop may struggle with larger models, while high-performance systems may underutilize available GPU and memory resources when running smaller models. We wanted to build an adaptive agentic intelligence platform capable of:
- understanding the hardware environment
- dynamically selecting the most suitable LLM
- automatically optimizing prompts based on system capabilities
- orchestrating AI workflows intelligently
- collecting telemetry to continuously improve routing decisions
The inspiration came from combining the following ideas, using modern cloud-native and agentic AI technologies:
- autonomous AI agents
- observability platforms
- real-time telemetry streaming
- hardware-aware AI orchestration
What it does
Gemini_Elastic_Kafka is an adaptive multi-model AI orchestration platform powered by:
- Gemini reasoning agents
- MCP (Model Context Protocol) tool servers
- Ollama local LLM runtime
- Kafka telemetry streaming
- Elastic observability
- hardware-aware routing intelligence
The platform dynamically:
- Detects hardware configuration: CPU, RAM, GPU availability, available VRAM
- Selects the most appropriate LLM: Gemma, Phi3, Mistral, Llama
- Optimizes prompts for the selected model
- Executes inference through Ollama
- Publishes telemetry events: latency, success/failure, tokens/sec, memory usage
- Streams telemetry into Kafka
- Stores analytics in Elastic for observability and future optimization
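As a rough illustration of the telemetry step, the sketch below shows one way an inference event carrying the fields listed above could be serialized before being handed to a Kafka producer. The class name `InferenceTelemetry`, the field names, and the sample values are assumptions made for this example and are not taken from the project's code.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class InferenceTelemetry:
    """One telemetry event per inference call (field names are illustrative)."""
    model: str
    latency_ms: float
    success: bool
    tokens_per_sec: float
    memory_mb: float
    timestamp: float


def to_kafka_payload(event: InferenceTelemetry) -> bytes:
    # Kafka message values are bytes, so serialize the event as UTF-8 JSON.
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


# Placeholder values for demonstration only, not measured results.
event = InferenceTelemetry(
    model="phi3",
    latency_ms=842.0,
    success=True,
    tokens_per_sec=31.5,
    memory_mb=2350.0,
    timestamp=time.time(),
)
payload = to_kafka_payload(event)
```

A flat JSON document like this can be indexed by Elastic without further transformation, which is one reason to keep the event schema simple.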
Gemini acts as the reasoning and orchestration layer, helping the system decide:
- which model to use
- whether fallback is needed
- how prompts should be optimized
- how the workflow should execute
The result is a self-optimizing adaptive AI system capable of intelligent multi-step orchestration.
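The hardware-aware selection described above could look roughly like the following. This is a minimal sketch, not the project's routing engine: the `Hardware` type, the `select_model` function, and especially the RAM and VRAM thresholds are hypothetical placeholders chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    ram_gb: float
    gpu_available: bool
    vram_gb: float


# Hypothetical minimum requirements (name, RAM in GB, VRAM in GB),
# ordered from most to least demanding. The numbers are placeholders,
# not measured requirements for these models.
MODEL_REQUIREMENTS = [
    ("llama", 16.0, 8.0),
    ("mistral", 12.0, 6.0),
    ("gemma", 8.0, 4.0),
    ("phi3", 4.0, 0.0),
]


def select_model(hw: Hardware) -> str:
    """Return the most demanding model whose assumed requirements the host meets."""
    # VRAM only counts when a GPU is actually available.
    vram = hw.vram_gb if hw.gpu_available else 0.0
    for name, min_ram_gb, min_vram_gb in MODEL_REQUIREMENTS:
        if hw.ram_gb >= min_ram_gb and vram >= min_vram_gb:
            return name
    # Nothing fits: fall back to the smallest model and let the caller handle failure.
    return MODEL_REQUIREMENTS[-1][0]
```

For example, `select_model(Hardware(ram_gb=8, gpu_available=False, vram_gb=0))` returns `"phi3"` under these assumed thresholds, because every larger entry requires VRAM.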
How we built it
We built the system using a modular microservice-style architecture.
Backend & Agent Layer
- Java 21
- Spring Boot
- WebFlux
- REST APIs
AI Orchestration
- Gemini API
- Ollama local inference runtime
- Adaptive routing engine
MCP Tool Server
A Python-based MCP server was created using:
- FastAPI
- psutil
- pynvml
The MCP server exposes tools for:
- hardware inspection
- model discovery
- inference execution
- telemetry collection
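To illustrate the idea of exposing named tools, here is a plain-Python sketch of a tool registry with a dispatcher. It deliberately uses only the standard library and is not the MCP SDK or the project's FastAPI server; the `tool` decorator, the `call_tool` function, and the tool names are assumptions made for this example.

```python
import os
from typing import Any, Callable, Dict

# Registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("hardware_inspection")
def inspect_hardware() -> dict:
    # Standard library only; RAM and GPU details would need libraries
    # such as psutil and pynvml, which this sketch does not import.
    return {"cpu_count": os.cpu_count()}


@tool("model_discovery")
def discover_models() -> list:
    # Placeholder list; a real tool would query the local model runtime.
    return ["gemma", "phi3", "mistral", "llama"]


def call_tool(name: str, **kwargs: Any) -> Any:
    """Look up a registered tool by name and invoke it."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

Keeping tools behind a name-based dispatcher means new capabilities can be added by registering a function, without changing the caller.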
Streaming & Observability
- Apache Kafka for telemetry streaming
- Elastic for indexing and observability
Cloud & AI Integration
- Gemini for planning and orchestration
- BigQuery-compatible telemetry architecture
- Cloud-ready deployment design
Workflow
User Request
↓
Spring Boot Agent
↓
Gemini Planner
↓
MCP Tool Server
↓
Hardware Detection
↓
Adaptive Model Selection
↓
Ollama Inference
↓
Kafka Telemetry
↓
Elastic Analytics
Challenges we ran into
One major challenge was orchestrating multiple AI and infrastructure components together reliably.
Some of the key challenges included:
Dynamic Model Routing
Choosing the correct model required careful orchestration logic, because the decision depends on:
- RAM
- GPU availability
- prompt complexity
- inference latency
Integrating MCP Architecture
Building a functional MCP server that could expose tools dynamically while integrating with Spring Boot required custom interoperability design.
Telemetry Streaming
Capturing meaningful inference telemetry and streaming it reliably into Kafka while maintaining low latency was challenging.
Resource Constraints
Running local models efficiently on limited hardware while maintaining acceptable response times required prompt optimization and intelligent routing.
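One simple form of prompt optimization for constrained hardware is trimming a prompt to a per-model token budget. The sketch below is an assumption about how that could be done, not the project's actual method: `estimate_tokens` is a crude word count rather than a real tokenizer, and `fit_prompt` and its parameters are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())


def fit_prompt(system: str, context: list[str], question: str, budget: int) -> str:
    """Keep the system text and question, then add the most recent context that fits."""
    used = estimate_tokens(system) + estimate_tokens(question)
    kept: list[str] = []
    # Walk the context from newest to oldest so recent items survive trimming.
    for chunk in reversed(context):
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    kept.reverse()  # restore chronological order
    return "\n".join([system, *kept, question])
```

A smaller model would be given a smaller `budget`, so the same request sheds its oldest context first instead of failing or slowing down.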
Multi-Service Coordination
Coordinating the following components required careful service orchestration and configuration management:
- Gemini
- Ollama
- Kafka
- Elastic
- MCP server
- Spring Boot services
Accomplishments that we're proud of
We are especially proud of building:
Adaptive Hardware-Aware AI Routing
The platform can intelligently select different LLMs based on live hardware telemetry.
Agentic AI Workflow
Instead of simple chat completion, the system performs:
- reasoning
- planning
- tool usage
- orchestration
- telemetry-driven execution
Full Observability Pipeline
We integrated the following to monitor model performance in real time:
- telemetry streaming
- Kafka pipelines
- Elastic analytics
MCP-Based Tool Architecture
We created a reusable MCP tool framework capable of extending the agent with:
- hardware tools
- GitLab integrations
- MongoDB integrations
- future cloud tools
Production-Style Architecture
The system follows scalable cloud-native architecture patterns and can be deployed to:
- Cloud Run
- Kubernetes
- Vertex AI Agent environments
What we learned
This project taught us several important lessons about building agentic AI systems at scale.
AI Agents Need Observability
Telemetry and analytics are critical for improving AI routing decisions over time.
Hardware Awareness Matters
Model selection cannot be static. Routing that adapts to the available RAM, GPU, and VRAM improves efficiency and reliability across different machines.
MCP Creates Powerful Extensibility
MCP-style tool servers make it much easier to extend agent capabilities without tightly coupling services.
Gemini Works Best as a Planner
Using Gemini as the reasoning and orchestration layer while using local models for execution creates a powerful hybrid architecture.
Distributed AI Systems Are Complex
Managing the following requires strong distributed systems thinking:
- streaming systems
- model runtimes
- telemetry pipelines
- orchestration services
What's next for Gemini_Elastic_Kafka
We plan to evolve the platform into a fully autonomous adaptive AI infrastructure layer.
Planned Enhancements
Autonomous Self-Optimization
Use telemetry history to continuously retrain routing strategies.
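As a starting point for this idea, the sketch below ranks models from past telemetry events with a simple heuristic instead of a trained model: it drops models whose success rate is below a threshold and orders the rest by mean latency. The `rank_models` function, the event keys, and the default threshold are assumptions made for illustration.

```python
from collections import defaultdict


def rank_models(events: list[dict], min_success_rate: float = 0.9) -> list[str]:
    """Rank models by mean latency, dropping those below a success-rate threshold."""
    latencies = defaultdict(list)
    outcomes = defaultdict(list)
    for e in events:
        outcomes[e["model"]].append(e["success"])
        # Only successful calls contribute to the latency average.
        if e["success"]:
            latencies[e["model"]].append(e["latency_ms"])

    ranked = []
    for model, results in outcomes.items():
        success_rate = sum(results) / len(results)
        if success_rate >= min_success_rate and latencies[model]:
            mean_latency = sum(latencies[model]) / len(latencies[model])
            ranked.append((mean_latency, model))
    # Lowest mean latency first.
    return [model for _, model in sorted(ranked)]
```

The router could prefer the first entry of this ranking among the models that fit the current hardware, and a learned strategy could later replace the heuristic without changing the interface.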
Semantic Memory
Integrate vector embeddings and RAG pipelines for long-term contextual memory.
Multi-Agent Collaboration
Allow multiple specialized agents to collaborate:
- planner agents
- coding agents
- observability agents
- deployment agents
Cloud-Native Deployment
Deploy fully on:
- Google Cloud Run
- Vertex AI Agent Builder
- Pub/Sub
- BigQuery
Real-Time Dashboards
Build advanced observability dashboards showing:
- model efficiency
- latency heatmaps
- GPU utilization
- routing accuracy
- prompt effectiveness
Enterprise Integrations
Expand MCP integrations with:
- GitLab
- MongoDB
- Elastic
- Confluent Kafka
- BigQuery
- CI/CD systems
Autonomous AI Infrastructure
The long-term vision is a truly adaptive and intelligent AI orchestration platform in which AI agents can:
- monitor infrastructure
- optimize inference strategies
- select models autonomously
- self-heal failures
- continuously improve performance